eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
Hi,
The attached patch set eliminates xl_heap_visible, the WAL record
emitted when a block of the heap is set all-visible/frozen in the
visibility map. Instead, it includes the information needed to update
the VM in the WAL record already emitted by the operation modifying
the heap page.
Currently COPY FREEZE and vacuum are the only operations that set the
VM. So, this patch modifies the xl_heap_multi_insert and xl_heap_prune
records.
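As an aside, if you want to see exactly which record types account for
the WAL, one way to look (not necessarily the methodology I used for the
numbers below) is the pg_walinspect extension, roughly:
create extension pg_walinspect;
select pg_current_wal_insert_lsn() as start_lsn \gset
-- run the operation of interest, e.g. the vacuum example at the bottom
select pg_current_wal_insert_lsn() as end_lsn \gset
-- per-record-type stats between the two LSNs
select * from pg_get_wal_stats(:'start_lsn', :'end_lsn', true);
On master you should see Heap2/VISIBLE rows for these workloads; with
the patch set applied they are gone.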
The result is a dramatic reduction in WAL volume for these operations.
I've included numbers below.
I also think that it makes more sense to include changes to the VM in
the same WAL record as the changes that rendered the page all-visible.
In some cases, we will only set the page all-visible, but even then we
do so in the context of the operation on the heap page that discovered
it was all-visible. So I find this to be a clarity improvement as well
as a performance improvement.
This project is also the first step toward setting the VM on-access
for queries which do not modify the page. There are a few design
issues that must be sorted out for that project which I will detail
separately. Note that this patch set currently does not implement
setting the VM on-access.
The attached patch set isn't 100% polished. I think some of the
variable names and comments could use work, but I'd like to validate
the idea of doing this before doing a full polish. This is a summary
of what is in the set:
Patches:
0001 - 0002: cleanup
0003 - 0004: refactoring
0005: COPY FREEZE changes
0006: refactoring
0007: vacuum phase III changes
0008: vacuum phase I empty page changes
0009 - 0012: refactoring
0013: vacuum phase I normal page changes
0014: cleanup
Performance benefits of eliminating xl_heap_visible:
vacuum of table with index (DDL at bottom of email)
--
master -> patch
WAL records: 6682 -> 4459 = 33% reduction
WAL bytes: 405346 -> 303088 = 25% reduction
vacuum of table without index
--
master -> patch
WAL records: 4452 -> 2231 = 50% reduction
WAL bytes: 289016 -> 177978 = 38% reduction
COPY FREEZE of table without index
--
master -> patch
WAL records: 3672777 -> 1854589 = 50% reduction
WAL bytes: 841340339 -> 748545732 = 11% reduction (the byte savings are
smaller because new pages still need a copy of the whole page in WAL)
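For reference, one simple way to reproduce totals like these on an
otherwise idle cluster (not necessarily exactly how I collected them) is
to diff pg_stat_wal around the operation:
select wal_records, wal_bytes from pg_stat_wal \gset before_
vacuum (verbose, process_toast false) foo;
select wal_records - :before_wal_records as wal_records,
       wal_bytes - :before_wal_bytes as wal_bytes
from pg_stat_wal;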
table for vacuum example:
--
create table foo(a int, b numeric, c numeric) with (autovacuum_enabled = false);
insert into foo select i % 18, repeat('1', 400)::numeric,
repeat('2', 400)::numeric from generate_series(1,40000)i;
-- don't make index for no-index case
create index on foo(a);
delete from foo where a = 1;
vacuum (verbose, process_toast false) foo;
copy freeze example:
--
-- create a data file
create table large(a int, b int) with (autovacuum_enabled = false,
fillfactor = 10);
insert into large select generate_series(1,40000000)i, 1;
copy large to 'large.data';
-- example
BEGIN;
create table large(a int, b int) with (autovacuum_enabled = false,
fillfactor = 10);
COPY large FROM 'large.data' WITH (FREEZE);
COMMIT;
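To sanity check that the VM bits really were set by the COPY FREEZE, the
pg_visibility extension works well (this isn't part of the test itself):
create extension pg_visibility;
select * from pg_visibility_map_summary('large');
-- or per block:
-- select * from pg_visibility_map('large') where not all_frozen;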
- Melanie
Attachments:
v1-0011-Find-and-fix-VM-corruption-in-heap_page_prune_and.patch
From 5266974bf5a16b05b8c8bac33f4630ed1f1552e1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:45:59 -0400
Subject: [PATCH v1 11/14] Find and fix VM corruption in
heap_page_prune_and_freeze
Future commits will update the VM in the same critical section and WAL
record as pruning and freezing. For ease of review, this commit takes
one step toward doing this: it moves the VM corruption handling case to
heap_page_prune_and_freeze().
---
src/backend/access/heap/pruneheap.c | 87 +++++++++++++++++++++++++++-
src/backend/access/heap/vacuumlazy.c | 78 +++----------------------
src/include/access/heapam.h | 4 ++
3 files changed, 96 insertions(+), 73 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 97e51f78854..496b70e318f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,7 +21,7 @@
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "commands/vacuum.h"
#include "executor/instrument.h"
#include "miscadmin.h"
@@ -177,6 +177,13 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
static void page_verify_redirects(Page page);
+static bool identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer);
+
/*
* Optionally prune and repair fragmentation in the specified page.
@@ -261,7 +268,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, vistest, 0,
+ heap_page_prune_and_freeze(relation, buffer, false,
+ InvalidBuffer,
+ vistest, 0,
NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
/*
@@ -294,6 +303,64 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
}
}
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer)
+{
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
/*
* Prune and repair fragmentation and potentially freeze tuples on the
@@ -314,6 +381,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
* that also freeze need that information.
*
+ * blk_known_av is the visibility status of the heap block as of the last call
+ * to find_next_unskippable_block(). vmbuffer is the buffer that may already
+ * contain the required block of the visibility map.
+ *
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
* (see heap_prune_satisfies_vacuum).
*
@@ -349,6 +420,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ bool blk_known_av,
+ Buffer vmbuffer,
GlobalVisState *vistest,
int options,
struct VacuumCutoffs *cutoffs,
@@ -897,6 +970,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
+ /*
+ * Clear any VM corruption. This does not need to be done in a critical
+ * section.
+ */
+ presult->vm_corruption = false;
+ if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+ presult->vm_corruption = identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av,
+ prstate.lpdead_items, vmbuffer);
if (prstate.freeze)
{
if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index fdac36f0835..592cd455cf4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -431,13 +431,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
-
-static bool identify_and_fix_vm_corruption(Relation relation,
- BlockNumber heap_blk,
- Buffer heap_buffer, Page heap_page,
- bool heap_blk_known_av,
- int64 nlpdead_items,
- Buffer vmbuffer);
static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1947,65 +1940,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
-/*
- * When updating the visibility map after phase I heap vacuuming, we take the
- * opportunity to identify and fix any VM corruption.
- *
- * heap_blk_known_av is the visibility status of the heap page collected
- * while finding the next unskippable block in heap_vac_scan_next_block().
- */
-static bool
-identify_and_fix_vm_corruption(Relation relation,
- BlockNumber heap_blk,
- Buffer heap_buffer, Page heap_page,
- bool heap_blk_known_av,
- int64 nlpdead_items,
- Buffer vmbuffer)
-{
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
- visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
- {
- elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk);
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
- {
- elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk);
- PageClearAllVisible(heap_page);
- MarkBufferDirty(heap_buffer);
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- return false;
-}
-
/* qsort comparator for sorting OffsetNumbers */
static int
@@ -2062,11 +1996,14 @@ lazy_scan_prune(LVRelState *vacrel,
* tuples. Pruning will have determined whether or not the page is
* all-visible.
*/
- prune_options = HEAP_PAGE_PRUNE_FREEZE;
+ prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
+ heap_page_prune_and_freeze(rel, buf,
+ all_visible_according_to_vm,
+ vmbuffer,
+ vacrel->vistest, prune_options,
&vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
@@ -2151,10 +2088,9 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables. Start by looking for any VM corruption.
+ * all_frozen variables.
*/
- if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
- all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+ if (presult.vm_corruption)
{
/* Don't update the VM if we just cleared corruption in it */
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 1fa6eb047fd..0886867a161 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -41,6 +41,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
struct TupleTableSlot;
@@ -246,6 +247,7 @@ typedef struct PruneFreezeResult
bool all_visible;
bool all_frozen;
TransactionId vm_conflict_horizon;
+ bool vm_corruption;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
@@ -385,6 +387,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ bool blk_known_av,
+ Buffer vmbuffer,
struct GlobalVisState *vistest,
int options,
struct VacuumCutoffs *cutoffs,
--
2.34.1
v1-0013-Eliminate-xl_heap_visible-from-vacuum-phase-I-pru.patch
From f106462fefde3c18ae5767c879f2cc6026748938 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:41:00 -0400
Subject: [PATCH v1 13/14] Eliminate xl_heap_visible from vacuum phase I
prune/freeze
Instead of emitting a separate WAL record for every block rendered
all-visible/frozen by vacuum's phase I, include the changes to the VM in
the xl_heap_prune record already emitted.
This is only enabled for vacuum's prune/freeze work, not for on-access
pruning.
---
src/backend/access/heap/pruneheap.c | 384 +++++++++++++++------------
src/backend/access/heap/vacuumlazy.c | 30 ---
src/include/access/heapam.h | 15 +-
3 files changed, 223 insertions(+), 206 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 425dcc77534..2d9624a246e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,13 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool freeze;
+
+ /*
+ * Whether or not to consider updating the VM. There is some bookkeeping
+ * that must be maintained if we would like to update the VM.
+ */
+ bool update_vm;
+
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -108,8 +115,9 @@ typedef struct
*
* These fields are not used by pruning itself for the most part, but are
* used to collect information about what was pruned and what state the
- * page is in after pruning, for the benefit of the caller. They are
- * copied to the caller's PruneFreezeResult at the end.
+ * page is in after pruning to use when updating the visibility map and
+ * for the benefit of the caller. They are copied to the caller's
+ * PruneFreezeResult at the end.
* -------------------------------------------------------
*/
@@ -138,11 +146,10 @@ typedef struct
* bits. It is only valid if we froze some tuples, and all_frozen is
* true.
*
- * NOTE: all_visible and all_frozen don't include LP_DEAD items. That's
- * convenient for heap_page_prune_and_freeze(), to use them to decide
- * whether to freeze the page or not. The all_visible and all_frozen
- * values returned to the caller are adjusted to include LP_DEAD items at
- * the end.
+ * NOTE: all_visible and all_frozen don't include LP_DEAD items until
+ * directly before updating the VM. We ignore LP_DEAD items when deciding
+ * whether or not to opportunistically freeze and when determining the
+ * snapshot conflict horizon required when freezing tuples.
*
* all_frozen should only be considered valid if all_visible is also set;
* we don't bother to clear the all_frozen flag every time we clear the
@@ -377,11 +384,15 @@ identify_and_fix_vm_corruption(Relation relation,
* considered advantageous for overall system performance to do so now. The
* 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
* are required when freezing. When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set. They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
+ * set presult->all_visible and presult->all_frozen on exit, for use when
+ * validating the changes made to the VM. They are always set to false when
+ * the HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
* that also freeze need that information.
*
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set and the visibility status of the page
+ * has changed, we will update the VM at the same time as pruning and freezing
+ * the heap page.
+ *
* blk_known_av is the visibility status of the heap block as of the last call
* to find_next_unskippable_block(). vmbuffer is the buffer that may already
* contain the required block of the visibility map.
@@ -396,6 +407,8 @@ identify_and_fix_vm_corruption(Relation relation,
* FREEZE indicates that we will also freeze tuples, and will return
* 'all_visible', 'all_frozen' flags to the caller.
*
+ * UPDATE_VM indicates that we will set the page's status in the VM.
+ *
* cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
* of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
* cutoffs->OldestXmin is also used to determine if dead tuples are
@@ -441,15 +454,19 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_freeze;
bool do_prune;
bool do_hint;
+ bool do_set_vm;
uint8 vmflags = 0;
uint8 old_vmbits = 0;
bool hint_bit_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ bool all_frozen_except_lp_dead = false;
+ bool set_pd_all_visible = false;
/* Copy parameters to prstate */
prstate.vistest = vistest;
prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate.cutoffs = cutoffs;
/*
@@ -496,29 +513,27 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.deadoffsets = presult->deadoffsets;
/*
- * Caller may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Keep track of whether or not the page will be all-visible and
+ * all-frozen for use in opportunistic freezing and to update the VM if
+ * the caller requests it.
+ *
+ * Currently, only VACUUM attempts freezing and setting the VM bits. But
+ * other callers could do either one. The visibility bookkeeping is
+ * required for opportunistic freezing (in addition to setting the VM
+ * bits) because we only consider opportunistically freezing tuples if the
+ * whole page would become all-frozen or if the whole page will be frozen
+ * except for dead tuples that will be removed by vacuum.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * Dead tuples which will be removed by the end of vacuuming should not
+ * preclude us from opportunistically freezing, so we do not clear
+ * all_visible when we see LP_DEAD items. We fix that after determining
+ * whether or not to freeze but before deciding whether or not to update
+ * the VM so that we don't set the VM bit incorrectly.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not clear
- * all_visible when we see LP_DEAD items. We fix that at the end of the
- * function, when we return the value to the caller, so that the caller
- * doesn't set the VM bit incorrectly.
+ * If not freezing or updating the VM, we otherwise avoid the extra
+ * bookkeeping.
*/
- if (prstate.freeze)
+ if (prstate.freeze || prstate.update_vm)
{
prstate.all_visible = true;
prstate.all_frozen = true;
@@ -534,12 +549,15 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is unmaintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon.
+ * This is most likely to happen when updating the VM and/or freezing all
+ * live tuples on the page. It is updated before returning to the caller
+ * because vacuum does assert-build-only validation on the page using this
+ * field.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -827,6 +845,68 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
}
+ /*
+ * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+ * make the choice of whether or not to freeze the page unaffected by the
+ * short-term presence of LP_DEAD items. These LP_DEAD items were
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that freezing has been finalized, unset all_visible if there are
+ * any LP_DEAD items on the page. It needs to reflect the present state of
+ * the page when using it to determine whether or not to update the VM.
+ *
+ * Keep track of whether or not the page was all-frozen except LP_DEAD
+ * items for the purposes of calculating the snapshot conflict horizon,
+ * though.
+ */
+ all_frozen_except_lp_dead = prstate.all_frozen;
+ if (prstate.lpdead_items > 0)
+ {
+ prstate.all_visible = false;
+ prstate.all_frozen = false;
+ }
+
+ /*
+ * Handle setting visibility map bit based on information from the VM (as
+ * of last heap_vac_scan_next_block() call), and from all_visible and
+ * all_frozen variables.
+ */
+ if (prstate.update_vm)
+ {
+ /*
+ * Clear any VM corruption. This does not need to be in a critical
+ * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
+ * we may mark the heap page buffer dirty here and could end up doing
+ * so again later. This is not a correctness issue, and it only happens
+ * when the VM is corrupt, so we don't have to worry about the extra
+ * performance overhead.
+ */
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av, prstate.lpdead_items, vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /* Determine if we actually need to set the VM and which bits to set. */
+ else if (prstate.all_visible &&
+ (!blk_known_av ||
+ (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate.all_frozen)
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+ }
+
+ do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
+ set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+
+ /* Save these for the caller in case we later zero out vmflags */
+ presult->new_vmbits = vmflags;
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -848,13 +928,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* If that's all we had to do to the page, this is a non-WAL-logged
* hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * the buffer dirty and emit WAL below.
*/
- if (!do_freeze && !do_prune)
+ if (!do_prune && !do_freeze && !do_set_vm)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -868,7 +948,23 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
- MarkBufferDirty(buffer);
+ if (do_prune || do_freeze)
+ MarkBufferDirty(buffer);
+
+ if (do_set_vm)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ old_vmbits = heap_page_set_vm(relation, blockno, buffer,
+ vmbuffer, vmflags);
+
+ if (old_vmbits == vmflags)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+ do_set_vm = false;
+ /* 0 out vmflags so we don't emit VM update WAL */
+ vmflags = 0;
+ }
+ }
/*
* Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
@@ -885,35 +981,57 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* on the standby with xids older than the youngest tuple this
* record will freeze will conflict.
*/
- TransactionId frz_conflict_horizon = InvalidTransactionId;
- TransactionId conflict_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
+
+ /*
+ * If we are updating the VM, the conflict horizon is almost
+ * always the visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization,
+ * we can use the visibility_cutoff_xid as the conflict horizon if
+ * the page will be all-frozen. This is true even if there are
+ * LP_DEAD line pointers because we ignored those when maintaining
+ * the visibility_cutoff_xid.
+ */
+ if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
+ conflict_xid = prstate.visibility_cutoff_xid;
/*
- * We can use the visibility_cutoff_xid as our cutoff for
- * conflicts when the whole page is eligible to become all-frozen
- * in the VM once we're done with it. Otherwise we generate a
- * conservative cutoff by stepping back from OldestXmin.
+ * Otherwise, if we are freezing but the page would not be
+ * all-frozen, we have to use the more pessimistic horizon of
+ * OldestXmin, which may be newer than the newest tuple we froze.
+ * That's because we won't have maintained the
+ * visibility_cutoff_xid.
*/
- if (do_freeze)
+ else if (do_freeze)
{
- if (prstate.all_visible && prstate.all_frozen)
- frz_conflict_horizon = prstate.visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = prstate.cutoffs->OldestXmin;
- TransactionIdRetreat(frz_conflict_horizon);
- }
+ conflict_xid = prstate.cutoffs->OldestXmin;
+ TransactionIdRetreat(conflict_xid);
}
- if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
- conflict_xid = frz_conflict_horizon;
- else
+ /*
+ * If we are removing tuples with a younger xmax than the conflict_xid
+ * calculated so far, we must use that as our horizon.
+ */
+ if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
conflict_xid = prstate.latest_xid_removed;
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning
+ * or freezing any tuples and are setting an already all-visible
+ * page all-frozen in the VM. In this case, all of the tuples on
+ * the page must already be visible to all MVCC snapshots on the
+ * standby.
+ */
+ if (!do_prune && !do_freeze && do_set_vm &&
+ blk_known_av && (vmflags & VISIBILITYMAP_ALL_FROZEN))
+ conflict_xid = InvalidTransactionId;
+
log_heap_prune_and_freeze(relation, buffer,
false,
- InvalidBuffer, 0, false,
+ vmbuffer,
+ vmflags,
+ set_pd_all_visible,
conflict_xid,
true, reason,
prstate.frozen, prstate.nfrozen,
@@ -925,124 +1043,55 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
END_CRIT_SECTION();
- /* Copy information back for caller */
- presult->ndeleted = prstate.ndeleted;
- presult->nnewlpdead = prstate.ndead;
- presult->nfrozen = prstate.nfrozen;
- presult->live_tuples = prstate.live_tuples;
- presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected for updating the visibility map.
- */
- if (prstate.all_visible && prstate.lpdead_items == 0)
- {
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
- }
- else
- {
- presult->all_visible = false;
- presult->all_frozen = false;
- }
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
- presult->hastup = prstate.hastup;
-
- /*
- * If updating the visibility map, the conflict horizon for that record
- * must be the newest xmin on the page. However, if the page is
- * completely frozen, there can be no conflict and the vm_conflict_horizon
- * should remain InvalidTransactionId. This includes the case that we
- * just froze all the tuples; the prune-freeze record included the
- * conflict XID already so the VM update record doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * VACUUM will call heap_page_is_all_visible() during the second pass over
+ * the heap to determine all_visible and all_frozen for the page -- this
+ * is a specialized version of the logic from this function. Now that
+ * we've finished pruning and freezing, make sure that we're in total
+ * agreement with heap_page_is_all_visible() using an assertion. We will
+ * have already set the page in the VM, so this assertion will only let
+ * you know that you've already done something wrong.
*/
- if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
{
- if (identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av,
- prstate.lpdead_items, vmbuffer))
- {
- /* If we fix corruption, don't update the VM further */
- }
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
- /*
- * If the page isn't yet marked all-visible in the VM or it is and
- * needs to me marked all-frozen, update the VM Note that all_frozen
- * is only valid if all_visible is true, so we must check both
- * all_visible and all_frozen.
- */
- else if (presult->all_visible &&
- (!blk_known_av ||
- (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
- {
- Assert(prstate.lpdead_items == 0);
- vmflags = VISIBILITYMAP_ALL_VISIBLE;
+ Assert(cutoffs);
- /*
- * If the page is all-frozen, we can pass InvalidTransactionId as
- * our cutoff_xid, since a snapshotConflictHorizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- if (presult->all_frozen)
- {
- Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
- vmflags |= VISIBILITYMAP_ALL_FROZEN;
- }
+ Assert(prstate.lpdead_items == 0);
- /*
- * It's possible for the VM bit to be clear and the page-level bit
- * to be set if checksums are not enabled.
- *
- * And even if we are just planning to update the frozen bit in
- * the VM, we shouldn't rely on all_visible_according_to_vm as a
- * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
- * might have become stale.
- *
- * If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be
- * logged.
- */
- if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buffer);
- }
+ if (!heap_page_is_all_visible(relation, buffer,
+ cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc))
+ Assert(false);
- old_vmbits = heap_page_set_vm_and_log(relation, blockno, buffer,
- vmbuffer, presult->vm_conflict_horizon,
- vmflags);
- }
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == prstate.visibility_cutoff_xid);
}
+#endif
+ /* Copy information back for caller */
+ presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+ presult->old_vmbits = old_vmbits;
+ /* new_vmbits was set above */
+ presult->hastup = prstate.hastup;
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
- presult->old_vmbits = old_vmbits;
- presult->new_vmbits = vmflags;
-
if (prstate.freeze)
{
if (presult->nfrozen > 0)
@@ -1624,8 +1673,13 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
break;
}
- /* Consider freezing any normal tuples which will not be removed */
- if (prstate->freeze)
+ /*
+ * Consider freezing any normal tuples which will not be removed.
+ * Regardless of whether or not we want to freeze the tuples, if we want
+ * to update the VM, we have to call heap_prepare_freeze_tuple() on every
+ * tuple to know whether or not the page will be totally frozen.
+ */
+ if (prstate->freeze || prstate->update_vm)
{
bool totally_frozen;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5806207a674..7d74f8fc0f1 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2019,34 +2019,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- /* Note that all_frozen value does not matter when !all_visible */
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- if (!heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum))
- Assert(false);
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -2080,8 +2052,6 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
-
/*
* For the purposes of logging, count whether or not the page was newly
* set all-visible and, potentially, all-frozen.
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 534a63aab31..e35b4adf38d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -234,19 +234,12 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate the status of the page as reflected
- * in the visibility map after pruning, freezing, and setting any pages
- * all-visible in the visibility map.
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
*
- * vm_conflict_horizon is the newest xmin of live tuples on the page
- * (older than OldestXmin). It will only be valid if we did not set the
- * page all-frozen in the VM.
- *
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
*/
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
uint8 old_vmbits;
uint8 new_vmbits;
--
2.34.1
v1-0002-Simplify-vacuum-VM-update-logging-counters.patch
From 6cbbdd359ae4de835bbd77369b598885e8a279b2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 16:12:15 -0400
Subject: [PATCH v1 02/14] Simplify vacuum VM update logging counters
We can simplify the VM counters added in dc6acfd910b8 to
lazy_vacuum_heap_page() and lazy_scan_new_or_empty().
We won't invoke lazy_vacuum_heap_page() unless there are dead line
pointers, so we know the page can't be all-visible.
In lazy_scan_new_or_empty(), we only update the VM if the page-level
hint PD_ALL_VISIBLE is clear, and the VM bit cannot already be set while
the page-level bit is clear: if it were, a subsequent page update would
fail to clear the visibility map bit.
Simplify the logic for determining which log counters to increment based
on this knowledge.
---
src/backend/access/heap/vacuumlazy.c | 32 +++++++++++-----------------
1 file changed, 12 insertions(+), 20 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 09416450af9..c8da2f835c4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1900,17 +1900,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
VISIBILITYMAP_ALL_FROZEN);
END_CRIT_SECTION();
- /*
- * If the page wasn't already set all-visible and/or all-frozen in
- * the VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0)
- vacrel->vm_new_frozen_pages++;
+ /* VM bits cannot have been set if PD_ALL_VISIBLE was clear */
+ Assert((old_vmbits & VISIBILITYMAP_VALID_BITS) == 0);
+ (void) old_vmbits; /* Silence compiler */
+ /* Count the newly all-visible and all-frozen page for logging. */
+ vacrel->vm_new_visible_pages++;
+ vacrel->vm_new_visible_frozen_pages++;
}
freespace = PageGetHeapFreeSpace(page);
@@ -2930,20 +2925,17 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
vmbuffer, visibility_cutoff_xid,
flags);
- /*
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ /* We know the page should not have been all-visible */
+ Assert((old_vmbits & VISIBILITYMAP_VALID_BITS) == 0);
+ (void) old_vmbits; /* Silence compiler */
+
+ /* Count the newly set VM page for logging */
+ if ((flags & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
vacrel->vm_new_visible_pages++;
if (all_frozen)
vacrel->vm_new_visible_frozen_pages++;
}
-
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- all_frozen)
- vacrel->vm_new_frozen_pages++;
}
/* Revert to the previous phase information for error traceback */
--
2.34.1
v1-0004-Introduce-unlogged-versions-of-VM-functions.patch
From 9750354f2b7d7bd3afd38fca5e0ca2dd814a19a2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v1 04/14] Introduce unlogged versions of VM functions
Future commits will eliminate usages of xl_heap_visible and incorporate
setting the VM into the WAL records that make other changes to the heap
page. As a step toward this, add versions of the VM-update function and
of its heap-specific wrapper which do not emit their own WAL.
These will be used in follow-on commits.
---
src/backend/access/heap/heapam.c | 44 ++++++++++++++++++++++++
src/backend/access/heap/visibilitymap.c | 45 +++++++++++++++++++++++++
src/include/access/heapam.h | 3 ++
src/include/access/visibilitymap.h | 2 ++
4 files changed, 94 insertions(+)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dc409fd3a60..15dc3d88843 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7868,6 +7868,50 @@ heap_page_set_vm_and_log(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
InvalidXLogRecPtr, vmbuf, cutoff_xid, vmflags, set_heap_lsn);
}
+/*
+ * Ensure the provided heap page is marked PD_ALL_VISIBLE and then set the
+ * provided vmflags in the provided vmbuf.
+ *
+ * Both the heap page and VM page should be pinned and exclusive locked.
+ * You must pass a VM buffer containing the correct page of the map
+ * corresponding to the passed in heap block.
+ *
+ * This should only be called in a critical section that also emits WAL (as
+ * needed) for both heap page changes and VM page changes.
+ */
+uint8
+heap_page_set_vm(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
+ Buffer vmbuf, uint8 vmflags)
+{
+ Page heap_page = BufferGetPage(heap_buf);
+
+ Assert(BufferIsValid(heap_buf));
+ Assert(CritSectionCount > 0);
+
+ /* Check that we have the right heap page pinned */
+ if (BufferGetBlockNumber(heap_buf) != heap_blk)
+ elog(ERROR, "wrong heap buffer passed to heap_page_set_vm");
+
+ /*
+ * We must never end up with the VM bit set and the page-level
+ * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+ * modification would fail to clear the VM bit.
+ *
+ * Prior to Postgres 19, it was possible for the page-level bit to be set
+ * and the VM bit to be clear. This could happen if we crashed after
+ * setting PD_ALL_VISIBLE but before setting bits in the VM. Since
+ * Postgres 19, heap page modifications are done in the same critical
+ * section as setting the VM bits, so that should no longer happen.
+ */
+ if (!PageIsAllVisible(heap_page))
+ {
+ PageSetAllVisible(heap_page);
+ MarkBufferDirty(heap_buf);
+ }
+
+ return visibilitymap_set_vmbyte(rel, heap_blk, vmbuf, vmflags);
+}
+
/*
* heap_tuple_should_freeze
*
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 45721399122..9f27ace0e1c 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -317,6 +317,51 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
return status;
}
+/*
+ * Set flags in the VM block contained in the passed in vmBuf.
+ * Caller must have pinned and exclusive locked the correct block of the VM in
+ * vmBuf.
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page.
+ */
+uint8
+visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags)
+{
+ BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+ uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+ uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+ Page page;
+ uint8 *map;
+ uint8 status;
+
+#ifdef TRACE_VISIBILITYMAP
+ elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
+#endif
+
+ /* Flags should be valid. Also never clear bits with this function */
+ Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+ /* Must never set all_frozen bit without also setting all_visible bit */
+ Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+ /* Check that we have the right VM page pinned */
+ if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+ elog(ERROR, "wrong VM buffer passed to visibilitymap_set_vmbyte");
+
+ page = BufferGetPage(vmBuf);
+ map = (uint8 *) PageGetContents(page);
+
+ status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+ if (flags != status)
+ {
+ map[mapByte] |= (flags << mapOffset);
+ MarkBufferDirty(vmBuf);
+ }
+
+ return status;
+}
+
/*
* visibilitymap_get_status - get status of bits
*
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9375296062f..5127fdb9c77 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -360,6 +360,9 @@ extern bool heap_tuple_should_freeze(HeapTupleHeader tuple,
TransactionId *NoFreezePageRelfrozenXid,
MultiXactId *NoFreezePageRelminMxid);
extern bool heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple);
+
+extern uint8 heap_page_set_vm(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
+ Buffer vmbuf, uint8 vmflags);
extern uint8 heap_page_set_vm_and_log(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
Buffer vmbuf, TransactionId cutoff_xid,
uint8 vmflags);
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 4fa4f837535..5d0a9417c25 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,8 @@ extern uint8 visibilitymap_set(Relation rel,
Buffer vmBuf,
TransactionId cutoff_xid,
uint8 flags, bool set_heap_lsn);
+extern uint8 visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
--
2.34.1
v1-0001-Remove-unused-check-in-heap_xlog_insert.patch
From 593d33896dcb618f806b911e80fd448fdacbba0a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 12 Jun 2025 15:38:37 -0400
Subject: [PATCH v1 01/14] Remove unused check in heap_xlog_insert()
8e03eb92e9ad54e2 reverted the commit 39b66a91bd which allowed freezing
in the heap_insert() code path but did not remove the corresponding
check in heap_xlog_insert(). This code is extraneous but not harmful.
However, cleaning it up makes it very clear that, as of now, we do not
support any freezing of pages in the heap_insert() path.
---
src/backend/access/heap/heapam_xlog.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 30f4c2d3c67..fa94e104f1c 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -508,9 +508,8 @@ heap_xlog_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
- if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
- PageSetAllVisible(page);
+ /* This should not happen in the heap_insert() code path */
+ Assert(!(xlrec->flags & XLH_INSERT_ALL_FROZEN_SET));
MarkBufferDirty(buffer);
}
--
2.34.1
v1-0003-Introduce-heap-specific-wrapper-for-visibilitymap.patch
From 904a31bb8f519f5a9e4b30d9010edf506cddad1f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 17 Jun 2025 17:06:45 -0400
Subject: [PATCH v1 03/14] Introduce heap-specific wrapper for
visibilitymap_set()
visibilitymap_set(), which sets bits in the visibility map corresponding
to the heap block of the table passed in, arguably contains some
layering violations.
For example, it sets the heap page's LSN when checksums/wal_log_hints
are enabled. However, the caller may not need to set PD_ALL_VISIBLE
(when it is already set) and thus may not have marked the buffer dirty.
visibilitymap_set() will still set the page LSN in this case, even when
it would have been correct for the caller to *not* mark the buffer
dirty.
Also, every caller that needs PD_ALL_VISIBLE set has to remember to set
it and mark the buffer dirty. This commit introduces a wrapper that does
this and a flag to visibilitymap_set() indicating whether or not the
heap page LSN should be set.
---
src/backend/access/heap/heapam.c | 62 ++++++++++++++++++-------
src/backend/access/heap/heapam_xlog.c | 2 +-
src/backend/access/heap/vacuumlazy.c | 60 ++++++------------------
src/backend/access/heap/visibilitymap.c | 19 ++++----
src/include/access/heapam.h | 3 ++
src/include/access/visibilitymap.h | 2 +-
6 files changed, 73 insertions(+), 75 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0dcd6ee817e..dc409fd3a60 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2505,8 +2505,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
BufferGetBlockNumber(buffer),
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
- else if (all_frozen_set)
- PageSetAllVisible(page);
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2632,23 +2630,16 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/*
* If we've frozen everything on the page, update the visibilitymap.
- * We're already holding pin on the vmbuffer.
+ * We're already holding pin on the vmbuffer. It's fine to use
+ * InvalidTransactionId here - this is only used when
+ * HEAP_INSERT_FROZEN is specified, which intentionally violates
+ * visibility rules.
*/
if (all_frozen_set)
- {
- Assert(PageIsAllVisible(page));
- Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
-
- /*
- * It's fine to use InvalidTransactionId here - this is only used
- * when HEAP_INSERT_FROZEN is specified, which intentionally
- * violates visibility rules.
- */
- visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
- InvalidXLogRecPtr, vmbuffer,
- InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
- }
+ heap_page_set_vm_and_log(relation, BufferGetBlockNumber(buffer), buffer,
+ vmbuffer,
+ InvalidTransactionId,
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
UnlockReleaseBuffer(buffer);
ndone += nthispage;
@@ -7840,6 +7831,43 @@ heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple)
return false;
}
+/*
+ * Make the heap and VM page changes needed to set a page all-visible.
+ * Do not call in recovery.
+ */
+uint8
+heap_page_set_vm_and_log(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
+ Buffer vmbuf, TransactionId cutoff_xid,
+ uint8 vmflags)
+{
+ Page heap_page = BufferGetPage(heap_buf);
+ bool set_heap_lsn = false;
+
+ Assert(BufferIsValid(heap_buf));
+
+ /* Check that we have the right heap page pinned */
+ if (BufferGetBlockNumber(heap_buf) != heap_blk)
+ elog(ERROR, "wrong heap buffer passed to heap_page_set_vm_and_log");
+
+ /*
+ * We must never end up with the VM bit set and the page-level
+ * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+ * modification would fail to clear the VM bit. Though it is possible for
+ * the page-level bit to be set and the VM bit to be clear if checksums
+ * and wal_log_hints are not enabled.
+ */
+ if (!PageIsAllVisible(heap_page))
+ {
+ PageSetAllVisible(heap_page);
+ MarkBufferDirty(heap_buf);
+ if (XLogHintBitIsNeeded())
+ set_heap_lsn = true;
+ }
+
+ return visibilitymap_set(rel, heap_blk, heap_buf,
+ InvalidXLogRecPtr, vmbuf, cutoff_xid, vmflags, set_heap_lsn);
+}
+
/*
* heap_tuple_should_freeze
*
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index fa94e104f1c..cfd4fc3327d 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -298,7 +298,7 @@ heap_xlog_visible(XLogReaderState *record)
visibilitymap_pin(reln, blkno, &vmbuffer);
visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
+ xlrec->snapshotConflictHorizon, vmbits, false);
ReleaseBuffer(vmbuffer);
FreeFakeRelcacheEntry(reln);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c8da2f835c4..5e662936dd7 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1892,12 +1892,10 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
PageGetLSN(page) == InvalidXLogRecPtr)
log_newpage_buffer(buf, true);
- PageSetAllVisible(page);
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
+ vmbuffer, InvalidTransactionId,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
END_CRIT_SECTION();
/* VM bits cannot have been set if PD_ALL_VISIBLE was clear */
@@ -2074,25 +2072,9 @@ lazy_scan_prune(LVRelState *vacrel,
flags |= VISIBILITYMAP_ALL_FROZEN;
}
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
+ old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
+ vmbuffer, presult.vm_conflict_horizon,
+ flags);
/*
* If the page wasn't already set all-visible and/or all-frozen in the
@@ -2164,17 +2146,6 @@ lazy_scan_prune(LVRelState *vacrel,
{
uint8 old_vmbits;
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
/*
* Set the page all-frozen (and all-visible) in the VM.
*
@@ -2183,11 +2154,10 @@ lazy_scan_prune(LVRelState *vacrel,
* was logged when the page's tuples were frozen.
*/
Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
+ vmbuffer, InvalidTransactionId,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
/*
* The page was likely already set all-visible in the VM. However,
@@ -2919,11 +2889,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
flags |= VISIBILITYMAP_ALL_FROZEN;
}
- PageSetAllVisible(page);
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buffer,
- InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
- flags);
+ old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buffer,
+ vmbuffer, visibility_cutoff_xid,
+ flags);
/* We know the page should not have been all-visible */
Assert((old_vmbits & VISIBILITYMAP_VALID_BITS) == 0);
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 745a04ef26e..45721399122 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -232,9 +232,10 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
* when a page that is already all-visible is being marked all-frozen.
*
* Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
+ * this function. Except in recovery, caller should also pass the heap buffer.
+ * When checksums are enabled and we're not in recovery, if the heap page was
+ * modified, we must add the heap buffer to the WAL chain to protect it from
+ * being torn.
*
* You must pass a buffer containing the correct map page to this function.
* Call visibilitymap_pin first to pin the right one. This function doesn't do
@@ -245,7 +246,7 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
uint8
visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
+ uint8 flags, bool set_heap_lsn)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
@@ -259,16 +260,12 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
#endif
Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
- Assert(InRecovery || PageIsAllVisible((Page) BufferGetPage(heapBuf)));
+ Assert(!(InRecovery && set_heap_lsn));
Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
/* Must never set all_frozen bit without also setting all_visible bit */
Assert(flags != VISIBILITYMAP_ALL_FROZEN);
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
/* Check that we have the right VM page pinned */
if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
@@ -301,10 +298,12 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* WAL record inserted above, so it would be incorrect to
* update the heap page's LSN.
*/
- if (XLogHintBitIsNeeded())
+ if (set_heap_lsn)
{
Page heapPage = BufferGetPage(heapBuf);
+ Assert(XLogHintBitIsNeeded());
+
PageSetLSN(heapPage, recptr);
}
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 3a9424c19c9..9375296062f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -360,6 +360,9 @@ extern bool heap_tuple_should_freeze(HeapTupleHeader tuple,
TransactionId *NoFreezePageRelfrozenXid,
MultiXactId *NoFreezePageRelminMxid);
extern bool heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple);
+extern uint8 heap_page_set_vm_and_log(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
+ Buffer vmbuf, TransactionId cutoff_xid,
+ uint8 vmflags);
extern void simple_heap_insert(Relation relation, HeapTuple tup);
extern void simple_heap_delete(Relation relation, ItemPointer tid);
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..4fa4f837535 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -36,7 +36,7 @@ extern uint8 visibilitymap_set(Relation rel,
XLogRecPtr recptr,
Buffer vmBuf,
TransactionId cutoff_xid,
- uint8 flags);
+ uint8 flags, bool set_heap_lsn);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
--
2.34.1
Attachment: v1-0005-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch (text/x-patch)
From b8dacf8fed00b3d1fcf59e61adb1541ba68746a0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 17 Jun 2025 17:40:28 -0400
Subject: [PATCH v1 05/14] Eliminate xl_heap_visible in COPY FREEZE
Instead of emitting a separate xl_heap_visible WAL record to set the VM
bits, include the required update in the xl_heap_multi_insert record.
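To make the shape of the change easier to see, the WAL-logging side now
follows roughly this pattern (a simplified sketch, not the exact hunks;
the VM page is updated in the same critical section and registered as
block 1 of the multi-insert record):

    if (all_frozen_set)
    {
        /* vmbuffer is already pinned; take the lock and set the bits */
        LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
        heap_page_set_vm(relation, BufferGetBlockNumber(buffer), buffer,
                         vmbuffer,
                         VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
    }

    XLogBeginInsert();
    XLogRegisterData(xlrec, tupledata - scratch.data);
    XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
    if (all_frozen_set)
        XLogRegisterBuffer(1, vmbuffer, 0);     /* VM page rides along */
    XLogRegisterBufData(0, tupledata, totaldatalen);

    recptr = XLogInsert(RM_HEAP2_ID, info);
    PageSetLSN(page, recptr);
    if (all_frozen_set)
        PageSetLSN(BufferGetPage(vmbuffer), recptr);  /* one record covers both */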
---
src/backend/access/heap/heapam.c | 42 +++++++++++++++++---------
src/backend/access/heap/heapam_xlog.c | 37 ++++++++++++++++++++++-
src/backend/access/rmgrdesc/heapdesc.c | 5 +++
3 files changed, 69 insertions(+), 15 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 15dc3d88843..3d9b114b4e8 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2493,9 +2493,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/*
* If the page is all visible, need to clear that, unless we're only
* going to add further frozen rows to it.
- *
- * If we're only adding already frozen rows to a previously empty
- * page, mark it as all-visible.
*/
if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
@@ -2506,6 +2503,22 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
+ /*
+ * If we're only adding already frozen rows to a previously empty
+ * page, mark it as all-visible. And if we've frozen everything on the
+ * page, update the visibility map. We're already holding a pin on the
+ * vmbuffer.
+ */
+ else if (all_frozen_set)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ heap_page_set_vm(relation,
+ BufferGetBlockNumber(buffer), buffer,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+ }
+
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
*/
@@ -2552,6 +2565,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
xlrec->flags = 0;
if (all_visible_cleared)
xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+ /*
+ * We don't have to worry about including a conflict xid in the
+ * WAL record as HEAP_INSERT_FROZEN intentionally violates
+ * visibility rules.
+ */
if (all_frozen_set)
xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
@@ -2614,7 +2633,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
XLogBeginInsert();
XLogRegisterData(xlrec, tupledata - scratch.data);
+
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+ if (all_frozen_set)
+ XLogRegisterBuffer(1, vmbuffer, 0);
XLogRegisterBufData(0, tupledata, totaldatalen);
@@ -2624,22 +2646,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
recptr = XLogInsert(RM_HEAP2_ID, info);
PageSetLSN(page, recptr);
+ if (all_frozen_set)
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
}
END_CRIT_SECTION();
- /*
- * If we've frozen everything on the page, update the visibilitymap.
- * We're already holding pin on the vmbuffer. It's fine to use
- * InvalidTransactionId here - this is only used when
- * HEAP_INSERT_FROZEN is specified, which intentionally violates
- * visibility rules.
- */
if (all_frozen_set)
- heap_page_set_vm_and_log(relation, BufferGetBlockNumber(buffer), buffer,
- vmbuffer,
- InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
UnlockReleaseBuffer(buffer);
ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index cfd4fc3327d..a0f3673621a 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -552,6 +552,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
int i;
bool isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
/*
* Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -572,11 +573,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(rlocator);
- Buffer vmbuffer = InvalidBuffer;
visibilitymap_pin(reln, blkno, &vmbuffer);
visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
ReleaseBuffer(vmbuffer);
+ vmbuffer = InvalidBuffer;
FreeFakeRelcacheEntry(reln);
}
@@ -663,6 +664,40 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (BufferIsValid(buffer))
UnlockReleaseBuffer(buffer);
+ buffer = InvalidBuffer;
+
+ /*
+ * Now read and update the VM block. Even if we skipped updating the heap
+ * page due to the file being dropped or truncated later in recovery, it's
+ * still safe to update the visibility map. Any WAL record that clears
+ * the visibility map bit does so before checking the page LSN, so any
+ * bits that need to be cleared will still be cleared.
+ *
+ * It is only okay to set the VM bits without holding the heap page lock
+ * because we can expect no other writers of this page.
+ */
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+ XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+
+ uint8 old_vmbits = 0;
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
+
+ visibilitymap_pin(reln, blkno, &vmbuffer);
+ old_vmbits = visibilitymap_set_vmbyte(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+ Assert((old_vmbits & VISIBILITYMAP_VALID_BITS) == 0);
+ (void) old_vmbits; /* Silence compiler */
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
+ FreeFakeRelcacheEntry(reln);
+ }
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
+
/*
* If the page is running low on free space, update the FSM as well.
* Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
#include "access/heapam_xlog.h"
#include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
#include "storage/standbydefs.h"
/*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
xlrec->flags);
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+ appendStringInfo(buf, ", vm_flags: 0x%02X",
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
if (XLogRecHasBlockData(record, 0) && !isinit)
{
appendStringInfoString(buf, ", offsets:");
--
2.34.1
Attachment: v1-0009-Combine-lazy_scan_prune-VM-corruption-cases.patch (text/x-patch)
From 40308800989edf1821639cd18e0c2630f4417c22 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:04:03 -0400
Subject: [PATCH v1 09/14] Combine lazy_scan_prune VM corruption cases
lazy_scan_prune() updates the visibility map after phase I of heap
vacuuming. It also checks and fixes corruption in the VM. The corruption
cases were mixed in with the normal visibility map update cases.
Careful study of the ordering of the current logic reveals that the
corruption cases can be reordered and extracted into a separate
function. This should add no overhead compared to the previous code.
This reordering makes it clear which cases are about corruption and
which are normal VM updates. Separating them also makes it possible to
combine the normal cases in a future commit. That will make the logic
easier to understand and pave the way for updating the VM in the same
record as pruning and freezing in phase I.
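The resulting control flow in lazy_scan_prune() is roughly (a sketch of
the ordering only):

    if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
                                       all_visible_according_to_vm,
                                       presult.lpdead_items, vmbuffer))
    {
        /* corruption was just cleared; don't also set VM bits this cycle */
    }
    else if (!all_visible_according_to_vm && presult.all_visible)
    {
        /* normal case: set all-visible (and possibly all-frozen) */
    }
    else if (all_visible_according_to_vm && presult.all_visible &&
             presult.all_frozen &&
             !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
    {
        /* already all-visible: additionally mark the page all-frozen */
    }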
---
src/backend/access/heap/vacuumlazy.c | 115 +++++++++++++++++----------
1 file changed, 74 insertions(+), 41 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b1ff49bee6b..8328cab0955 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -431,6 +431,13 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
+
+static bool identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer);
static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1940,6 +1947,66 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer)
+{
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
+
+
/* qsort comparator for sorting OffsetNumbers */
static int
cmpOffsetNumbers(const void *a, const void *b)
@@ -2084,9 +2151,14 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
+ * all_frozen variables. Start by looking for any VM corruption.
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
+ all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+ {
+ /* Don't update the VM if we just cleared corruption in it */
+ }
+ else if (!all_visible_according_to_vm && presult.all_visible)
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -2122,45 +2194,6 @@ lazy_scan_prune(LVRelState *vacrel,
}
}
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
- {
- elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
- {
- elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno);
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
/*
* If the all-visible page is all-frozen but not marked as such yet, mark
* it as all-frozen. Note that all_frozen is only valid if all_visible is
--
2.34.1
Attachment: v1-0006-Make-heap_page_is_all_visible-independent-of-LVRe.patch (text/x-patch)
From 797f6f09cf7287af4b4e929e903e115a767df145 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 15:48:51 -0400
Subject: [PATCH v1 06/14] Make heap_page_is_all_visible independent of
LVRelState
Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need two parameters from the
LVRelState, so just pass those in explicitly.
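With the change, callers pass the relation and cutoff explicitly, e.g.
(a sketch of the new call shape, taken from the vacuum caller):

    heap_page_is_all_visible(vacrel->rel, buffer,
                             vacrel->cutoffs.OldestXmin,
                             &all_frozen, &visibility_cutoff_xid,
                             &vacrel->offnum);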
---
src/backend/access/heap/vacuumlazy.c | 45 ++++++++++++++++++----------
1 file changed, 29 insertions(+), 16 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5e662936dd7..0cf4a69c431 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,8 +464,11 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
- TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2010,8 +2013,9 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(presult.lpdead_items == 0);
- if (!heap_page_is_all_visible(vacrel, buf,
- &debug_cutoff, &debug_all_frozen))
+ if (!heap_page_is_all_visible(vacrel->rel, buf,
+ vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+ &debug_cutoff, &vacrel->offnum))
Assert(false);
Assert(presult.all_frozen == debug_all_frozen);
@@ -2877,8 +2881,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* emitted.
*/
Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
- &all_frozen))
+ if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
+ &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -3568,9 +3572,16 @@ dead_items_cleanup(LVRelState *vacrel)
/*
* Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples. Set *all_frozen to true if every tuple
- * on this page is frozen.
+ * transactions.
+ *
+ * OldestXmin is used to determine visibility.
+ *
+ * *logging_offnum will have the OffsetNumber of the current tuple being
+ * processed for vacuum's error callback system.
+ *
+ * Return the visibility_cutoff_xid which is the highest xmin amongst the
+ * visible tuples. Sets *all_frozen to true if every tuple on this page is
+ * frozen.
*
* This is a stripped down version of lazy_scan_prune(). If you change
* anything here, make sure that everything stays in sync. Note that an
@@ -3578,9 +3589,11 @@ dead_items_cleanup(LVRelState *vacrel)
* introducing new side-effects here.
*/
static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
TransactionId *visibility_cutoff_xid,
- bool *all_frozen)
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3603,7 +3616,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* Set the offset number so that we can display it along with any
* error that occurred while processing this tuple.
*/
- vacrel->offnum = offnum;
+ *logging_offnum = offnum;
itemid = PageGetItemId(page, offnum);
/* Unused or redirect line pointers are of no interest */
@@ -3627,9 +3640,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
tuple.t_len = ItemIdGetLength(itemid);
- tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+ tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
+ switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
buf))
{
case HEAPTUPLE_LIVE:
@@ -3650,7 +3663,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
+ OldestXmin))
{
all_visible = false;
*all_frozen = false;
@@ -3685,7 +3698,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
} /* scan along page */
/* Clear the offset information once we have processed the given page. */
- vacrel->offnum = InvalidOffsetNumber;
+ *logging_offnum = InvalidOffsetNumber;
return all_visible;
}
--
2.34.1
Attachment: v1-0007-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch (text/x-patch)
From 087566e214b67713390f742dfd825d330d3f8360 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:42:13 -0400
Subject: [PATCH v1 07/14] Eliminate xl_heap_visible from vacuum phase III
Instead of emitting a separate xl_heap_visible record for each page that
is all-visible after vacuum's third phase, use the VM-related options
when emitting the xl_heap_prune record with the changes vacuum makes in
phase III.
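In lazy_vacuum_heap_page(), the VM update and the prune record are then
combined in one critical section, roughly like this (sketch; argument
lists abbreviated):

    /*
     * Before the critical section: will removing the LP_DEAD items leave
     * the page all-visible?
     */
    if (heap_page_is_all_visible_except_lpdead(vacrel->rel, buffer,
                                               vacrel->cutoffs.OldestXmin,
                                               deadoffsets, num_offsets,
                                               &all_frozen,
                                               &visibility_cutoff_xid,
                                               &vacrel->offnum))
    {
        vmflags = VISIBILITYMAP_ALL_VISIBLE;
        if (all_frozen)
            vmflags |= VISIBILITYMAP_ALL_FROZEN;
    }

    START_CRIT_SECTION();
    /* ... mark the LP_DEAD items LP_UNUSED ... */
    if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
    {
        set_pd_all_vis = true;
        LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
        heap_page_set_vm(vacrel->rel, blkno, buffer, vmbuffer, vmflags);
        conflict_xid = visibility_cutoff_xid;
    }
    MarkBufferDirty(buffer);
    if (RelationNeedsWAL(vacrel->rel))
        log_heap_prune_and_freeze(vacrel->rel, buffer,
                                  vmbuffer, vmflags, set_pd_all_vis,
                                  conflict_xid, ...);  /* no xl_heap_visible */
    END_CRIT_SECTION();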
---
src/backend/access/heap/heapam_xlog.c | 148 +++++++++++++++++++---
src/backend/access/heap/pruneheap.c | 48 +++++++-
src/backend/access/heap/vacuumlazy.c | 164 ++++++++++++++++---------
src/backend/access/rmgrdesc/heapdesc.c | 13 +-
src/include/access/heapam.h | 9 ++
src/include/access/heapam_xlog.h | 3 +
6 files changed, 308 insertions(+), 77 deletions(-)
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index a0f3673621a..bb6680c0467 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Buffer buffer;
RelFileLocator rlocator;
BlockNumber blkno;
- XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
+ uint8 vmflags = 0;
XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -51,10 +52,15 @@ heap_xlog_prune_freeze(XLogReaderState *record)
(xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
/*
- * We are about to remove and/or freeze tuples. In Hot Standby mode,
- * ensure that there are no queries running for which the removed tuples
- * are still visible or which still consider the frozen xids as running.
- * The conflict horizon XID comes after xl_heap_prune.
+ * After xl_heap_prune is the optional snapshot conflict horizon.
+ *
+ * In Hot Standby mode, we must ensure that there are no running queries
+ * which would conflict with the changes in this record. If pruning, that
+ * means we cannot remove tuples still visible to transactions on the
+ * standby. If freezing, that means we cannot freeze tuples with xids that
+ * are still considered running on the standby. And for setting the VM, we
+ * cannot do so if the page isn't all-visible to all transactions on the
+ * standby.
*/
if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
{
@@ -70,13 +76,28 @@ heap_xlog_prune_freeze(XLogReaderState *record)
rlocator);
}
+ /* Next are the optionally included vmflags. Copy them out for later use. */
+ if ((xlrec.flags & XLHP_HAS_VMFLAGS) != 0)
+ {
+ memcpy(&vmflags, maindataptr, sizeof(uint8));
+ maindataptr += sizeof(uint8);
+
+ /*
+ * We don't set VISIBILITYMAP_XLOG_CATALOG_REL in the combined record
+ * because we already have XLHP_IS_CATALOG_REL.
+ */
+ Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+ /* Must never set all_frozen bit without also setting all_visible bit */
+ Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
+ }
+
/*
- * If we have a full-page image, restore it and we're done.
+ * If we have a full-page image of the heap block, restore it and we're
+ * done with the heap block.
*/
- action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
- (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
- &buffer);
- if (action == BLK_NEEDS_REDO)
+ if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+ (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+ &buffer) == BLK_NEEDS_REDO)
{
Page page = (Page) BufferGetPage(buffer);
OffsetNumber *redirected;
@@ -89,6 +110,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Size datalen;
xlhp_freeze_plan *plans;
OffsetNumber *frz_offsets;
+ bool do_prune;
+ bool mark_buffer_dirty;
+ bool set_heap_lsn;
char *dataptr = XLogRecGetBlockData(record, 0, &datalen);
heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
@@ -97,11 +121,18 @@ heap_xlog_prune_freeze(XLogReaderState *record)
&ndead, &nowdead,
&nunused, &nowunused);
+ do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+ set_heap_lsn = mark_buffer_dirty = do_prune || nplans > 0;
+
+ /* Ensure the record does something */
+ Assert(do_prune || nplans > 0 ||
+ vmflags & VISIBILITYMAP_VALID_BITS);
+
/*
* Update all line pointers per the record, and repair fragmentation
* if needed.
*/
- if (nredirected > 0 || ndead > 0 || nunused > 0)
+ if (do_prune)
heap_page_prune_execute(buffer,
(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
redirected, nredirected,
@@ -138,26 +169,78 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* There should be no more data */
Assert((char *) frz_offsets == dataptr + datalen);
+ Assert(BufferIsValid(buffer) &&
+ BufferGetBlockNumber(buffer) == blkno);
+
+ /*
+ * Now set PD_ALL_VISIBLE, if required. We'll only do this if we are
+ * also going to set bits in the VM later.
+ *
+ * We must never end up with the VM bit set and the page-level
+ * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+ * modification would fail to clear the VM bit.
+ *
+ * Prior to Postgres 19, it was possible for the page-level bit to be
+ * set and the VM bit to be clear. This could happen if we crashed
+ * after setting PD_ALL_VISIBLE but before setting bits in the VM.
+ */
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+ {
+ PageSetAllVisible(page);
+
+ /*
+ * Setting PD_ALL_VISIBLE only forces us to update the heap page
+ * LSN if checksums or wal_log_hints are enabled (in which case we
+ * must). This exposes us to torn page hazards, but since we're
+ * not inspecting the existing page contents in any way, we don't
+ * care.
+ */
+ set_heap_lsn = XLogHintBitIsNeeded() ? true : set_heap_lsn;
+ mark_buffer_dirty = true;
+ }
+
/*
* Note: we don't worry about updating the page's prunability hints.
* At worst this will cause an extra prune cycle to occur soon.
*/
- PageSetLSN(page, lsn);
- MarkBufferDirty(buffer);
+ if (mark_buffer_dirty)
+ MarkBufferDirty(buffer);
+ if (set_heap_lsn)
+ PageSetLSN(page, lsn);
}
/*
- * If we released any space or line pointers, update the free space map.
+ * If we released any space or line pointers or will be setting a page in
+ * the visibility map, update the free space map.
+ *
+ * Even if we are just updating the VM (and thus not freeing up any
+ * space), we'll still update the FSM for this page. Since FSM is not
+ * WAL-logged and only updated heuristically, it easily becomes stale in
+ * standbys. If the standby is later promoted and runs VACUUM, it will
+ * skip updating individual free space figures for pages that became
+ * all-visible (or all-frozen, depending on the vacuum mode,) which is
+ * troublesome when FreeSpaceMapVacuum propagates too optimistic free
+ * space values to upper FSM layers; later inserters try to use such pages
+ * only to find out that they are unusable. This can cause long stalls
+ * when there are many such pages.
+ *
+ * Forestall those problems by updating FSM's idea about a page that is
+ * becoming all-visible or all-frozen.
*
* Do this regardless of a full-page image being applied, since the FSM
* data is not in the page anyway.
+ *
+ * We want to avoid holding an exclusive lock on the heap buffer while
+ * doing IO (either of the FSM or the VM), so we'll release the lock on
+ * the heap buffer before doing either.
*/
if (BufferIsValid(buffer))
{
- if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
- XLHP_HAS_DEAD_ITEMS |
- XLHP_HAS_NOW_UNUSED_ITEMS))
+ if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
+ XLHP_HAS_DEAD_ITEMS |
+ XLHP_HAS_NOW_UNUSED_ITEMS)) ||
+ vmflags & VISIBILITYMAP_VALID_BITS)
{
Size freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
@@ -168,6 +251,37 @@ heap_xlog_prune_freeze(XLogReaderState *record)
else
UnlockReleaseBuffer(buffer);
}
+
+ /*
+ * Read and update the VM block. Even if we skipped updating the heap page
+ * due to the file being dropped or truncated later in recovery, it's
+ * still safe to update the visibility map. Any WAL record that clears
+ * the visibility map bit does so before checking the page LSN, so any
+ * bits that need to be cleared will still be cleared.
+ *
+ * Note that it is *only* okay that we do not hold a lock on the heap page
+ * because we are in recovery and can expect no other writers to clear
+ * PD_ALL_VISIBLE before we are able to update the VM.
+ */
+ if (vmflags & VISIBILITYMAP_VALID_BITS &&
+ XLogReadBufferForRedoExtended(record, 1,
+ RBM_ZERO_ON_ERROR,
+ false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ uint8 old_vmbits = 0;
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
+
+ visibilitymap_pin(reln, blkno, &vmbuffer);
+ old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+ /* Only set VM page LSN if we modified the page */
+ if (old_vmbits != vmflags)
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
+ FreeFakeRelcacheEntry(reln);
+ }
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a8025889be0..d9ba0f96e34 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,6 +21,7 @@
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
+#include "access/visibilitymapdefs.h"
#include "commands/vacuum.h"
#include "executor/instrument.h"
#include "miscadmin.h"
@@ -835,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
+ InvalidBuffer, 0, false,
conflict_xid,
true, reason,
prstate.frozen, prstate.nfrozen,
@@ -2045,12 +2047,23 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* replaying 'unused' items depends on whether they were all previously marked
* as dead.
*
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
* Note: This function scribbles on the 'frozen' array.
*
* Note: This is called in a critical section, so careful what you do here.
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer,
+ uint8 vmflags,
+ bool set_pd_all_vis,
TransactionId conflict_xid,
bool cleanup_lock,
PruneReason reason,
@@ -2062,6 +2075,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xl_heap_prune xlrec;
XLogRecPtr recptr;
uint8 info;
+ uint8 regbuf_flags;
/* The following local variables hold data registered in the WAL record: */
xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2070,8 +2084,19 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xlhp_prune_items dead_items;
xlhp_prune_items unused_items;
OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+ bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
xlrec.flags = 0;
+ regbuf_flags = REGBUF_STANDARD;
+
+ /*
+ * We can avoid an FPI if the only modification we are making to the heap
+ * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+ */
+ if (!do_prune &&
+ nfrozen == 0 &&
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
+ regbuf_flags |= REGBUF_NO_IMAGE;
/*
* Prepare data for the buffer. The arrays are not actually in the
@@ -2079,7 +2104,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* page image, the arrays can be omitted.
*/
XLogBeginInsert();
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+ XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ XLogRegisterBuffer(1, vmbuffer, 0);
+
if (nfrozen > 0)
{
int nplans;
@@ -2136,6 +2165,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* Prepare the main xl_heap_prune record. We already set the XLHP_HAS_*
* flag above.
*/
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ xlrec.flags |= XLHP_HAS_VMFLAGS;
if (RelationIsAccessibleInLogicalDecoding(relation))
xlrec.flags |= XLHP_IS_CATALOG_REL;
if (TransactionIdIsValid(conflict_xid))
@@ -2150,6 +2181,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
XLogRegisterData(&xlrec, SizeOfHeapPrune);
if (TransactionIdIsValid(conflict_xid))
XLogRegisterData(&conflict_xid, sizeof(TransactionId));
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ XLogRegisterData(&vmflags, sizeof(uint8));
switch (reason)
{
@@ -2168,5 +2201,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
}
recptr = XLogInsert(RM_HEAP2_ID, info);
- PageSetLSN(BufferGetPage(buffer), recptr);
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
+
+ /*
+ * If pruning or freezing tuples or setting the page all-visible when
+ * checksums or wal_log_hints are enabled, we must bump the LSN. Torn
+ * pages are possible if we update PD_ALL_VISIBLE without bumping the LSN,
+ * but this is deemed okay for page hint updates.
+ */
+ if (do_prune || nfrozen > 0 ||
+ (set_pd_all_vis && XLogHintBitIsNeeded()))
+ PageSetLSN(BufferGetPage(buffer), recptr);
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 0cf4a69c431..32f21d20194 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,11 +464,13 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
+static bool heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int allowed_num_offsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2817,8 +2819,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
OffsetNumber unused[MaxHeapTuplesPerPage];
int nunused = 0;
TransactionId visibility_cutoff_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
bool all_frozen;
LVSavedErrInfo saved_err_info;
+ uint8 old_vmbits = 0;
+ uint8 vmflags = 0;
+ bool set_pd_all_vis = false;
Assert(vacrel->do_index_vacuuming);
@@ -2829,6 +2835,20 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
InvalidOffsetNumber);
+ if (heap_page_is_all_visible_except_lpdead(vacrel->rel, buffer,
+ vacrel->cutoffs.OldestXmin,
+ deadoffsets, num_offsets,
+ &all_frozen, &visibility_cutoff_xid,
+ &vacrel->offnum))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (all_frozen)
+ {
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ }
+ }
+
START_CRIT_SECTION();
for (int i = 0; i < num_offsets; i++)
@@ -2848,6 +2868,21 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
/* Attempt to truncate line pointer array now */
PageTruncateLinePointerArray(page);
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+ {
+ Assert(!PageIsAllVisible(page));
+ set_pd_all_vis = true;
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ old_vmbits = heap_page_set_vm(vacrel->rel,
+ blkno, buffer,
+ vmbuffer, vmflags);
+
+ /* We know the page should not have been all-visible */
+ Assert((old_vmbits & VISIBILITYMAP_VALID_BITS) == 0);
+ (void) old_vmbits; /* Silence compiler */
+ conflict_xid = visibility_cutoff_xid;
+ }
+
/*
* Mark buffer dirty before we write WAL.
*/
@@ -2857,7 +2892,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
- InvalidTransactionId,
+ vmbuffer,
+ vmflags,
+ set_pd_all_vis,
+ conflict_xid,
false, /* no cleanup lock required */
PRUNE_VACUUM_CLEANUP,
NULL, 0, /* frozen */
@@ -2866,48 +2904,14 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
unused, nunused);
}
- /*
- * End critical section, so we safely can do visibility tests (which
- * possibly need to perform IO and allocate memory!). If we crash now the
- * page (including the corresponding vm bit) might not be marked all
- * visible, but that's fine. A later vacuum will fix that.
- */
END_CRIT_SECTION();
- /*
- * Now that we have removed the LP_DEAD items from the page, once again
- * check if the page has become all-visible. The page is already marked
- * dirty, exclusively locked, and, if needed, a full page image has been
- * emitted.
- */
- Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
- &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
+ if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+ vacrel->vm_new_visible_pages++;
if (all_frozen)
- {
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buffer,
- vmbuffer, visibility_cutoff_xid,
- flags);
-
- /* We know the page should not have been all-visible */
- Assert((old_vmbits & VISIBILITYMAP_VALID_BITS) == 0);
- (void) old_vmbits; /* Silence compiler */
-
- /* Count the newly set VM page for logging */
- if ((flags & VISIBILITYMAP_ALL_VISIBLE) != 0)
- {
- vacrel->vm_new_visible_pages++;
- if (all_frozen)
- vacrel->vm_new_visible_frozen_pages++;
- }
+ vacrel->vm_new_visible_frozen_pages++;
}
/* Revert to the previous phase information for error traceback */
@@ -3570,6 +3574,25 @@ dead_items_cleanup(LVRelState *vacrel)
vacrel->pvs = NULL;
}
+/*
+ * Wrapper for heap_page_is_all_visible_except_lpdead() which can be used for
+ * callers that expect no LP_DEAD on the page.
+ */
+bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
+{
+
+ return heap_page_is_all_visible_except_lpdead(rel, buf, OldestXmin,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid,
+ logging_offnum);
+}
+
/*
* Check if every tuple in the given page is visible to all current and future
* transactions.
@@ -3583,23 +3606,35 @@ dead_items_cleanup(LVRelState *vacrel)
* visible tuples. Sets *all_frozen to true if every tuple on this page is
* frozen.
*
- * This is a stripped down version of lazy_scan_prune(). If you change
- * anything here, make sure that everything stays in sync. Note that an
- * assertion calls us to verify that everybody still agrees. Be sure to avoid
- * introducing new side-effects here.
+ * deadoffsets are the offsets we know about and are about to set LP_UNUSED.
+ * allowed_num_offsets is the number of those. As long as the LP_DEAD items we
+ * encounter on the page match those exactly, we can set the page all-visible
+ * in the VM.
+ *
+ * Callers looking to verify that the page is all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This logic is similar to that in heap_prune_record_unchanged_lp_normal(). If
+ * you change anything here, make sure that everything stays in sync. Note
+ * that an assertion calls us to verify that everybody still agrees. Be sure
+ * to avoid introducing new side-effects here.
*/
static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum)
+heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int allowed_num_offsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
OffsetNumber offnum,
maxoff;
bool all_visible = true;
+ OffsetNumber current_dead_offsets[MaxHeapTuplesPerPage];
+ size_t current_num_offsets = 0;
*visibility_cutoff_xid = InvalidTransactionId;
*all_frozen = true;
@@ -3631,9 +3666,8 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
if (ItemIdIsDead(itemid))
{
- all_visible = false;
- *all_frozen = false;
- break;
+ current_dead_offsets[current_num_offsets++] = offnum;
+ continue;
}
Assert(ItemIdIsNormal(itemid));
@@ -3700,7 +3734,23 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
/* Clear the offset information once we have processed the given page. */
*logging_offnum = InvalidOffsetNumber;
- return all_visible;
+ /* If we already know it's not all-visible, return false */
+ if (!all_visible)
+ return false;
+
+ /* If we weren't allowed any dead offsets, we're done */
+ if (allowed_num_offsets == 0)
+ return current_num_offsets == 0;
+
+ /* If the number of dead offsets has changed, that's wrong */
+ if (current_num_offsets != allowed_num_offsets)
+ return false;
+
+ Assert(deadoffsets);
+
+ /* The dead offsets must be the same dead offsets */
+ return memcmp(current_dead_offsets, deadoffsets,
+ allowed_num_offsets * sizeof(OffsetNumber)) == 0;
}
/*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..d6c86ccac20 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -266,6 +266,7 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
{
char *rec = XLogRecGetData(record);
uint8 info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+ char *maindataptr = rec + SizeOfHeapPrune;
info &= XLOG_HEAP_OPMASK;
if (info == XLOG_HEAP2_PRUNE_ON_ACCESS ||
@@ -278,7 +279,8 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
{
TransactionId conflict_xid;
- memcpy(&conflict_xid, rec + SizeOfHeapPrune, sizeof(TransactionId));
+ memcpy(&conflict_xid, maindataptr, sizeof(TransactionId));
+ maindataptr += sizeof(TransactionId);
appendStringInfo(buf, "snapshotConflictHorizon: %u",
conflict_xid);
@@ -287,6 +289,15 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, ", isCatalogRel: %c",
xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
+ if (xlrec->flags & XLHP_HAS_VMFLAGS)
+ {
+ uint8 vmflags;
+
+ memcpy(&vmflags, maindataptr, sizeof(uint8));
+ maindataptr += sizeof(uint8);
+ appendStringInfo(buf, ", vm_flags: 0x%02X", vmflags);
+ }
+
if (XLogRecHasBlockData(record, 0))
{
Size datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 5127fdb9c77..d2ac380bb64 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -343,6 +343,12 @@ extern void heap_inplace_update_and_unlock(Relation relation,
Buffer buffer);
extern void heap_inplace_unlock(Relation relation,
HeapTuple oldtup, Buffer buffer);
+
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
@@ -393,6 +399,9 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer,
+ uint8 vmflags,
+ bool vm_modified_heap_page,
TransactionId conflict_xid,
bool cleanup_lock,
PruneReason reason,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 277df6b3cf0..ceae9c083ff 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -295,6 +295,9 @@ typedef struct xl_heap_prune
#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+/* If the record should update the VM, this is the new value */
+#define XLHP_HAS_VMFLAGS (1 << 0)
+
/* to handle recovery conflict during logical decoding on standby */
#define XLHP_IS_CATALOG_REL (1 << 1)
--
2.34.1
Attachment: v1-0008-Use-xl_heap_prune-record-for-setting-empty-pages-.patch (text/x-patch)
From 52dada80db2b5f5f6e5810c633d953d03ad10c05 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:42:19 -0400
Subject: [PATCH v1 08/14] Use xl_heap_prune record for setting empty pages
all-visible
As part of the project to eliminate xl_heap_visible records, stop
emitting them when vacuum phase I sets an empty page all-visible.
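For an empty page this means the xl_heap_prune record carries only the
VM change; if the page has never been WAL-logged, a full-page image of
the heap block is forced instead of a separate log_newpage_buffer()
call. Roughly (sketch):

    LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
    heap_page_set_vm(vacrel->rel, blkno, buf, vmbuffer,
                     VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);

    if (RelationNeedsWAL(vacrel->rel))
        log_heap_prune_and_freeze(vacrel->rel, buf,
                                  PageGetLSN(page) == InvalidXLogRecPtr, /* force FPI? */
                                  vmbuffer, new_vmbits, true,
                                  InvalidTransactionId,
                                  false, PRUNE_VACUUM_SCAN,
                                  NULL, 0, NULL, 0, NULL, 0, NULL, 0);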
---
src/backend/access/heap/pruneheap.c | 14 ++++--
src/backend/access/heap/vacuumlazy.c | 64 ++++++++++++++++++----------
src/include/access/heapam.h | 1 +
3 files changed, 54 insertions(+), 25 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d9ba0f96e34..97e51f78854 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -836,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
+ false,
InvalidBuffer, 0, false,
conflict_xid,
true, reason,
@@ -2051,6 +2052,9 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* case, vmbuffer should already have been updated and marked dirty and should
* still be pinned and locked.
*
+ * force_heap_fpi indicates that a full page image of the heap block should be
+ * forced.
+ *
* set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
* the page LSN when checksums/wal_log_hints are enabled even if we did not
* prune or freeze tuples on the page.
@@ -2061,6 +2065,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ bool force_heap_fpi,
Buffer vmbuffer,
uint8 vmflags,
bool set_pd_all_vis,
@@ -2089,13 +2094,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xlrec.flags = 0;
regbuf_flags = REGBUF_STANDARD;
+ if (force_heap_fpi)
+ regbuf_flags |= REGBUF_FORCE_IMAGE;
+
/*
* We can avoid an FPI if the only modification we are making to the heap
* page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
*/
- if (!do_prune &&
- nfrozen == 0 &&
- (!set_pd_all_vis || !XLogHintBitIsNeeded()))
+ else if (!do_prune &&
+ nfrozen == 0 &&
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
regbuf_flags |= REGBUF_NO_IMAGE;
/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 32f21d20194..b1ff49bee6b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1850,6 +1850,7 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
if (PageIsEmpty(page))
{
+
/*
* It seems likely that caller will always be able to get a cleanup
* lock on an empty page. But don't take any chances -- escalate to
@@ -1877,35 +1878,53 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
- uint8 old_vmbits;
+ uint8 old_vmbits = 0;
+ uint8 new_vmbits = 0;
- START_CRIT_SECTION();
+ new_vmbits = VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN;
- /* mark buffer dirty before writing a WAL record */
- MarkBufferDirty(buf);
+ START_CRIT_SECTION();
- /*
- * It's possible that another backend has extended the heap,
- * initialized the page, and then failed to WAL-log the page due
- * to an ERROR. Since heap extension is not WAL-logged, recovery
- * might try to replay our record setting the page all-visible and
- * find that the page isn't initialized, which will cause a PANIC.
- * To prevent that, check whether the page has been previously
- * WAL-logged, and if not, do that now.
- */
- if (RelationNeedsWAL(vacrel->rel) &&
- PageGetLSN(page) == InvalidXLogRecPtr)
- log_newpage_buffer(buf, true);
-
- old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
- END_CRIT_SECTION();
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ old_vmbits = heap_page_set_vm(vacrel->rel, blkno, buf,
+ vmbuffer, new_vmbits);
/* VM bits cannot have been set if PD_ALL_VISIBLE was clear */
Assert((old_vmbits & VISIBILITYMAP_VALID_BITS) == 0);
(void) old_vmbits; /* Silence compiler */
+
+ /* Should have set PD_ALL_VISIBLE and marked buf dirty */
+ Assert(BufferIsDirty(buf));
+
+ if (RelationNeedsWAL(vacrel->rel))
+ {
+ /*
+ * It's possible that another backend has extended the heap,
+ * initialized the page, and then failed to WAL-log the page
+ * due to an ERROR. Since heap extension is not WAL-logged,
+ * recovery might try to replay our record setting the page
+ * all-visible and find that the page isn't initialized, which
+ * will cause a PANIC. To prevent that, if the page hasn't
+ * been previously WAL-logged, force a heap FPI.
+ */
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ PageGetLSN(page) == InvalidXLogRecPtr,
+ vmbuffer,
+ new_vmbits,
+ true,
+ InvalidTransactionId,
+ false, PRUNE_VACUUM_SCAN,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+ }
+
+ END_CRIT_SECTION();
+
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
/* Count the newly all-frozen pages for logging. */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
@@ -2892,6 +2911,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
+ false,
vmbuffer,
vmflags,
set_pd_all_vis,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index d2ac380bb64..1fa6eb047fd 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -399,6 +399,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ bool force_heap_fpi,
Buffer vmbuffer,
uint8 vmflags,
bool vm_modified_heap_page,
--
2.34.1
Attachment: v1-0010-Combine-vacuum-phase-I-VM-update-cases.patch (text/x-patch)
From 00d6a64f3c5fa4d87e59968b636941846e6c542b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:35:36 -0400
Subject: [PATCH v1 10/14] Combine vacuum phase I VM update cases
After phase I of vacuum we update the VM, either setting the VM bits
when all bits are currently unset or setting just the frozen bit when
the all-visible bit is already set. Those cases had a lot of duplicated
code. Combine them. This is simpler to understand and also makes the
code compact enough to start using it to update the VM while pruning
and freezing.
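The merged test looks roughly like this (sketch):

    else if (presult.all_visible &&
             (!all_visible_according_to_vm ||
              (presult.all_frozen &&
               !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
    {
        /* set all-visible and, when possible, all-frozen in one call */
    }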
---
src/backend/access/heap/vacuumlazy.c | 71 +++++++++-------------------
1 file changed, 22 insertions(+), 49 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8328cab0955..fdac36f0835 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2158,11 +2158,26 @@ lazy_scan_prune(LVRelState *vacrel,
{
/* Don't update the VM if we just cleared corruption in it */
}
- else if (!all_visible_according_to_vm && presult.all_visible)
+
+ /*
+ * If the page isn't yet marked all-visible in the VM, or it is and needs
+ * to be marked all-frozen, update the VM. Note that all_frozen is only
+ * valid if all_visible is true, so we must check both all_visible and
+ * all_frozen.
+ */
+ else if (presult.all_visible &&
+ (!all_visible_according_to_vm ||
+ (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
+ /*
+ * If the page is all-frozen, we can pass InvalidTransactionId as our
+ * cutoff_xid, since a snapshotConflictHorizon sufficient to make
+ * everything safe for REDO was logged when the page's tuples were
+ * frozen.
+ */
if (presult.all_frozen)
{
Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
@@ -2174,6 +2189,12 @@ lazy_scan_prune(LVRelState *vacrel,
flags);
/*
+ * Even if we are only setting the all-frozen bit, there is a small
+ * chance that the VM was modified sometime between setting
+ * all_visible_according_to_vm and checking the visibility during
+ * pruning. Check the return value of old_vmbits to ensure the
+ * visibility map counters used for logging are accurate.
+ *
* If the page wasn't already set all-visible and/or all-frozen in the
* VM, count it as newly set for logging.
*/
@@ -2193,54 +2214,6 @@ lazy_scan_prune(LVRelState *vacrel,
*vm_page_frozen = true;
}
}
-
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen. Note that all_frozen is only valid if all_visible is
- * true, so we must check both all_visible and all_frozen.
- */
- else if (all_visible_according_to_vm && presult.all_visible &&
- presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
}
/*
--
2.34.1
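(Aside for reviewers: the combined case in lazy_scan_prune() roughly reduces
to the sketch below. Names are as in the patch; assertions and the
all-frozen counter details are abbreviated, so treat it as an illustration
of the merged condition rather than the exact code.)

if (presult.all_visible &&
    (!all_visible_according_to_vm ||
     (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
{
	uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
	uint8		old_vmbits;

	if (presult.all_frozen)
		flags |= VISIBILITYMAP_ALL_FROZEN;	/* conflict horizon already logged */

	old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf, vmbuffer,
										  presult.vm_conflict_horizon, flags);

	/* count pages newly set all-visible vs. newly set all-frozen */
	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
		vacrel->vm_new_visible_pages++;
	else if (presult.all_frozen)
		vacrel->vm_new_frozen_pages++;
}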
v1-0012-Update-VM-in-pruneheap.c.patch (text/x-patch)
From 53357a7e0f61e1ec00323c0cd14c8afb3f655b83 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 2 Jun 2025 11:04:14 -0400
Subject: [PATCH v1 12/14] Update VM in pruneheap.c
As a step toward updating the VM in the same critical section and WAL
record as pruning and freezing (during phase I of vacuuming), first move
the VM update (still in its own critical section and WAL record) into
heap_page_prune_and_freeze(). This makes review easier.
---
src/backend/access/heap/pruneheap.c | 99 +++++++++++++++++++++++-----
src/backend/access/heap/vacuumlazy.c | 81 +++++------------------
src/include/access/heapam.h | 15 +++--
3 files changed, 106 insertions(+), 89 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 496b70e318f..425dcc77534 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -364,7 +364,8 @@ identify_and_fix_vm_corruption(Relation relation,
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -440,6 +441,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_freeze;
bool do_prune;
bool do_hint;
+ uint8 vmflags = 0;
+ uint8 old_vmbits = 0;
bool hint_bit_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
@@ -939,7 +942,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*
* Now that freezing has been finalized, unset all_visible if there are
* any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected by our caller.
+ * of the page, as expected for updating the visibility map.
*/
if (prstate.all_visible && prstate.lpdead_items == 0)
{
@@ -955,31 +958,91 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->hastup = prstate.hastup;
/*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
+ * If updating the visibility map, the conflict horizon for that record
+ * must be the newest xmin on the page. However, if the page is
+ * completely frozen, there can be no conflict and the vm_conflict_horizon
+ * should remain InvalidTransactionId. This includes the case that we
+ * just froze all the tuples; the prune-freeze record included the
+ * conflict XID already so the VM update record doesn't need it.
*/
if (presult->all_frozen)
presult->vm_conflict_horizon = InvalidTransactionId;
else
presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
- presult->lpdead_items = prstate.lpdead_items;
- /* the presult->deadoffsets array was already filled in */
-
/*
- * Clear any VM corruption. This does not need to be done in a critical
- * section.
+ * Handle setting visibility map bit based on information from the VM (as
+ * of last heap_vac_scan_next_block() call), and from all_visible and
+ * all_frozen variables.
*/
- presult->vm_corruption = false;
if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
- presult->vm_corruption = identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av,
- prstate.lpdead_items, vmbuffer);
+ {
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av,
+ prstate.lpdead_items, vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /*
+		 * If the page isn't yet marked all-visible in the VM, or it is and
+		 * needs to be marked all-frozen, update the VM. Note that all_frozen
+ * is only valid if all_visible is true, so we must check both
+ * all_visible and all_frozen.
+ */
+ else if (presult->all_visible &&
+ (!blk_known_av ||
+ (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+ {
+ Assert(prstate.lpdead_items == 0);
+ vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+ /*
+ * If the page is all-frozen, we can pass InvalidTransactionId as
+ * our cutoff_xid, since a snapshotConflictHorizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ if (presult->all_frozen)
+ {
+ Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
+ /*
+ * It's possible for the VM bit to be clear and the page-level bit
+ * to be set if checksums are not enabled.
+ *
+ * And even if we are just planning to update the frozen bit in
+ * the VM, we shouldn't rely on all_visible_according_to_vm as a
+ * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
+ * might have become stale.
+ *
+ * If the heap page is all-visible but the VM bit is not set, we
+ * don't need to dirty the heap page. However, if checksums are
+ * enabled, we do need to make sure that the heap page is dirtied
+ * before passing it to visibilitymap_set(), because it may be
+ * logged.
+ */
+ if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+ {
+ PageSetAllVisible(page);
+ MarkBufferDirty(buffer);
+ }
+
+ old_vmbits = heap_page_set_vm_and_log(relation, blockno, buffer,
+ vmbuffer, presult->vm_conflict_horizon,
+ vmflags);
+ }
+ }
+
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
+
+ presult->old_vmbits = old_vmbits;
+ presult->new_vmbits = vmflags;
+
if (prstate.freeze)
{
if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 592cd455cf4..5806207a674 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1940,7 +1940,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
-
/* qsort comparator for sorting OffsetNumbers */
static int
cmpOffsetNumbers(const void *a, const void *b)
@@ -1956,7 +1955,8 @@ cmpOffsetNumbers(const void *a, const void *b)
* vmbuffer is the buffer containing the VM block with visibility information
* for the heap block, blkno. all_visible_according_to_vm is the saved
* visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * won't rely entirely on this status, as it may be out of date. These will be
+ * passed on to heap_page_prune_and_freeze() to use while setting the VM.
*
* *has_lpdead_items is set to true or false depending on whether, upon return
* from this function, any LP_DEAD items are still present on the page.
@@ -1983,6 +1983,7 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
+ * Then, if the page's visibility status has changed, update the VM.
*
* If the relation has no indexes, we can immediately mark would-be dead
* items LP_UNUSED.
@@ -1991,10 +1992,6 @@ lazy_scan_prune(LVRelState *vacrel,
* presult.ndeleted. It should not be confused with presult.lpdead_items;
* presult.lpdead_items's final value can be thought of as the number of
* tuples that were deleted from indexes.
- *
- * We will update the VM after collecting LP_DEAD items and freezing
- * tuples. Pruning will have determined whether or not the page is
- * all-visible.
*/
prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
if (vacrel->nindexes == 0)
@@ -2086,70 +2083,26 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(!presult.all_visible || !(*has_lpdead_items));
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
- */
- if (presult.vm_corruption)
- {
- /* Don't update the VM if we just cleared corruption in it */
- }
-
- /*
-	 * If the page isn't yet marked all-visible in the VM, or it is and needs
-	 * to be marked all-frozen, update the VM. Note that all_frozen is only
- * valid if all_visible is true, so we must check both all_visible and
- * all_frozen.
+ * For the purposes of logging, count whether or not the page was newly
+ * set all-visible and, potentially, all-frozen.
*/
- else if (presult.all_visible &&
- (!all_visible_according_to_vm ||
- (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- /*
- * If the page is all-frozen, we can pass InvalidTransactionId as our
- * cutoff_xid, since a snapshotConflictHorizon sufficient to make
- * everything safe for REDO was logged when the page's tuples were
- * frozen.
- */
- if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * Even if we are only setting the all-frozen bit, there is a small
- * chance that the VM was modified sometime between setting
- * all_visible_according_to_vm and checking the visibility during
- * pruning. Check the return value of old_vmbits to ensure the
- * visibility map counters used for logging are accurate.
- *
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ vacrel->vm_new_visible_pages++;
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- vacrel->vm_new_frozen_pages++;
+ vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ {
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
+ }
}
/*
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0886867a161..534a63aab31 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -234,20 +234,21 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
+ * all_visible and all_frozen indicate the status of the page as reflected
+ * in the visibility map after pruning, freezing, and setting any pages
+ * all-visible in the visibility map.
*
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
+ * vm_conflict_horizon is the newest xmin of live tuples on the page
+ * (older than OldestXmin). It will only be valid if we did not set the
+ * page all-frozen in the VM.
*
* These are only set if the HEAP_PRUNE_FREEZE option is set.
*/
bool all_visible;
bool all_frozen;
TransactionId vm_conflict_horizon;
- bool vm_corruption;
+ uint8 old_vmbits;
+ uint8 new_vmbits;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
--
2.34.1
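(Aside for reviewers: after this patch lazy_scan_prune() no longer touches
the VM itself; it derives its logging counters purely from the before/after
bits reported in the result struct. A minimal sketch using the field names
from the heapam.h hunk above; both fields are zero when the VM was not
updated:)

bool		newly_all_visible;
bool		newly_all_frozen;

newly_all_visible = (presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
	(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
newly_all_frozen = (presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
	(presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0;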
v1-0014-Remove-xl_heap_visible-entirely.patch (text/x-patch)
From 51ce0c717152c316a29ce97edf6dfd8f720c3cba Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:30:42 -0400
Subject: [PATCH v1 14/14] Remove xl_heap_visible entirely
There are now no users of this, so eliminate it entirely.
ci-os-only:
---
src/backend/access/common/bufmask.c | 3 +-
src/backend/access/heap/heapam.c | 83 +------------
src/backend/access/heap/heapam_xlog.c | 149 +----------------------
src/backend/access/heap/visibilitymap.c | 101 +--------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/include/access/heapam.h | 3 -
src/include/access/heapam_xlog.h | 6 -
src/include/access/visibilitymap.h | 10 +-
9 files changed, 12 insertions(+), 354 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..1fff01383b3 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,7 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 3d9b114b4e8..6f134dfd535 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -40,6 +40,7 @@
#include "access/valid.h"
#include "access/visibilitymap.h"
#include "access/xloginsert.h"
+#include "access/xlogutils.h"
#include "catalog/pg_database.h"
#include "catalog/pg_database_d.h"
#include "commands/vacuum.h"
@@ -7845,43 +7846,6 @@ heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple)
return false;
}
-/*
- * Make the heap and VM page changes needed to set a page all-visible.
- * Do not call in recovery.
- */
-uint8
-heap_page_set_vm_and_log(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
- Buffer vmbuf, TransactionId cutoff_xid,
- uint8 vmflags)
-{
- Page heap_page = BufferGetPage(heap_buf);
- bool set_heap_lsn = false;
-
- Assert(BufferIsValid(heap_buf));
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferGetBlockNumber(heap_buf) != heap_blk)
- elog(ERROR, "wrong heap buffer passed to heap_page_set_vm_and_log");
-
- /*
- * We must never end up with the VM bit set and the page-level
- * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
- * modification would fail to clear the VM bit. Though it is possible for
- * the page-level bit to be set and the VM bit to be clear if checksums
- * and wal_log_hints are not enabled.
- */
- if (!PageIsAllVisible(heap_page))
- {
- PageSetAllVisible(heap_page);
- MarkBufferDirty(heap_buf);
- if (XLogHintBitIsNeeded())
- set_heap_lsn = true;
- }
-
- return visibilitymap_set(rel, heap_blk, heap_buf,
- InvalidXLogRecPtr, vmbuf, cutoff_xid, vmflags, set_heap_lsn);
-}
-
/*
* Ensure the provided heap page is marked PD_ALL_VISIBLE and then set the
* provided vmflags in the provided vmbuf.
@@ -7923,7 +7887,7 @@ heap_page_set_vm(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
MarkBufferDirty(heap_buf);
}
- return visibilitymap_set_vmbyte(rel, heap_blk, vmbuf, vmflags);
+ return visibilitymap_set(rel, heap_blk, vmbuf, vmflags);
}
/*
@@ -8865,49 +8829,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
/*
* Perform XLogInsert for a heap-update operation. Caller must already
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index bb6680c0467..c64fc39bc01 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -273,7 +273,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Relation reln = CreateFakeRelcacheEntry(rlocator);
visibilitymap_pin(reln, blkno, &vmbuffer);
- old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+ old_vmbits = visibilitymap_set(reln, blkno, vmbuffer, vmflags);
/* Only set VM page LSN if we modified the page */
if (old_vmbits != vmflags)
PageSetLSN(BufferGetPage(vmbuffer), lsn);
@@ -284,142 +284,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
UnlockReleaseBuffer(vmbuffer);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
- visibilitymap_pin(reln, blkno, &vmbuffer);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits, false);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
@@ -799,10 +663,10 @@ heap_xlog_multi_insert(XLogReaderState *record)
Relation reln = CreateFakeRelcacheEntry(rlocator);
visibilitymap_pin(reln, blkno, &vmbuffer);
- old_vmbits = visibilitymap_set_vmbyte(reln, blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ old_vmbits = visibilitymap_set(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
Assert((old_vmbits & VISIBILITYMAP_VALID_BITS) == 0);
(void) old_vmbits; /* Silence compiler */
PageSetLSN(BufferGetPage(vmbuffer), lsn);
@@ -1384,9 +1248,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 9f27ace0e1c..a24554fe191 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -219,103 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap buffer.
- * When checksums are enabled and we're not in recovery, if the heap page was
- * modified, we must add the heap buffer to the WAL chain to protect it from
- * being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags, bool set_heap_lsn)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
- Assert(!(InRecovery && set_heap_lsn));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (XLogRecPtrIsInvalid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (set_heap_lsn)
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- Assert(XLogHintBitIsNeeded());
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
-}
/*
* Set flags in the VM block contained in the passed in vmBuf.
@@ -325,8 +228,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* making any changes needed to the associated heap page.
*/
uint8
-visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags)
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index d6c86ccac20..f7880a4ed81 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -351,13 +351,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -462,9 +455,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e35b4adf38d..c404b794fda 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -365,9 +365,6 @@ extern bool heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple);
extern uint8 heap_page_set_vm(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
Buffer vmbuf, uint8 vmflags);
-extern uint8 heap_page_set_vm_and_log(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
- Buffer vmbuf, TransactionId cutoff_xid,
- uint8 vmflags);
extern void simple_heap_insert(Relation relation, HeapTuple tup);
extern void simple_heap_delete(Relation relation, ItemPointer tid);
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ceae9c083ff..9a61434b881 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -495,11 +494,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 5d0a9417c25..20141e3e805 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -31,14 +31,8 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags, bool set_heap_lsn);
-extern uint8 visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags);
+extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
--
2.34.1
On Mon, Jun 23, 2025 at 4:25 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> The attached patch set eliminates xl_heap_visible, the WAL record
> emitted when a block of the heap is set all-visible/frozen in the
> visibility map. Instead, it includes the information needed to update
> the VM in the WAL record already emitted by the operation modifying
> the heap page.
Rebased in light of recent changes on master:
0001: cleanup
0002: preparatory work
0003: eliminate xl_heap_visible for COPY FREEZE
0004 - 0005: eliminate xl_heap_visible for vacuum's phase III
0006: eliminate xl_heap_visible for vacuum phase I empty pages
0007 - 0010: preparatory refactoring
0011: eliminate xl_heap_visible from vacuum phase I prune/freeze
0012: remove xl_heap_visible
- Melanie
Attachments:
v2-0002-Introduce-unlogged-versions-of-VM-functions.patch (text/x-patch)
From 3b9cbaac3b40976ef04ead3e2500f24d8938bda8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v2 02/12] Introduce unlogged versions of VM functions
Future commits will eliminate usages of xl_heap_visible and incorporate
setting the VM into the WAL records that make other changes to the heap
page. As a step toward this, add versions of the VM-update function and
its heap-specific wrapper that do not emit their own WAL.
These will be used in follow-on commits.
---
src/backend/access/heap/heapam.c | 44 ++++++++++++++++++++++++
src/backend/access/heap/visibilitymap.c | 45 +++++++++++++++++++++++++
src/include/access/heapam.h | 3 ++
src/include/access/visibilitymap.h | 2 ++
4 files changed, 94 insertions(+)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 112f946dab0..d125787fcb6 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7898,6 +7898,50 @@ heap_page_set_vm_and_log(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
return old_vmbits;
}
+/*
+ * Ensure the provided heap page is marked PD_ALL_VISIBLE and then set the
+ * provided vmflags in the provided vmbuf.
+ *
+ * Both the heap page and VM page should be pinned and exclusive locked.
+ * You must pass a VM buffer containing the correct page of the map
+ * corresponding to the passed in heap block.
+ *
+ * This should only be called in a critical section that also emits WAL (as
+ * needed) for both heap page changes and VM page changes.
+ */
+uint8
+heap_page_set_vm(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
+ Buffer vmbuf, uint8 vmflags)
+{
+ Page heap_page = BufferGetPage(heap_buf);
+
+ Assert(BufferIsValid(heap_buf));
+ Assert(CritSectionCount > 0);
+
+ /* Check that we have the right heap page pinned */
+ if (BufferGetBlockNumber(heap_buf) != heap_blk)
+ elog(ERROR, "wrong heap buffer passed to heap_page_set_vm");
+
+ /*
+ * We must never end up with the VM bit set and the page-level
+ * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+ * modification would fail to clear the VM bit.
+ *
+ * Prior to Postgres 19, it was possible for the page-level bit to be set
+ * and the VM bit to be clear. This could happen if we crashed after
+ * setting PD_ALL_VISIBLE but before setting bits in the VM. Since
+	 * Postgres 19, heap page modifications are done in the same critical
+	 * section as setting the VM bits, so that should no longer happen.
+ */
+ if (!PageIsAllVisible(heap_page))
+ {
+ PageSetAllVisible(heap_page);
+ MarkBufferDirty(heap_buf);
+ }
+
+ return visibilitymap_set_vmbyte(rel, heap_blk, vmbuf, vmflags);
+}
+
/*
* heap_tuple_should_freeze
*
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index c57632168c7..cabd0fa0880 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -300,6 +300,51 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
return status;
}
+/*
+ * Set flags in the VM block contained in the passed in vmBuf.
+ * Caller must have pinned and exclusive locked the correct block of the VM in
+ * vmBuf.
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page.
+ */
+uint8
+visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags)
+{
+ BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+ uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+ uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+ Page page;
+ uint8 *map;
+ uint8 status;
+
+#ifdef TRACE_VISIBILITYMAP
+ elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
+#endif
+
+ /* Flags should be valid. Also never clear bits with this function */
+ Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+ /* Must never set all_frozen bit without also setting all_visible bit */
+ Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+ /* Check that we have the right VM page pinned */
+ if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+ elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
+
+ page = BufferGetPage(vmBuf);
+ map = (uint8 *) PageGetContents(page);
+
+ status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+ if (flags != status)
+ {
+ map[mapByte] |= (flags << mapOffset);
+ MarkBufferDirty(vmBuf);
+ }
+
+ return status;
+}
+
/*
* visibilitymap_get_status - get status of bits
*
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9375296062f..5127fdb9c77 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -360,6 +360,9 @@ extern bool heap_tuple_should_freeze(HeapTupleHeader tuple,
TransactionId *NoFreezePageRelfrozenXid,
MultiXactId *NoFreezePageRelminMxid);
extern bool heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple);
+
+extern uint8 heap_page_set_vm(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
+ Buffer vmbuf, uint8 vmflags);
extern uint8 heap_page_set_vm_and_log(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
Buffer vmbuf, TransactionId cutoff_xid,
uint8 vmflags);
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 4c7472e0b51..91ef3705e84 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,8 @@ extern uint8 visibilitymap_set(Relation rel,
Buffer vmBuf,
TransactionId cutoff_xid,
uint8 flags);
+extern uint8 visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
--
2.34.1
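(Aside for reviewers: a rough sketch of the calling pattern the later
patches adopt with these unlogged variants. This is a hypothetical caller:
the record type, registered payload, and buffer numbering are placeholders
that depend on the specific operation, e.g. the modified
xl_heap_multi_insert or xl_heap_prune records.)

START_CRIT_SECTION();

/* ... apply the heap page change (insert/prune/freeze) and mark it dirty ... */

/* Set PD_ALL_VISIBLE and the VM byte; nothing is WAL-logged here. */
heap_page_set_vm(rel, blkno, heap_buf, vmbuf,
				 VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);

if (RelationNeedsWAL(rel))
{
	XLogRecPtr	recptr;

	XLogBeginInsert();
	/* register the operation's own record data (including the vmflags)
	 * here, then both buffers */
	XLogRegisterBuffer(0, heap_buf, REGBUF_STANDARD);
	XLogRegisterBuffer(1, vmbuf, 0);
	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_MULTI_INSERT);

	PageSetLSN(BufferGetPage(heap_buf), recptr);
	PageSetLSN(BufferGetPage(vmbuf), recptr);
}

END_CRIT_SECTION();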
v2-0001-Introduce-heap-specific-wrapper-for-visibilitymap.patch (text/x-patch)
From 44370f480a1da1c51640faa5098ef127be7f3092 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 26 Jun 2025 15:57:53 -0400
Subject: [PATCH v2 01/12] Introduce heap-specific wrapper for
visibilitymap_set()
visibilitymap_set(), which sets bits in the visibility map corresponding
to the heap block of the table passed in, arguably breaks a few of the
coding rules for modifying and WAL logging buffers set out in
access/transam/README.
In several of the places where visibilitymap_set() is called, setting
the heap page PD_ALL_VISIBLE and marking the buffer dirty are done
outside of a critical section.
In some places before visibilitymap_set() is called, MarkBufferDirty()
is used when MarkBufferDirtyHint() would be appropriate.
And in some places where PD_ALL_VISIBLE may already be set and we don't
mark the buffer dirty, when checksums/wal_log_hints are enabled
visibilitymap_set() will still set the heap page LSN -- even though it
was correct not to set the buffer dirty.
Besides all of these issues, having these operations open-coded all over
the place is error-prone. This commit introduces a wrapper that does the
correct operations to the heap page itself and invokes
visibilitymap_set() to make changes to the VM page.
---
src/backend/access/heap/heapam.c | 92 ++++++++++++++++++++-----
src/backend/access/heap/heapam_xlog.c | 2 +-
src/backend/access/heap/vacuumlazy.c | 66 +++++-------------
src/backend/access/heap/visibilitymap.c | 58 ++++++----------
src/include/access/heapam.h | 3 +
src/include/access/visibilitymap.h | 2 +-
6 files changed, 117 insertions(+), 106 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0dcd6ee817e..112f946dab0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2505,8 +2505,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
BufferGetBlockNumber(buffer),
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
- else if (all_frozen_set)
- PageSetAllVisible(page);
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2632,23 +2630,16 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/*
* If we've frozen everything on the page, update the visibilitymap.
- * We're already holding pin on the vmbuffer.
+ * We're already holding pin on the vmbuffer. It's fine to use
+ * InvalidTransactionId as the cutoff_xid here - this is only used
+ * when HEAP_INSERT_FROZEN is specified, which intentionally violates
+ * visibility rules.
*/
if (all_frozen_set)
- {
- Assert(PageIsAllVisible(page));
- Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
-
- /*
- * It's fine to use InvalidTransactionId here - this is only used
- * when HEAP_INSERT_FROZEN is specified, which intentionally
- * violates visibility rules.
- */
- visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
- InvalidXLogRecPtr, vmbuffer,
- InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
- }
+ heap_page_set_vm_and_log(relation, BufferGetBlockNumber(buffer), buffer,
+ vmbuffer,
+ InvalidTransactionId,
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
UnlockReleaseBuffer(buffer);
ndone += nthispage;
@@ -7840,6 +7831,73 @@ heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple)
return false;
}
+/*
+ * Make the heap and VM page changes needed to set a page all-visible.
+ * Do not call in recovery.
+ */
+uint8
+heap_page_set_vm_and_log(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
+ Buffer vmbuf, TransactionId cutoff_xid,
+ uint8 vmflags)
+{
+ Page heap_page = BufferGetPage(heap_buf);
+ bool set_heap_lsn = false;
+ XLogRecPtr recptr = InvalidXLogRecPtr;
+ uint8 old_vmbits = 0;
+
+ Assert(BufferIsValid(heap_buf));
+
+ START_CRIT_SECTION();
+
+ /* Check that we have the right heap page pinned, if present */
+ if (BufferGetBlockNumber(heap_buf) != heap_blk)
+ elog(ERROR, "wrong heap buffer passed to heap_page_set_vm_and_log");
+
+ /*
+ * We must never end up with the VM bit set and the page-level
+ * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+ * modification would fail to clear the VM bit. Though it is possible for
+ * the page-level bit to be set and the VM bit to be clear if checksums
+ * and wal_log_hints are not enabled.
+ */
+ if (!PageIsAllVisible(heap_page))
+ {
+ PageSetAllVisible(heap_page);
+
+ /*
+ * Buffer will usually be dirty from other changes, so it is worth the
+ * extra check
+ */
+ if (!BufferIsDirty(heap_buf))
+ {
+ if (XLogHintBitIsNeeded())
+ MarkBufferDirty(heap_buf);
+ else
+ MarkBufferDirtyHint(heap_buf, true);
+ }
+
+ set_heap_lsn = XLogHintBitIsNeeded();
+ }
+
+ old_vmbits = visibilitymap_set(rel, heap_blk, heap_buf,
+ &recptr, vmbuf, cutoff_xid, vmflags);
+
+ /*
+ * If we modified the heap page and data checksums are enabled (or
+ * wal_log_hints=on), we need to protect the heap page from being torn.
+ *
+ * If not, then we must *not* update the heap page's LSN. In this case,
+	 * the FPI for the heap page was omitted from the WAL record emitted for
+	 * the VM change, so it would be incorrect to update the heap page's LSN.
+ */
+ if (set_heap_lsn)
+ PageSetLSN(heap_page, recptr);
+
+ END_CRIT_SECTION();
+
+ return old_vmbits;
+}
+
/*
* heap_tuple_should_freeze
*
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index eb4bd3d6ae3..f2bc1bd06ee 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -297,7 +297,7 @@ heap_xlog_visible(XLogReaderState *record)
reln = CreateFakeRelcacheEntry(rlocator);
visibilitymap_pin(reln, blkno, &vmbuffer);
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
+ visibilitymap_set(reln, blkno, InvalidBuffer, &lsn, vmbuffer,
xlrec->snapshotConflictHorizon, vmbits);
ReleaseBuffer(vmbuffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8a42e17aec2..c0608af7d29 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1874,9 +1874,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
{
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
- MarkBufferDirty(buf);
-
/*
* It's possible that another backend has extended the heap,
* initialized the page, and then failed to WAL-log the page due
@@ -1888,14 +1885,15 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (RelationNeedsWAL(vacrel->rel) &&
PageGetLSN(page) == InvalidXLogRecPtr)
+ {
+ MarkBufferDirty(buf);
log_newpage_buffer(buf, true);
+ }
- PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
+ vmbuffer, InvalidTransactionId,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
END_CRIT_SECTION();
/* Count the newly all-frozen pages for logging */
@@ -2069,25 +2067,9 @@ lazy_scan_prune(LVRelState *vacrel,
flags |= VISIBILITYMAP_ALL_FROZEN;
}
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
+ old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
+ vmbuffer, presult.vm_conflict_horizon,
+ flags);
/*
* If the page wasn't already set all-visible and/or all-frozen in the
@@ -2159,17 +2141,6 @@ lazy_scan_prune(LVRelState *vacrel,
{
uint8 old_vmbits;
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
/*
* Set the page all-frozen (and all-visible) in the VM.
*
@@ -2178,11 +2149,10 @@ lazy_scan_prune(LVRelState *vacrel,
* was logged when the page's tuples were frozen.
*/
Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
+ vmbuffer, InvalidTransactionId,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
/*
* The page was likely already set all-visible in the VM. However,
@@ -2913,11 +2883,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
flags |= VISIBILITYMAP_ALL_FROZEN;
}
- PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buffer,
- InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
- flags);
+ heap_page_set_vm_and_log(vacrel->rel, blkno, buffer,
+ vmbuffer, visibility_cutoff_xid,
+ flags);
/* Count the newly set VM page for logging */
vacrel->vm_new_visible_pages++;
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 745a04ef26e..c57632168c7 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -222,29 +222,31 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
/*
* visibilitymap_set - set bit(s) on a previously pinned page
*
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
* Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
+ * this function. Except in recovery, caller should also pass the heap buffer.
+ * When checksums are enabled and we're not in recovery, we must add the heap
+ * buffer to the WAL chain to protect it from being torn.
*
* You must pass a buffer containing the correct map page to this function.
* Call visibilitymap_pin first to pin the right one. This function doesn't do
* any I/O.
*
- * Returns the state of the page's VM bits before setting flags.
+ * cutoff_xid is the largest xmin on the page being marked all-visible; it is
+ * needed for Hot Standby, and can be InvalidTransactionId if the page
+ * contains no tuples. It can also be set to InvalidTransactionId when a page
+ * that is already all-visible is being marked all-frozen.
+ *
+ * If we're in recovery, recptr points to the LSN of the XLOG record we're
+ * replaying and the VM page LSN is advanced to this LSN. During normal
+ * running, we'll generate a new XLOG record for the changes to the VM and set
+ * the VM page LSN. We will return this LSN in recptr, and the caller may use
+ * this to set the heap page LSN.
+ *
+ * Returns the state of the page's VM bits before setting flags.
*/
uint8
visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
+ XLogRecPtr *recptr, Buffer vmBuf, TransactionId cutoff_xid,
uint8 flags)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
@@ -258,17 +260,13 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
#endif
- Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
+ Assert(InRecovery || XLogRecPtrIsInvalid(*recptr));
Assert(InRecovery || PageIsAllVisible((Page) BufferGetPage(heapBuf)));
Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
/* Must never set all_frozen bit without also setting all_visible bit */
Assert(flags != VISIBILITYMAP_ALL_FROZEN);
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
/* Check that we have the right VM page pinned */
if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
@@ -287,28 +285,12 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
if (RelationNeedsWAL(rel))
{
- if (XLogRecPtrIsInvalid(recptr))
+ if (XLogRecPtrIsInvalid(*recptr))
{
Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
+ *recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
}
- PageSetLSN(page, recptr);
+ PageSetLSN(page, *recptr);
}
END_CRIT_SECTION();
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 3a9424c19c9..9375296062f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -360,6 +360,9 @@ extern bool heap_tuple_should_freeze(HeapTupleHeader tuple,
TransactionId *NoFreezePageRelfrozenXid,
MultiXactId *NoFreezePageRelminMxid);
extern bool heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple);
+extern uint8 heap_page_set_vm_and_log(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
+ Buffer vmbuf, TransactionId cutoff_xid,
+ uint8 vmflags);
extern void simple_heap_insert(Relation relation, HeapTuple tup);
extern void simple_heap_delete(Relation relation, ItemPointer tid);
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..4c7472e0b51 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -33,7 +33,7 @@ extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
extern uint8 visibilitymap_set(Relation rel,
BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
+ XLogRecPtr *recptr,
Buffer vmBuf,
TransactionId cutoff_xid,
uint8 flags);
--
2.34.1
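(Aside for reviewers: the effect at each call site is essentially a
one-call replacement; roughly, with variable names as in vacuumlazy.c:)

/* before: open-coded at each call site */
PageSetAllVisible(page);
MarkBufferDirty(buf);
old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
							   InvalidXLogRecPtr, vmbuffer,
							   presult.vm_conflict_horizon, flags);

/* after: the wrapper handles PD_ALL_VISIBLE, the choice between
 * MarkBufferDirty() and MarkBufferDirtyHint(), the critical section,
 * and whether to bump the heap page LSN */
old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf, vmbuffer,
									  presult.vm_conflict_horizon, flags);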
v2-0004-Make-heap_page_is_all_visible-independent-of-LVRe.patch (text/x-patch)
From b927fb837d0d0897620e2a805b0a8d517522a0bc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 15:48:51 -0400
Subject: [PATCH v2 04/12] Make heap_page_is_all_visible independent of
LVRelState
Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need two parameters from the
LVRelState, so just pass those in explicitly.
---
src/backend/access/heap/vacuumlazy.c | 45 ++++++++++++++++++----------
1 file changed, 29 insertions(+), 16 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c0608af7d29..e620f0a635b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,8 +464,11 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
- TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2005,8 +2008,9 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(presult.lpdead_items == 0);
- if (!heap_page_is_all_visible(vacrel, buf,
- &debug_cutoff, &debug_all_frozen))
+ if (!heap_page_is_all_visible(vacrel->rel, buf,
+ vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+ &debug_cutoff, &vacrel->offnum))
Assert(false);
Assert(presult.all_frozen == debug_all_frozen);
@@ -2872,8 +2876,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* emitted.
*/
Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
- &all_frozen))
+ if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
+ &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -3555,9 +3559,16 @@ dead_items_cleanup(LVRelState *vacrel)
/*
* Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples. Set *all_frozen to true if every tuple
- * on this page is frozen.
+ * transactions.
+ *
+ * OldestXmin is used to determine visibility.
+ *
+ * *logging_offnum is set to the OffsetNumber of the tuple currently being
+ * processed, for use by vacuum's error callback system.
+ *
+ * Return the visibility_cutoff_xid which is the highest xmin amongst the
+ * visible tuples. Sets *all_frozen to true if every tuple on this page is
+ * frozen.
*
* This is a stripped down version of lazy_scan_prune(). If you change
* anything here, make sure that everything stays in sync. Note that an
@@ -3565,9 +3576,11 @@ dead_items_cleanup(LVRelState *vacrel)
* introducing new side-effects here.
*/
static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
TransactionId *visibility_cutoff_xid,
- bool *all_frozen)
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3590,7 +3603,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* Set the offset number so that we can display it along with any
* error that occurred while processing this tuple.
*/
- vacrel->offnum = offnum;
+ *logging_offnum = offnum;
itemid = PageGetItemId(page, offnum);
/* Unused or redirect line pointers are of no interest */
@@ -3614,9 +3627,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
tuple.t_len = ItemIdGetLength(itemid);
- tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+ tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
+ switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
buf))
{
case HEAPTUPLE_LIVE:
@@ -3637,7 +3650,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
+ OldestXmin))
{
all_visible = false;
*all_frozen = false;
@@ -3672,7 +3685,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
} /* scan along page */
/* Clear the offset information once we have processed the given page. */
- vacrel->offnum = InvalidOffsetNumber;
+ *logging_offnum = InvalidOffsetNumber;
return all_visible;
}
--
2.34.1
v2-0005-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch (text/x-patch; charset=US-ASCII)
From c28b2edbd682e22546d4bce080728b2ef8a35601 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:42:13 -0400
Subject: [PATCH v2 05/12] Eliminate xl_heap_visible from vacuum phase III
Instead of emitting a separate xl_heap_visible record for each page that
is all-visible after vacuum's third phase, use the VM-related options
when emitting the xl_heap_prune record with the changes vacuum makes in
phase III.
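
To make the new record layout concrete, here is a minimal standalone sketch
(not PostgreSQL code -- the type names, flag values, and helpers are
stand-ins) of how a redo routine can walk main data shaped the way this patch
shapes xl_heap_prune: a fixed flags byte, then an optional conflict-horizon
xid, then an optional vmflags byte.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint32_t DemoTransactionId;

#define DEMO_HAS_CONFLICT_HORIZON  (1 << 0)
#define DEMO_HAS_VMFLAGS           (1 << 1)

/* Walk a variable-length main-data area: flags byte, then optional fields. */
static void
demo_parse_main_data(const char *p)
{
	uint8_t		flags;
	DemoTransactionId conflict_xid = 0;
	uint8_t		vmflags = 0;

	memcpy(&flags, p, sizeof(uint8_t));
	p += sizeof(uint8_t);

	if (flags & DEMO_HAS_CONFLICT_HORIZON)
	{
		memcpy(&conflict_xid, p, sizeof(DemoTransactionId));
		p += sizeof(DemoTransactionId);
	}

	if (flags & DEMO_HAS_VMFLAGS)
	{
		memcpy(&vmflags, p, sizeof(uint8_t));
		p += sizeof(uint8_t);
	}

	printf("flags=0x%02X conflict_xid=%u vmflags=0x%02X\n",
		   (unsigned) flags, (unsigned) conflict_xid, (unsigned) vmflags);
}

int
main(void)
{
	char		buf[16];
	char	   *p = buf;
	uint8_t		flags = DEMO_HAS_CONFLICT_HORIZON | DEMO_HAS_VMFLAGS;
	DemoTransactionId xid = 12345;
	uint8_t		vmflags = 0x01 | 0x02;	/* all-visible | all-frozen */

	memcpy(p, &flags, sizeof(flags));
	p += sizeof(flags);
	memcpy(p, &xid, sizeof(xid));
	p += sizeof(xid);
	memcpy(p, &vmflags, sizeof(vmflags));
	p += sizeof(vmflags);

	demo_parse_main_data(buf);
	return 0;
}

The real record additionally carries the prune/freeze arrays in the
registered block data; only the main-data walk is sketched above.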
---
src/backend/access/heap/heapam_xlog.c | 148 ++++++++++++++++++++++---
src/backend/access/heap/pruneheap.c | 48 +++++++-
src/backend/access/heap/vacuumlazy.c | 146 ++++++++++++++++--------
src/backend/access/rmgrdesc/heapdesc.c | 13 ++-
src/include/access/heapam.h | 9 ++
src/include/access/heapam_xlog.h | 3 +
6 files changed, 301 insertions(+), 66 deletions(-)
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 71754fd77c4..70a46a37357 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Buffer buffer;
RelFileLocator rlocator;
BlockNumber blkno;
- XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
+ uint8 vmflags = 0;
XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -51,10 +52,15 @@ heap_xlog_prune_freeze(XLogReaderState *record)
(xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
/*
- * We are about to remove and/or freeze tuples. In Hot Standby mode,
- * ensure that there are no queries running for which the removed tuples
- * are still visible or which still consider the frozen xids as running.
- * The conflict horizon XID comes after xl_heap_prune.
+ * After xl_heap_prune is the optional snapshot conflict horizon.
+ *
+ * In Hot Standby mode, we must ensure that there are no running queries
+ * which would conflict with the changes in this record. If pruning, that
+ * means we cannot remove tuples still visible to transactions on the
+ * standby. If freezing, that means we cannot freeze tuples with xids that
+ * are still considered running on the standby. And for setting the VM, we
+ * cannot do so if the page isn't all-visible to all transactions on the
+ * standby.
*/
if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
{
@@ -70,13 +76,28 @@ heap_xlog_prune_freeze(XLogReaderState *record)
rlocator);
}
+ /* Next are the optionally included vmflags. Copy them out for later use. */
+ if ((xlrec.flags & XLHP_HAS_VMFLAGS) != 0)
+ {
+ memcpy(&vmflags, maindataptr, sizeof(uint8));
+ maindataptr += sizeof(uint8);
+
+ /*
+ * We don't set VISIBILITYMAP_XLOG_CATALOG_REL in the combined record
+ * because we already have XLHP_IS_CATALOG_REL.
+ */
+ Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+ /* Must never set all_frozen bit without also setting all_visible bit */
+ Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
+ }
+
/*
- * If we have a full-page image, restore it and we're done.
+ * If we have a full-page image of the heap block, restore it and we're
+ * done with the heap block.
*/
- action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
- (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
- &buffer);
- if (action == BLK_NEEDS_REDO)
+ if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+ (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+ &buffer) == BLK_NEEDS_REDO)
{
Page page = (Page) BufferGetPage(buffer);
OffsetNumber *redirected;
@@ -89,6 +110,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Size datalen;
xlhp_freeze_plan *plans;
OffsetNumber *frz_offsets;
+ bool do_prune;
+ bool mark_buffer_dirty;
+ bool set_heap_lsn;
char *dataptr = XLogRecGetBlockData(record, 0, &datalen);
heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
@@ -97,11 +121,18 @@ heap_xlog_prune_freeze(XLogReaderState *record)
&ndead, &nowdead,
&nunused, &nowunused);
+ do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+ set_heap_lsn = mark_buffer_dirty = do_prune || nplans > 0;
+
+ /* Ensure the record does something */
+ Assert(do_prune || nplans > 0 ||
+ vmflags & VISIBILITYMAP_VALID_BITS);
+
/*
* Update all line pointers per the record, and repair fragmentation
* if needed.
*/
- if (nredirected > 0 || ndead > 0 || nunused > 0)
+ if (do_prune)
heap_page_prune_execute(buffer,
(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
redirected, nredirected,
@@ -138,26 +169,78 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* There should be no more data */
Assert((char *) frz_offsets == dataptr + datalen);
+ Assert(BufferIsValid(buffer) &&
+ BufferGetBlockNumber(buffer) == blkno);
+
+ /*
+ * Now set PD_ALL_VISIBLE, if required. We'll only do this if we are
+ * also going to set bits in the VM later.
+ *
+ * We must never end up with the VM bit set and the page-level
+ * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+ * modification would fail to clear the VM bit.
+ *
+ * Prior to Postgres 19, it was possible for the page-level bit to be
+ * set and the VM bit to be clear. This could happen if we crashed
+ * after setting PD_ALL_VISIBLE but before setting bits in the VM.
+ */
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+ {
+ PageSetAllVisible(page);
+
+ /*
+ * Setting PD_ALL_VISIBLE only forces us to update the heap page
+ * LSN if checksums or wal_log_hints are enabled (in which case we
+ * must). This exposes us to torn page hazards, but since we're
+ * not inspecting the existing page contents in any way, we don't
+ * care.
+ */
+ set_heap_lsn = XLogHintBitIsNeeded() ? true : set_heap_lsn;
+ mark_buffer_dirty = true;
+ }
+
/*
* Note: we don't worry about updating the page's prunability hints.
* At worst this will cause an extra prune cycle to occur soon.
*/
- PageSetLSN(page, lsn);
- MarkBufferDirty(buffer);
+ if (mark_buffer_dirty)
+ MarkBufferDirty(buffer);
+ if (set_heap_lsn)
+ PageSetLSN(page, lsn);
}
/*
- * If we released any space or line pointers, update the free space map.
+ * If we released any space or line pointers or will be setting a page in
+ * the visibility map, update the free space map.
+ *
+ * Even if we are just updating the VM (and thus not freeing up any
+ * space), we'll still update the FSM for this page. Since FSM is not
+ * WAL-logged and only updated heuristically, it easily becomes stale in
+ * standbys. If the standby is later promoted and runs VACUUM, it will
+ * skip updating individual free space figures for pages that became
+ * all-visible (or all-frozen, depending on the vacuum mode), which is
+ * troublesome when FreeSpaceMapVacuum propagates too optimistic free
+ * space values to upper FSM layers; later inserters try to use such pages
+ * only to find out that they are unusable. This can cause long stalls
+ * when there are many such pages.
+ *
+ * Forestall those problems by updating FSM's idea about a page that is
+ * becoming all-visible or all-frozen.
*
* Do this regardless of a full-page image being applied, since the FSM
* data is not in the page anyway.
+ *
+ * We want to avoid holding an exclusive lock on the heap buffer while
+ * doing IO (either of the FSM or the VM), so we'll release the lock on
+ * the heap buffer before doing either.
*/
if (BufferIsValid(buffer))
{
- if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
- XLHP_HAS_DEAD_ITEMS |
- XLHP_HAS_NOW_UNUSED_ITEMS))
+ if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
+ XLHP_HAS_DEAD_ITEMS |
+ XLHP_HAS_NOW_UNUSED_ITEMS)) ||
+ vmflags & VISIBILITYMAP_VALID_BITS)
{
Size freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
@@ -168,6 +251,37 @@ heap_xlog_prune_freeze(XLogReaderState *record)
else
UnlockReleaseBuffer(buffer);
}
+
+ /*
+ * Read and update the VM block. Even if we skipped updating the heap page
+ * due to the file being dropped or truncated later in recovery, it's
+ * still safe to update the visibility map. Any WAL record that clears
+ * the visibility map bit does so before checking the page LSN, so any
+ * bits that need to be cleared will still be cleared.
+ *
+ * Note that it is *only* okay that we do not hold a lock on the heap page
+ * because we are in recovery and can expect no other writers to clear
+ * PD_ALL_VISIBLE before we are able to update the VM.
+ */
+ if (vmflags & VISIBILITYMAP_VALID_BITS &&
+ XLogReadBufferForRedoExtended(record, 1,
+ RBM_ZERO_ON_ERROR,
+ false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ uint8 old_vmbits = 0;
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
+
+ visibilitymap_pin(reln, blkno, &vmbuffer);
+ old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+ /* Only set VM page LSN if we modified the page */
+ if (old_vmbits != vmflags)
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
+ FreeFakeRelcacheEntry(reln);
+ }
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a8025889be0..d9ba0f96e34 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,6 +21,7 @@
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
+#include "access/visibilitymapdefs.h"
#include "commands/vacuum.h"
#include "executor/instrument.h"
#include "miscadmin.h"
@@ -835,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
+ InvalidBuffer, 0, false,
conflict_xid,
true, reason,
prstate.frozen, prstate.nfrozen,
@@ -2045,12 +2047,23 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* replaying 'unused' items depends on whether they were all previously marked
* as dead.
*
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
* Note: This function scribbles on the 'frozen' array.
*
* Note: This is called in a critical section, so careful what you do here.
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer,
+ uint8 vmflags,
+ bool set_pd_all_vis,
TransactionId conflict_xid,
bool cleanup_lock,
PruneReason reason,
@@ -2062,6 +2075,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xl_heap_prune xlrec;
XLogRecPtr recptr;
uint8 info;
+ uint8 regbuf_flags;
/* The following local variables hold data registered in the WAL record: */
xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2070,8 +2084,19 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xlhp_prune_items dead_items;
xlhp_prune_items unused_items;
OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+ bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
xlrec.flags = 0;
+ regbuf_flags = REGBUF_STANDARD;
+
+ /*
+ * We can avoid an FPI if the only modification we are making to the heap
+ * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+ */
+ if (!do_prune &&
+ nfrozen == 0 &&
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
+ regbuf_flags |= REGBUF_NO_IMAGE;
/*
* Prepare data for the buffer. The arrays are not actually in the
@@ -2079,7 +2104,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* page image, the arrays can be omitted.
*/
XLogBeginInsert();
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+ XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ XLogRegisterBuffer(1, vmbuffer, 0);
+
if (nfrozen > 0)
{
int nplans;
@@ -2136,6 +2165,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* Prepare the main xl_heap_prune record. We already set the XLHP_HAS_*
* flag above.
*/
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ xlrec.flags |= XLHP_HAS_VMFLAGS;
if (RelationIsAccessibleInLogicalDecoding(relation))
xlrec.flags |= XLHP_IS_CATALOG_REL;
if (TransactionIdIsValid(conflict_xid))
@@ -2150,6 +2181,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
XLogRegisterData(&xlrec, SizeOfHeapPrune);
if (TransactionIdIsValid(conflict_xid))
XLogRegisterData(&conflict_xid, sizeof(TransactionId));
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ XLogRegisterData(&vmflags, sizeof(uint8));
switch (reason)
{
@@ -2168,5 +2201,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
}
recptr = XLogInsert(RM_HEAP2_ID, info);
- PageSetLSN(BufferGetPage(buffer), recptr);
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
+
+ /*
+ * If pruning or freezing tuples, or setting the page all-visible when
+ * checksums or wal_log_hints are enabled, we must bump the LSN. Torn
+ * pages are possible if we update PD_ALL_VISIBLE without bumping the LSN,
+ * but this is deemed okay for page hint updates.
+ */
+ if (do_prune || nfrozen > 0 ||
+ (set_pd_all_vis && XLogHintBitIsNeeded()))
+ PageSetLSN(BufferGetPage(buffer), recptr);
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e620f0a635b..56acb224d71 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,11 +464,13 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
+static bool heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int allowed_num_offsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2812,8 +2814,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
OffsetNumber unused[MaxHeapTuplesPerPage];
int nunused = 0;
TransactionId visibility_cutoff_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
bool all_frozen;
LVSavedErrInfo saved_err_info;
+ uint8 vmflags = 0;
+ bool set_pd_all_vis = false;
Assert(vacrel->do_index_vacuuming);
@@ -2824,6 +2829,20 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
InvalidOffsetNumber);
+ if (heap_page_is_all_visible_except_lpdead(vacrel->rel, buffer,
+ vacrel->cutoffs.OldestXmin,
+ deadoffsets, num_offsets,
+ &all_frozen, &visibility_cutoff_xid,
+ &vacrel->offnum))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (all_frozen)
+ {
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ }
+ }
+
START_CRIT_SECTION();
for (int i = 0; i < num_offsets; i++)
@@ -2843,6 +2862,17 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
/* Attempt to truncate line pointer array now */
PageTruncateLinePointerArray(page);
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+ {
+ Assert(!PageIsAllVisible(page));
+ set_pd_all_vis = true;
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ heap_page_set_vm(vacrel->rel,
+ blkno, buffer,
+ vmbuffer, vmflags);
+ conflict_xid = visibility_cutoff_xid;
+ }
+
/*
* Mark buffer dirty before we write WAL.
*/
@@ -2852,7 +2882,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
- InvalidTransactionId,
+ vmbuffer,
+ vmflags,
+ set_pd_all_vis,
+ conflict_xid,
false, /* no cleanup lock required */
PRUNE_VACUUM_CLEANUP,
NULL, 0, /* frozen */
@@ -2861,37 +2894,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
unused, nunused);
}
- /*
- * End critical section, so we safely can do visibility tests (which
- * possibly need to perform IO and allocate memory!). If we crash now the
- * page (including the corresponding vm bit) might not be marked all
- * visible, but that's fine. A later vacuum will fix that.
- */
END_CRIT_SECTION();
- /*
- * Now that we have removed the LP_DEAD items from the page, once again
- * check if the page has become all-visible. The page is already marked
- * dirty, exclusively locked, and, if needed, a full page image has been
- * emitted.
- */
- Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
- &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
+ if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (all_frozen)
- {
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- heap_page_set_vm_and_log(vacrel->rel, blkno, buffer,
- vmbuffer, visibility_cutoff_xid,
- flags);
-
/* Count the newly set VM page for logging */
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
vacrel->vm_new_visible_pages++;
if (all_frozen)
vacrel->vm_new_visible_frozen_pages++;
@@ -3557,6 +3565,25 @@ dead_items_cleanup(LVRelState *vacrel)
vacrel->pvs = NULL;
}
+/*
+ * Wrapper for heap_page_is_all_visible_except_lpdead() which can be used for
+ * callers that expect no LP_DEAD items on the page.
+ */
+bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
+{
+
+ return heap_page_is_all_visible_except_lpdead(rel, buf, OldestXmin,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid,
+ logging_offnum);
+}
+
/*
* Check if every tuple in the given page is visible to all current and future
* transactions.
@@ -3570,23 +3597,35 @@ dead_items_cleanup(LVRelState *vacrel)
* visible tuples. Sets *all_frozen to true if every tuple on this page is
* frozen.
*
- * This is a stripped down version of lazy_scan_prune(). If you change
- * anything here, make sure that everything stays in sync. Note that an
- * assertion calls us to verify that everybody still agrees. Be sure to avoid
- * introducing new side-effects here.
+ * deadoffsets are the LP_DEAD offsets we already know about and are about to
+ * mark LP_UNUSED; allowed_num_offsets is how many there are. As long as the
+ * LP_DEAD items we encounter on the page match those exactly, we can set the
+ * page all-visible in the VM.
+ *
+ * Callers looking to verify that the page is all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This logic is similar to that in heap_prune_record_unchanged_lp_normal(). If
+ * you change anything here, make sure that everything stays in sync. Note
+ * that an assertion calls us to verify that everybody still agrees. Be sure
+ * to avoid introducing new side-effects here.
*/
static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum)
+heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int allowed_num_offsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
OffsetNumber offnum,
maxoff;
bool all_visible = true;
+ OffsetNumber current_dead_offsets[MaxHeapTuplesPerPage];
+ size_t current_num_offsets = 0;
*visibility_cutoff_xid = InvalidTransactionId;
*all_frozen = true;
@@ -3618,9 +3657,8 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
if (ItemIdIsDead(itemid))
{
- all_visible = false;
- *all_frozen = false;
- break;
+ current_dead_offsets[current_num_offsets++] = offnum;
+ continue;
}
Assert(ItemIdIsNormal(itemid));
@@ -3687,7 +3725,23 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
/* Clear the offset information once we have processed the given page. */
*logging_offnum = InvalidOffsetNumber;
- return all_visible;
+ /* If we already know it's not all-visible, return false */
+ if (!all_visible)
+ return false;
+
+ /* If we weren't allowed any dead offsets, we're done */
+ if (allowed_num_offsets == 0)
+ return current_num_offsets == 0;
+
+ /* The number of dead offsets must match what the caller expected */
+ if (current_num_offsets != allowed_num_offsets)
+ return false;
+
+ Assert(deadoffsets);
+
+ /* The dead offsets must be the same dead offsets */
+ return memcmp(current_dead_offsets, deadoffsets,
+ allowed_num_offsets * sizeof(OffsetNumber)) == 0;
}
/*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..d6c86ccac20 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -266,6 +266,7 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
{
char *rec = XLogRecGetData(record);
uint8 info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+ char *maindataptr = rec + SizeOfHeapPrune;
info &= XLOG_HEAP_OPMASK;
if (info == XLOG_HEAP2_PRUNE_ON_ACCESS ||
@@ -278,7 +279,8 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
{
TransactionId conflict_xid;
- memcpy(&conflict_xid, rec + SizeOfHeapPrune, sizeof(TransactionId));
+ memcpy(&conflict_xid, maindataptr, sizeof(TransactionId));
+ maindataptr += sizeof(TransactionId);
appendStringInfo(buf, "snapshotConflictHorizon: %u",
conflict_xid);
@@ -287,6 +289,15 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, ", isCatalogRel: %c",
xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
+ if (xlrec->flags & XLHP_HAS_VMFLAGS)
+ {
+ uint8 vmflags;
+
+ memcpy(&vmflags, maindataptr, sizeof(uint8));
+ maindataptr += sizeof(uint8);
+ appendStringInfo(buf, ", vm_flags: 0x%02X", vmflags);
+ }
+
if (XLogRecHasBlockData(record, 0))
{
Size datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 5127fdb9c77..d2ac380bb64 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -343,6 +343,12 @@ extern void heap_inplace_update_and_unlock(Relation relation,
Buffer buffer);
extern void heap_inplace_unlock(Relation relation,
HeapTuple oldtup, Buffer buffer);
+
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
@@ -393,6 +399,9 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer,
+ uint8 vmflags,
+ bool set_pd_all_vis,
TransactionId conflict_xid,
bool cleanup_lock,
PruneReason reason,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 277df6b3cf0..ceae9c083ff 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -295,6 +295,9 @@ typedef struct xl_heap_prune
#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+/* the record includes vmflags to set in the visibility map */
+#define XLHP_HAS_VMFLAGS (1 << 0)
+
/* to handle recovery conflict during logical decoding on standby */
#define XLHP_IS_CATALOG_REL (1 << 1)
--
2.34.1
v2-0003-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch (text/x-patch; charset=US-ASCII)
From b0e549ab6d941e04dd8a1380523aad249e7fdde9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 17 Jun 2025 17:40:28 -0400
Subject: [PATCH v2 03/12] Eliminate xl_heap_visible in COPY FREEZE
Instead of emitting a separate WAL record for setting the VM bits in
xl_heap_visible, include the required update in the xl_heap_multi_insert
record.
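
As a rough illustration of the ordering this implies, here is a minimal
standalone sketch (not PostgreSQL code -- every function, type, and constant
below is a stand-in): the VM bits are set inside the same critical section as
the heap-page change, both buffers are covered by the single multi-insert
record, and both pages take their LSN from that one record.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t DemoLsn;

static DemoLsn demo_heap_lsn;
static DemoLsn demo_vm_lsn;

static void
demo_set_pd_all_visible(void)
{
	puts("set PD_ALL_VISIBLE on the heap page");
}

static void
demo_set_vm_bits(unsigned flags)
{
	printf("set VM bits 0x%02X\n", flags);
}

static DemoLsn
demo_insert_wal_record(void)
{
	puts("emit one xl_heap_multi_insert-style record");
	return 42;					/* pretend LSN */
}

static void
demo_copy_freeze_page(bool all_frozen_set)
{
	DemoLsn		recptr;

	/* --- begin critical section --- */
	if (all_frozen_set)
	{
		demo_set_pd_all_visible();
		demo_set_vm_bits(0x01 | 0x02);	/* all-visible | all-frozen */
	}

	recptr = demo_insert_wal_record();
	demo_heap_lsn = recptr;		/* heap page LSN */
	if (all_frozen_set)
		demo_vm_lsn = recptr;	/* VM page LSN, from the same record */
	/* --- end critical section --- */
}

int
main(void)
{
	demo_copy_freeze_page(true);
	printf("heap lsn=%llu, vm lsn=%llu\n",
		   (unsigned long long) demo_heap_lsn,
		   (unsigned long long) demo_vm_lsn);
	return 0;
}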
---
src/backend/access/heap/heapam.c | 42 +++++++++++++++++---------
src/backend/access/heap/heapam_xlog.c | 33 +++++++++++++++++++-
src/backend/access/rmgrdesc/heapdesc.c | 5 +++
3 files changed, 65 insertions(+), 15 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d125787fcb6..d2cf8aa9fb8 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2493,9 +2493,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/*
* If the page is all visible, need to clear that, unless we're only
* going to add further frozen rows to it.
- *
- * If we're only adding already frozen rows to a previously empty
- * page, mark it as all-visible.
*/
if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
@@ -2506,6 +2503,22 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
+ /*
+ * If we're only adding already frozen rows to a previously empty
+ * page, mark it as all-visible. And if we've frozen everything on the
+ * page, update the visibility map. We're already holding a pin on the
+ * vmbuffer.
+ */
+ else if (all_frozen_set)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ heap_page_set_vm(relation,
+ BufferGetBlockNumber(buffer), buffer,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+ }
+
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
*/
@@ -2552,6 +2565,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
xlrec->flags = 0;
if (all_visible_cleared)
xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+ /*
+ * We don't have to worry about including a conflict xid in the
+ * WAL record as HEAP_INSERT_FROZEN intentionally violates
+ * visibility rules.
+ */
if (all_frozen_set)
xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
@@ -2614,7 +2633,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
XLogBeginInsert();
XLogRegisterData(xlrec, tupledata - scratch.data);
+
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+ if (all_frozen_set)
+ XLogRegisterBuffer(1, vmbuffer, 0);
XLogRegisterBufData(0, tupledata, totaldatalen);
@@ -2624,22 +2646,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
recptr = XLogInsert(RM_HEAP2_ID, info);
PageSetLSN(page, recptr);
+ if (all_frozen_set)
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
}
END_CRIT_SECTION();
- /*
- * If we've frozen everything on the page, update the visibilitymap.
- * We're already holding pin on the vmbuffer. It's fine to use
- * InvalidTransactionId as the cutoff_xid here - this is only used
- * when HEAP_INSERT_FROZEN is specified, which intentionally violates
- * visibility rules.
- */
if (all_frozen_set)
- heap_page_set_vm_and_log(relation, BufferGetBlockNumber(buffer), buffer,
- vmbuffer,
- InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
UnlockReleaseBuffer(buffer);
ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index f2bc1bd06ee..71754fd77c4 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -552,6 +552,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
int i;
bool isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
/*
* Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -572,11 +573,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(rlocator);
- Buffer vmbuffer = InvalidBuffer;
visibilitymap_pin(reln, blkno, &vmbuffer);
visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
ReleaseBuffer(vmbuffer);
+ vmbuffer = InvalidBuffer;
FreeFakeRelcacheEntry(reln);
}
@@ -663,6 +664,36 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (BufferIsValid(buffer))
UnlockReleaseBuffer(buffer);
+ buffer = InvalidBuffer;
+
+ /*
+ * Now read and update the VM block. Even if we skipped updating the heap
+ * page due to the file being dropped or truncated later in recovery, it's
+ * still safe to update the visibility map. Any WAL record that clears
+ * the visibility map bit does so before checking the page LSN, so any
+ * bits that need to be cleared will still be cleared.
+ *
+ * It is only okay to set the VM bits without holding the heap page lock
+ * because we can expect no other writers of this page.
+ */
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+ XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
+
+ visibilitymap_pin(reln, blkno, &vmbuffer);
+ visibilitymap_set_vmbyte(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
+ FreeFakeRelcacheEntry(reln);
+ }
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
+
/*
* If the page is running low on free space, update the FSM as well.
* Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
#include "access/heapam_xlog.h"
#include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
#include "storage/standbydefs.h"
/*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
xlrec->flags);
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+ appendStringInfo(buf, ", vm_flags: 0x%02X",
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
if (XLogRecHasBlockData(record, 0) && !isinit)
{
appendStringInfoString(buf, ", offsets:");
--
2.34.1
v2-0008-Combine-vacuum-phase-I-VM-update-cases.patch (text/x-patch; charset=US-ASCII)
From 8716ac80b1b9b840a34ebcc1012565ca0375e045 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:35:36 -0400
Subject: [PATCH v2 08/12] Combine vacuum phase I VM update cases
After phase I of vacuum we update the VM, either setting the VM bits
when all bits are currently unset or setting just the frozen bit when
the all-visible bit is already set. Those cases had a lot of duplicated
code. Combine them. This is simpler to understand and also makes the
code compact enough to reuse for updating the VM while pruning and
freezing.
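
For clarity, a minimal standalone sketch (not PostgreSQL code; the names are
stand-ins) of the combined condition:

#include <stdbool.h>
#include <stdio.h>

/*
 * Update the VM either when the page is not yet marked all-visible there,
 * or when it is but can additionally be marked all-frozen.  all_frozen is
 * only meaningful when all_visible is true.
 */
static bool
demo_should_update_vm(bool page_all_visible, bool page_all_frozen,
					  bool vm_all_visible, bool vm_all_frozen)
{
	return page_all_visible &&
		(!vm_all_visible ||
		 (page_all_frozen && !vm_all_frozen));
}

int
main(void)
{
	/* 1: page can newly be marked all-frozen */
	printf("%d\n", demo_should_update_vm(true, true, true, false));
	/* 0: already marked all-visible, page is not all-frozen */
	printf("%d\n", demo_should_update_vm(true, false, true, false));
	/* 0: page not all-visible at all */
	printf("%d\n", demo_should_update_vm(false, false, false, false));
	return 0;
}

The short-circuit order matters: all_frozen is consulted only after
all_visible, matching the comment added in the patch.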
---
src/backend/access/heap/vacuumlazy.c | 71 +++++++++-------------------
1 file changed, 22 insertions(+), 49 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 0c5f8484866..402b2bd65ca 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2151,11 +2151,26 @@ lazy_scan_prune(LVRelState *vacrel,
{
/* Don't update the VM if we just cleared corruption in it */
}
- else if (!all_visible_according_to_vm && presult.all_visible)
+
+ /*
+ * If the page isn't yet marked all-visible in the VM, or it is and needs
+ * to be marked all-frozen, update the VM. Note that all_frozen is only
+ * valid if all_visible is true, so we must check both all_visible and
+ * all_frozen.
+ */
+ else if (presult.all_visible &&
+ (!all_visible_according_to_vm ||
+ (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
+ /*
+ * If the page is all-frozen, we can pass InvalidTransactionId as our
+ * cutoff_xid, since a snapshotConflictHorizon sufficient to make
+ * everything safe for REDO was logged when the page's tuples were
+ * frozen.
+ */
if (presult.all_frozen)
{
Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
@@ -2167,6 +2182,12 @@ lazy_scan_prune(LVRelState *vacrel,
flags);
/*
+ * Even if we are only setting the all-frozen bit, there is a small
+ * chance that the VM was modified sometime between setting
+ * all_visible_according_to_vm and checking the visibility during
+ * pruning. Check the return value of old_vmbits to ensure the
+ * visibility map counters used for logging are accurate.
+ *
* If the page wasn't already set all-visible and/or all-frozen in the
* VM, count it as newly set for logging.
*/
@@ -2186,54 +2207,6 @@ lazy_scan_prune(LVRelState *vacrel,
*vm_page_frozen = true;
}
}
-
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen. Note that all_frozen is only valid if all_visible is
- * true, so we must check both all_visible and all_frozen.
- */
- else if (all_visible_according_to_vm && presult.all_visible &&
- presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
}
/*
--
2.34.1
v2-0007-Combine-lazy_scan_prune-VM-corruption-cases.patch (text/x-patch; charset=US-ASCII)
From f1a47d3e3ef4822689acedf1eea5557aa8fdd850 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:04:03 -0400
Subject: [PATCH v2 07/12] Combine lazy_scan_prune VM corruption cases
lazy_scan_prune() updates the visibility map after phase I of heap
vacuuming. It also checks and fixes corruption in the VM. The corruption
cases were mixed in with the normal visibility map update cases.
Careful study of the ordering of the current logic reveals that the
corruption cases can be reordered and extracted into a separate
function. This should add no overhead compared to the previous code.
This reordering makes it clear which cases are about corruption and
which cases are normal VM updates. Separating them also makes it
possible to combine the normal cases in a future commit. This will make
the logic easier to understand and allow for further separation of the
logic to allow updating the VM in the same record as pruning and
freezing in phase I.
---
src/backend/access/heap/vacuumlazy.c | 115 +++++++++++++++++----------
1 file changed, 74 insertions(+), 41 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 68c8b0f4475..0c5f8484866 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -431,6 +431,13 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
+
+static bool identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer);
static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1933,6 +1940,66 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer)
+{
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
+
+
/* qsort comparator for sorting OffsetNumbers */
static int
cmpOffsetNumbers(const void *a, const void *b)
@@ -2077,9 +2144,14 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
+ * all_frozen variables. Start by looking for any VM corruption.
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
+ all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+ {
+ /* Don't update the VM if we just cleared corruption in it */
+ }
+ else if (!all_visible_according_to_vm && presult.all_visible)
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -2115,45 +2187,6 @@ lazy_scan_prune(LVRelState *vacrel,
}
}
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
- {
- elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
- {
- elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno);
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
/*
* If the all-visible page is all-frozen but not marked as such yet, mark
* it as all-frozen. Note that all_frozen is only valid if all_visible is
--
2.34.1
v2-0009-Find-and-fix-VM-corruption-in-heap_page_prune_and.patch (text/x-patch; charset=US-ASCII)
From 6e021c54db3f723c814a73d431a30995d9256655 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:45:59 -0400
Subject: [PATCH v2 09/12] Find and fix VM corruption in
heap_page_prune_and_freeze
Future commits will update the VM in the same critical section and WAL
record as pruning and freezing. For ease of review, this commit takes
one step in that direction: it moves the VM corruption handling case to
heap_page_prune_and_freeze().
---
src/backend/access/heap/pruneheap.c | 87 +++++++++++++++++++++++++++-
src/backend/access/heap/vacuumlazy.c | 78 +++----------------------
src/include/access/heapam.h | 4 ++
3 files changed, 96 insertions(+), 73 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 97e51f78854..496b70e318f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,7 +21,7 @@
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "commands/vacuum.h"
#include "executor/instrument.h"
#include "miscadmin.h"
@@ -177,6 +177,13 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
static void page_verify_redirects(Page page);
+static bool identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer);
+
/*
* Optionally prune and repair fragmentation in the specified page.
@@ -261,7 +268,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, vistest, 0,
+ heap_page_prune_and_freeze(relation, buffer, false,
+ InvalidBuffer,
+ vistest, 0,
NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
/*
@@ -294,6 +303,64 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
}
}
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer)
+{
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
/*
* Prune and repair fragmentation and potentially freeze tuples on the
@@ -314,6 +381,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
* that also freeze need that information.
*
+ * blk_known_av is the visibility status of the heap block as of the last call
+ * to find_next_unskippable_block(). vmbuffer is the buffer that may already
+ * contain the required block of the visibility map.
+ *
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
* (see heap_prune_satisfies_vacuum).
*
@@ -349,6 +420,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ bool blk_known_av,
+ Buffer vmbuffer,
GlobalVisState *vistest,
int options,
struct VacuumCutoffs *cutoffs,
@@ -897,6 +970,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
+ /*
+ * Clear any VM corruption. This does not need to be done in a critical
+ * section.
+ */
+ presult->vm_corruption = false;
+ if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+ presult->vm_corruption = identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av,
+ prstate.lpdead_items, vmbuffer);
if (prstate.freeze)
{
if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 402b2bd65ca..9e0b0a31013 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -431,13 +431,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
-
-static bool identify_and_fix_vm_corruption(Relation relation,
- BlockNumber heap_blk,
- Buffer heap_buffer, Page heap_page,
- bool heap_blk_known_av,
- int64 nlpdead_items,
- Buffer vmbuffer);
static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1940,65 +1933,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
-/*
- * When updating the visibility map after phase I heap vacuuming, we take the
- * opportunity to identify and fix any VM corruption.
- *
- * heap_blk_known_av is the visibility status of the heap page collected
- * while finding the next unskippable block in heap_vac_scan_next_block().
- */
-static bool
-identify_and_fix_vm_corruption(Relation relation,
- BlockNumber heap_blk,
- Buffer heap_buffer, Page heap_page,
- bool heap_blk_known_av,
- int64 nlpdead_items,
- Buffer vmbuffer)
-{
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
- visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
- {
- elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk);
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
- {
- elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk);
- PageClearAllVisible(heap_page);
- MarkBufferDirty(heap_buffer);
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- return false;
-}
-
/* qsort comparator for sorting OffsetNumbers */
static int
@@ -2055,11 +1989,14 @@ lazy_scan_prune(LVRelState *vacrel,
* tuples. Pruning will have determined whether or not the page is
* all-visible.
*/
- prune_options = HEAP_PAGE_PRUNE_FREEZE;
+ prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
+ heap_page_prune_and_freeze(rel, buf,
+ all_visible_according_to_vm,
+ vmbuffer,
+ vacrel->vistest, prune_options,
&vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
@@ -2144,10 +2081,9 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables. Start by looking for any VM corruption.
+ * all_frozen variables.
*/
- if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
- all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+ if (presult.vm_corruption)
{
/* Don't update the VM if we just cleared corruption in it */
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 1fa6eb047fd..0886867a161 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -41,6 +41,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
struct TupleTableSlot;
@@ -246,6 +247,7 @@ typedef struct PruneFreezeResult
bool all_visible;
bool all_frozen;
TransactionId vm_conflict_horizon;
+ bool vm_corruption;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
@@ -385,6 +387,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ bool blk_known_av,
+ Buffer vmbuffer,
struct GlobalVisState *vistest,
int options,
struct VacuumCutoffs *cutoffs,
--
2.34.1
v2-0010-Update-VM-in-pruneheap.c.patch (text/x-patch; charset=US-ASCII)
From cdf83732bb633199eab6016e08e7cc1c2185c144 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 2 Jun 2025 11:04:14 -0400
Subject: [PATCH v2 10/12] Update VM in pruneheap.c
As a step toward updating the VM in the same critical section and WAL
record as pruning and freezing (during phase I of vacuuming), first move
the VM update (still in its own critical section and WAL record) into
heap_page_prune_and_freeze(). This makes review easier.
---
src/backend/access/heap/pruneheap.c | 99 +++++++++++++++++++++++-----
src/backend/access/heap/vacuumlazy.c | 81 +++++------------------
src/include/access/heapam.h | 15 +++--
3 files changed, 106 insertions(+), 89 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 496b70e318f..425dcc77534 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -364,7 +364,8 @@ identify_and_fix_vm_corruption(Relation relation,
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -440,6 +441,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_freeze;
bool do_prune;
bool do_hint;
+ uint8 vmflags = 0;
+ uint8 old_vmbits = 0;
bool hint_bit_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
@@ -939,7 +942,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*
* Now that freezing has been finalized, unset all_visible if there are
* any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected by our caller.
+ * of the page, as expected for updating the visibility map.
*/
if (prstate.all_visible && prstate.lpdead_items == 0)
{
@@ -955,31 +958,91 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->hastup = prstate.hastup;
/*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
+ * If updating the visibility map, the conflict horizon for that record
+ * must be the newest xmin on the page. However, if the page is
+ * completely frozen, there can be no conflict and the vm_conflict_horizon
+ * should remain InvalidTransactionId. This includes the case that we
+ * just froze all the tuples; the prune-freeze record included the
+ * conflict XID already so the VM update record doesn't need it.
*/
if (presult->all_frozen)
presult->vm_conflict_horizon = InvalidTransactionId;
else
presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
- presult->lpdead_items = prstate.lpdead_items;
- /* the presult->deadoffsets array was already filled in */
-
/*
- * Clear any VM corruption. This does not need to be done in a critical
- * section.
+ * Handle setting visibility map bit based on information from the VM (as
+ * of last heap_vac_scan_next_block() call), and from all_visible and
+ * all_frozen variables.
*/
- presult->vm_corruption = false;
if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
- presult->vm_corruption = identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av,
- prstate.lpdead_items, vmbuffer);
+ {
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av,
+ prstate.lpdead_items, vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /*
+ * If the page isn't yet marked all-visible in the VM or it is and
+ * needs to be marked all-frozen, update the VM. Note that all_frozen
+ * is only valid if all_visible is true, so we must check both
+ * all_visible and all_frozen.
+ */
+ else if (presult->all_visible &&
+ (!blk_known_av ||
+ (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+ {
+ Assert(prstate.lpdead_items == 0);
+ vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+ /*
+ * If the page is all-frozen, we can pass InvalidTransactionId as
+ * our cutoff_xid, since a snapshotConflictHorizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ if (presult->all_frozen)
+ {
+ Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
+ /*
+ * It's possible for the VM bit to be clear and the page-level bit
+ * to be set if checksums are not enabled.
+ *
+ * And even if we are just planning to update the frozen bit in
+ * the VM, we shouldn't rely on all_visible_according_to_vm as a
+ * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
+ * might have become stale.
+ *
+ * If the heap page is all-visible but the VM bit is not set, we
+ * don't need to dirty the heap page. However, if checksums are
+ * enabled, we do need to make sure that the heap page is dirtied
+ * before passing it to visibilitymap_set(), because it may be
+ * logged.
+ */
+ if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+ {
+ PageSetAllVisible(page);
+ MarkBufferDirty(buffer);
+ }
+
+ old_vmbits = heap_page_set_vm_and_log(relation, blockno, buffer,
+ vmbuffer, presult->vm_conflict_horizon,
+ vmflags);
+ }
+ }
+
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
+
+ presult->old_vmbits = old_vmbits;
+ presult->new_vmbits = vmflags;
+
if (prstate.freeze)
{
if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9e0b0a31013..8daad54a0fe 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1933,7 +1933,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
-
/* qsort comparator for sorting OffsetNumbers */
static int
cmpOffsetNumbers(const void *a, const void *b)
@@ -1949,7 +1948,8 @@ cmpOffsetNumbers(const void *a, const void *b)
* vmbuffer is the buffer containing the VM block with visibility information
* for the heap block, blkno. all_visible_according_to_vm is the saved
* visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * won't rely entirely on this status, as it may be out of date. These will be
+ * passed on to heap_page_prune_and_freeze() to use while setting the VM.
*
* *has_lpdead_items is set to true or false depending on whether, upon return
* from this function, any LP_DEAD items are still present on the page.
@@ -1976,6 +1976,7 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
+ * Then, if the page's visibility status has changed, update the VM.
*
* If the relation has no indexes, we can immediately mark would-be dead
* items LP_UNUSED.
@@ -1984,10 +1985,6 @@ lazy_scan_prune(LVRelState *vacrel,
* presult.ndeleted. It should not be confused with presult.lpdead_items;
* presult.lpdead_items's final value can be thought of as the number of
* tuples that were deleted from indexes.
- *
- * We will update the VM after collecting LP_DEAD items and freezing
- * tuples. Pruning will have determined whether or not the page is
- * all-visible.
*/
prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
if (vacrel->nindexes == 0)
@@ -2079,70 +2076,26 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(!presult.all_visible || !(*has_lpdead_items));
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
- */
- if (presult.vm_corruption)
- {
- /* Don't update the VM if we just cleared corruption in it */
- }
-
- /*
- * If the page isn't yet marked all-visible in the VM or it is and needs
- * to me marked all-frozen, update the VM Note that all_frozen is only
- * valid if all_visible is true, so we must check both all_visible and
- * all_frozen.
+ * For the purposes of logging, count whether or not the page was newly
+ * set all-visible and, potentially, all-frozen.
*/
- else if (presult.all_visible &&
- (!all_visible_according_to_vm ||
- (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- /*
- * If the page is all-frozen, we can pass InvalidTransactionId as our
- * cutoff_xid, since a snapshotConflictHorizon sufficient to make
- * everything safe for REDO was logged when the page's tuples were
- * frozen.
- */
- if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * Even if we are only setting the all-frozen bit, there is a small
- * chance that the VM was modified sometime between setting
- * all_visible_according_to_vm and checking the visibility during
- * pruning. Check the return value of old_vmbits to ensure the
- * visibility map counters used for logging are accurate.
- *
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ vacrel->vm_new_visible_pages++;
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- vacrel->vm_new_frozen_pages++;
+ vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ {
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
+ }
}
/*
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0886867a161..534a63aab31 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -234,20 +234,21 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
+ * all_visible and all_frozen indicate the status of the page as reflected
+ * in the visibility map after pruning, freezing, and setting any pages
+ * all-visible in the visibility map.
*
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
+ * vm_conflict_horizon is the newest xmin of live tuples on the page
+ * (older than OldestXmin). It will only be valid if we did not set the
+ * page all-frozen in the VM.
*
* These are only set if the HEAP_PRUNE_FREEZE option is set.
*/
bool all_visible;
bool all_frozen;
TransactionId vm_conflict_horizon;
- bool vm_corruption;
+ uint8 old_vmbits;
+ uint8 new_vmbits;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
--
2.34.1
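To see the counters that 0010 now derives from old_vmbits/new_vmbits reflected
at the SQL level, the VM can be summarized before and after a vacuum. Again
just an illustrative sketch using pg_visibility and the foo table from the
example; the patches don't depend on it:
--
select * from pg_visibility_map_summary('foo'); -- all-visible/all-frozen page counts
vacuum (verbose, process_toast false) foo;
select * from pg_visibility_map_summary('foo');
-- the delta should correspond to the newly set all-visible/all-frozen pages
-- that VACUUM VERBOSE reports on recent releases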
v2-0006-Use-xl_heap_prune-record-for-setting-empty-pages-.patch (text/x-patch; charset=US-ASCII)
From 86684d2c31ab2da25d742028fab502e67cc73545 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:42:19 -0400
Subject: [PATCH v2 06/12] Use xl_heap_prune record for setting empty pages
all-visible
As part of a project to eliminate xl_heap_visible records, eliminate
their usage in phase I vacuum of empty pages.
---
src/backend/access/heap/pruneheap.c | 14 ++++++--
src/backend/access/heap/vacuumlazy.c | 54 ++++++++++++++++++----------
src/include/access/heapam.h | 1 +
3 files changed, 48 insertions(+), 21 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d9ba0f96e34..97e51f78854 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -836,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
+ false,
InvalidBuffer, 0, false,
conflict_xid,
true, reason,
@@ -2051,6 +2052,9 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* case, vmbuffer should already have been updated and marked dirty and should
* still be pinned and locked.
*
+ * force_heap_fpi indicates that a full page image of the heap block should be
+ * forced.
+ *
* set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
* the page LSN when checksums/wal_log_hints are enabled even if we did not
* prune or freeze tuples on the page.
@@ -2061,6 +2065,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ bool force_heap_fpi,
Buffer vmbuffer,
uint8 vmflags,
bool set_pd_all_vis,
@@ -2089,13 +2094,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xlrec.flags = 0;
regbuf_flags = REGBUF_STANDARD;
+ if (force_heap_fpi)
+ regbuf_flags |= REGBUF_FORCE_IMAGE;
+
/*
* We can avoid an FPI if the only modification we are making to the heap
* page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
*/
- if (!do_prune &&
- nfrozen == 0 &&
- (!set_pd_all_vis || !XLogHintBitIsNeeded()))
+ else if (!do_prune &&
+ nfrozen == 0 &&
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
regbuf_flags |= REGBUF_NO_IMAGE;
/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 56acb224d71..68c8b0f4475 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1850,6 +1850,7 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
if (PageIsEmpty(page))
{
+
/*
* It seems likely that caller will always be able to get a cleanup
* lock on an empty page. But don't take any chances -- escalate to
@@ -1877,31 +1878,47 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ uint8 new_vmbits = VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN;
+
START_CRIT_SECTION();
- /*
- * It's possible that another backend has extended the heap,
- * initialized the page, and then failed to WAL-log the page due
- * to an ERROR. Since heap extension is not WAL-logged, recovery
- * might try to replay our record setting the page all-visible and
- * find that the page isn't initialized, which will cause a PANIC.
- * To prevent that, check whether the page has been previously
- * WAL-logged, and if not, do that now.
- */
- if (RelationNeedsWAL(vacrel->rel) &&
- PageGetLSN(page) == InvalidXLogRecPtr)
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ heap_page_set_vm(vacrel->rel, blkno, buf,
+ vmbuffer, new_vmbits);
+
+ /* Should have set PD_ALL_VISIBLE and marked buf dirty */
+ Assert(BufferIsDirty(buf));
+
+ if (RelationNeedsWAL(vacrel->rel))
{
- MarkBufferDirty(buf);
- log_newpage_buffer(buf, true);
+ /*
+ * It's possible that another backend has extended the heap,
+ * initialized the page, and then failed to WAL-log the page
+ * due to an ERROR. Since heap extension is not WAL-logged,
+ * recovery might try to replay our record setting the page
+ * all-visible and find that the page isn't initialized, which
+ * will cause a PANIC. To prevent that, if the page hasn't
+ * been previously WAL-logged, force a heap FPI.
+ */
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ PageGetLSN(page) == InvalidXLogRecPtr,
+ vmbuffer,
+ new_vmbits,
+ true,
+ InvalidTransactionId,
+ false, PRUNE_VACUUM_SCAN,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
}
- heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
END_CRIT_SECTION();
- /* Count the newly all-frozen pages for logging */
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+ /* Count the newly all-frozen pages for logging. */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
}
@@ -2882,6 +2899,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
+ false,
vmbuffer,
vmflags,
set_pd_all_vis,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index d2ac380bb64..1fa6eb047fd 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -399,6 +399,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ bool force_heap_fpi,
Buffer vmbuffer,
uint8 vmflags,
bool vm_modified_heap_page,
--
2.34.1
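For anyone wanting to reproduce the WAL volume comparison from the top of this
mail, one rough way to bracket a single operation from psql is below. This is
just a sketch of a possible methodology (LSN deltas on an otherwise idle
cluster), not necessarily how the numbers were gathered:
--
select pg_current_wal_insert_lsn() as start_lsn \gset
vacuum (verbose, process_toast false) foo;
select pg_current_wal_insert_lsn() - :'start_lsn'::pg_lsn as wal_bytes_emitted;
-- VACUUM (VERBOSE) also prints a "WAL usage: ... records ... bytes" line
-- covering just that vacuum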
v2-0011-Eliminate-xl_heap_visible-from-vacuum-phase-I-pru.patch (text/x-patch; charset=US-ASCII)
From 0aa2f93ff11a27c21f857326e90c813e765ecada Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:41:00 -0400
Subject: [PATCH v2 11/12] Eliminate xl_heap_visible from vacuum phase I
prune/freeze
Instead of emitting a separate WAL record for every block rendered
all-visible/frozen by vacuum's phase I, include the changes to the VM in
the xl_heap_prune record already emitted.
This is only enabled for vacuum's prune/freeze work, not for on-access
pruning.
---
src/backend/access/heap/pruneheap.c | 384 +++++++++++++++------------
src/backend/access/heap/vacuumlazy.c | 30 ---
src/include/access/heapam.h | 15 +-
3 files changed, 223 insertions(+), 206 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 425dcc77534..2d9624a246e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,13 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool freeze;
+
+ /*
+ * Whether or not to consider updating the VM. There is some bookkeeping
+ * that must be maintained if we would like to update the VM.
+ */
+ bool update_vm;
+
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -108,8 +115,9 @@ typedef struct
*
* These fields are not used by pruning itself for the most part, but are
* used to collect information about what was pruned and what state the
- * page is in after pruning, for the benefit of the caller. They are
- * copied to the caller's PruneFreezeResult at the end.
+ * page is in after pruning to use when updating the visibility map and
+ * for the benefit of the caller. They are copied to the caller's
+ * PruneFreezeResult at the end.
* -------------------------------------------------------
*/
@@ -138,11 +146,10 @@ typedef struct
* bits. It is only valid if we froze some tuples, and all_frozen is
* true.
*
- * NOTE: all_visible and all_frozen don't include LP_DEAD items. That's
- * convenient for heap_page_prune_and_freeze(), to use them to decide
- * whether to freeze the page or not. The all_visible and all_frozen
- * values returned to the caller are adjusted to include LP_DEAD items at
- * the end.
+ * NOTE: all_visible and all_frozen don't include LP_DEAD items until
+ * directly before updating the VM. We ignore LP_DEAD items when deciding
+ * whether or not to opportunistically freeze and when determining the
+ * snapshot conflict horizon required when freezing tuples.
*
* all_frozen should only be considered valid if all_visible is also set;
* we don't bother to clear the all_frozen flag every time we clear the
@@ -377,11 +384,15 @@ identify_and_fix_vm_corruption(Relation relation,
* considered advantageous for overall system performance to do so now. The
* 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
* are required when freezing. When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set. They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
+ * set presult->all_visible and presult->all_frozen on exit, for use when
+ * validating the changes made to the VM. They are always set to false when
+ * the HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
* that also freeze need that information.
*
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set and the visibility status of the page
+ * has changed, we will update the VM at the same time as pruning and freezing
+ * the heap page.
+ *
* blk_known_av is the visibility status of the heap block as of the last call
* to find_next_unskippable_block(). vmbuffer is the buffer that may already
* contain the required block of the visibility map.
@@ -396,6 +407,8 @@ identify_and_fix_vm_corruption(Relation relation,
* FREEZE indicates that we will also freeze tuples, and will return
* 'all_visible', 'all_frozen' flags to the caller.
*
+ * UPDATE_VM indicates that we will set the page's status in the VM.
+ *
* cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
* of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
* cutoffs->OldestXmin is also used to determine if dead tuples are
@@ -441,15 +454,19 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_freeze;
bool do_prune;
bool do_hint;
+ bool do_set_vm;
uint8 vmflags = 0;
uint8 old_vmbits = 0;
bool hint_bit_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ bool all_frozen_except_lp_dead = false;
+ bool set_pd_all_visible = false;
/* Copy parameters to prstate */
prstate.vistest = vistest;
prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate.cutoffs = cutoffs;
/*
@@ -496,29 +513,27 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.deadoffsets = presult->deadoffsets;
/*
- * Caller may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Keep track of whether or not the page will be all-visible and
+ * all-frozen for use in opportunistic freezing and to update the VM if
+ * the caller requests it.
+ *
+ * Currently, only VACUUM attempts freezing and setting the VM bits. But
+ * other callers could do either one. The visibility bookkeeping is
+ * required for opportunistic freezing (in addition to setting the VM
+ * bits) because we only consider opportunistically freezing tuples if the
+ * whole page would become all-frozen or if the whole page will be frozen
+ * except for dead tuples that will be removed by vacuum.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * Dead tuples which will be removed by the end of vacuuming should not
+ * preclude us from opportunistically freezing, so we do not clear
+ * all_visible when we see LP_DEAD items. We fix that after determining
+ * whether or not to freeze but before deciding whether or not to update
+ * the VM so that we don't set the VM bit incorrectly.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not clear
- * all_visible when we see LP_DEAD items. We fix that at the end of the
- * function, when we return the value to the caller, so that the caller
- * doesn't set the VM bit incorrectly.
+ * If we are neither freezing nor updating the VM, we skip the extra
+ * bookkeeping.
*/
- if (prstate.freeze)
+ if (prstate.freeze || prstate.update_vm)
{
prstate.all_visible = true;
prstate.all_frozen = true;
@@ -534,12 +549,15 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is no longer maintained. As long as it
+ * is maintained, it can be used to calculate the snapshot conflict horizon.
+ * This is most likely to happen when updating the VM and/or freezing all
+ * live tuples on the page. It is updated before returning to the caller
+ * because vacuum does assert-build only validation on the page using this
+ * field.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -827,6 +845,68 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
}
+ /*
+ * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+ * make the choice of whether or not to freeze the page unaffected by the
+ * short-term presence of LP_DEAD items. These LP_DEAD items were
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that freezing has been finalized, unset all_visible if there are
+ * any LP_DEAD items on the page. It needs to reflect the present state of
+ * the page when using it to determine whether or not to update the VM.
+ *
+ * Keep track of whether or not the page was all-frozen except LP_DEAD
+ * items for the purposes of calculating the snapshot conflict horizon,
+ * though.
+ */
+ all_frozen_except_lp_dead = prstate.all_frozen;
+ if (prstate.lpdead_items > 0)
+ {
+ prstate.all_visible = false;
+ prstate.all_frozen = false;
+ }
+
+ /*
+ * Handle setting visibility map bit based on information from the VM (as
+ * of last heap_vac_scan_next_block() call), and from all_visible and
+ * all_frozen variables.
+ */
+ if (prstate.update_vm)
+ {
+ /*
+ * Clear any VM corruption. This does not need to be in a critical
+ * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
+ * we may mark the heap page buffer dirty here and could end up doing
+ * so again later. This is not a correctness issue, and it only happens
+ * when the VM was corrupt, so the extra performance overhead is not a
+ * concern.
+ */
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av, prstate.lpdead_items, vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /* Determine if we actually need to set the VM and which bits to set. */
+ else if (prstate.all_visible &&
+ (!blk_known_av ||
+ (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate.all_frozen)
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+ }
+
+ do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
+ set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+
+ /* Save these for the caller in case we later zero out vmflags */
+ presult->new_vmbits = vmflags;
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -848,13 +928,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* If that's all we had to do to the page, this is a non-WAL-logged
* hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * the buffer dirty and emit WAL below.
*/
- if (!do_freeze && !do_prune)
+ if (!do_prune && !do_freeze && !do_set_vm)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -868,7 +948,23 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
- MarkBufferDirty(buffer);
+ if (do_prune || do_freeze)
+ MarkBufferDirty(buffer);
+
+ if (do_set_vm)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ old_vmbits = heap_page_set_vm(relation, blockno, buffer,
+ vmbuffer, vmflags);
+
+ if (old_vmbits == vmflags)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+ do_set_vm = false;
+ /* 0 out vmflags so we don't emit VM update WAL */
+ vmflags = 0;
+ }
+ }
/*
* Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
@@ -885,35 +981,57 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* on the standby with xids older than the youngest tuple this
* record will freeze will conflict.
*/
- TransactionId frz_conflict_horizon = InvalidTransactionId;
- TransactionId conflict_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
+
+ /*
+ * If we are updating the VM, the conflict horizon is almost
+ * always the visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization,
+ * we can use the visibility_cutoff_xid as the conflict horizon if
+ * the page will be all-frozen. This is true even if there are
+ * LP_DEAD line pointers because we ignored those when maintaining
+ * the visibility_cutoff_xid.
+ */
+ if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
+ conflict_xid = prstate.visibility_cutoff_xid;
/*
- * We can use the visibility_cutoff_xid as our cutoff for
- * conflicts when the whole page is eligible to become all-frozen
- * in the VM once we're done with it. Otherwise we generate a
- * conservative cutoff by stepping back from OldestXmin.
+ * Otherwise, if we are freezing but the page would not be
+ * all-frozen, we have to use the more pessimistic horizon of
+ * OldestXmin, which may be newer than the newest tuple we froze.
+ * That's because we won't have maintained the
+ * visibility_cutoff_xid.
*/
- if (do_freeze)
+ else if (do_freeze)
{
- if (prstate.all_visible && prstate.all_frozen)
- frz_conflict_horizon = prstate.visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = prstate.cutoffs->OldestXmin;
- TransactionIdRetreat(frz_conflict_horizon);
- }
+ conflict_xid = prstate.cutoffs->OldestXmin;
+ TransactionIdRetreat(conflict_xid);
}
- if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
- conflict_xid = frz_conflict_horizon;
- else
+ /*
+ * If we are removing tuples with an xmax newer than the conflict_xid
+ * calculated so far, we must use that xmax as the horizon instead.
+ */
+ if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
conflict_xid = prstate.latest_xid_removed;
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning
+ * or freezing any tuples and are setting an already all-visible
+ * page all-frozen in the VM. In this case, all of the tuples on
+ * the page must already be visible to all MVCC snapshots on the
+ * standby.
+ */
+ if (!do_prune && !do_freeze && do_set_vm &&
+ blk_known_av && (vmflags & VISIBILITYMAP_ALL_FROZEN))
+ conflict_xid = InvalidTransactionId;
+
log_heap_prune_and_freeze(relation, buffer,
false,
- InvalidBuffer, 0, false,
+ vmbuffer,
+ vmflags,
+ set_pd_all_visible,
conflict_xid,
true, reason,
prstate.frozen, prstate.nfrozen,
@@ -925,124 +1043,55 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
END_CRIT_SECTION();
- /* Copy information back for caller */
- presult->ndeleted = prstate.ndeleted;
- presult->nnewlpdead = prstate.ndead;
- presult->nfrozen = prstate.nfrozen;
- presult->live_tuples = prstate.live_tuples;
- presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected for updating the visibility map.
- */
- if (prstate.all_visible && prstate.lpdead_items == 0)
- {
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
- }
- else
- {
- presult->all_visible = false;
- presult->all_frozen = false;
- }
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
- presult->hastup = prstate.hastup;
-
- /*
- * If updating the visibility map, the conflict horizon for that record
- * must be the newest xmin on the page. However, if the page is
- * completely frozen, there can be no conflict and the vm_conflict_horizon
- * should remain InvalidTransactionId. This includes the case that we
- * just froze all the tuples; the prune-freeze record included the
- * conflict XID already so the VM update record doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * VACUUM will call heap_page_is_all_visible() during the second pass over
+ * the heap to determine all_visible and all_frozen for the page -- this
+ * is a specialized version of the logic from this function. Now that
+ * we've finished pruning and freezing, make sure that we're in total
+ * agreement with heap_page_is_all_visible() using an assertion. We will
+ * have already set the page in the VM, so this assertion can only report
+ * after the fact that something has already gone wrong.
*/
- if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
{
- if (identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av,
- prstate.lpdead_items, vmbuffer))
- {
- /* If we fix corruption, don't update the VM further */
- }
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
- /*
- * If the page isn't yet marked all-visible in the VM or it is and
- * needs to be marked all-frozen, update the VM. Note that all_frozen
- * is only valid if all_visible is true, so we must check both
- * all_visible and all_frozen.
- */
- else if (presult->all_visible &&
- (!blk_known_av ||
- (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
- {
- Assert(prstate.lpdead_items == 0);
- vmflags = VISIBILITYMAP_ALL_VISIBLE;
+ Assert(cutoffs);
- /*
- * If the page is all-frozen, we can pass InvalidTransactionId as
- * our cutoff_xid, since a snapshotConflictHorizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- if (presult->all_frozen)
- {
- Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
- vmflags |= VISIBILITYMAP_ALL_FROZEN;
- }
+ Assert(prstate.lpdead_items == 0);
- /*
- * It's possible for the VM bit to be clear and the page-level bit
- * to be set if checksums are not enabled.
- *
- * And even if we are just planning to update the frozen bit in
- * the VM, we shouldn't rely on all_visible_according_to_vm as a
- * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
- * might have become stale.
- *
- * If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be
- * logged.
- */
- if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buffer);
- }
+ if (!heap_page_is_all_visible(relation, buffer,
+ cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc))
+ Assert(false);
- old_vmbits = heap_page_set_vm_and_log(relation, blockno, buffer,
- vmbuffer, presult->vm_conflict_horizon,
- vmflags);
- }
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == prstate.visibility_cutoff_xid);
}
+#endif
+ /* Copy information back for caller */
+ presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+ presult->old_vmbits = old_vmbits;
+ /* new_vmbits was set above */
+ presult->hastup = prstate.hastup;
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
- presult->old_vmbits = old_vmbits;
- presult->new_vmbits = vmflags;
-
if (prstate.freeze)
{
if (presult->nfrozen > 0)
@@ -1624,8 +1673,13 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
break;
}
- /* Consider freezing any normal tuples which will not be removed */
- if (prstate->freeze)
+ /*
+ * Consider freezing any normal tuples which will not be removed.
+ * Regardless of whether or not we want to freeze the tuples, if we want
+ * to update the VM, we have to call heap_prepare_freeze_tuple() on every
+ * tuple to know whether or not the page will be totally frozen.
+ */
+ if (prstate->freeze || prstate->update_vm)
{
bool totally_frozen;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8daad54a0fe..246ba07db9c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2012,34 +2012,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- /* Note that all_frozen value does not matter when !all_visible */
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- if (!heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum))
- Assert(false);
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -2073,8 +2045,6 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
-
/*
* For the purposes of logging, count whether or not the page was newly
* set all-visible and, potentially, all-frozen.
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 534a63aab31..e35b4adf38d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -234,19 +234,12 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate the status of the page as reflected
- * in the visibility map after pruning, freezing, and setting any pages
- * all-visible in the visibility map.
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
*
- * vm_conflict_horizon is the newest xmin of live tuples on the page
- * (older than OldestXmin). It will only be valid if we did not set the
- * page all-frozen in the VM.
- *
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
*/
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
uint8 old_vmbits;
uint8 new_vmbits;
--
2.34.1
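With 0011 applied, phase I of vacuum should no longer emit any xl_heap_visible
records, which can be checked directly in the WAL. A sketch using the
pg_walinspect extension and the same LSN bracketing as above, again purely
illustrative:
--
create extension if not exists pg_walinspect;
select pg_current_wal_insert_lsn() as start_lsn \gset
vacuum (verbose, process_toast false) foo;
select resource_manager, record_type, count(*)
from pg_get_wal_records_info(:'start_lsn', pg_current_wal_insert_lsn())
group by 1, 2 order by 3 desc;
-- expect Heap2 PRUNE_VACUUM_SCAN / PRUNE_VACUUM_CLEANUP rows,
-- but no Heap2 VISIBLE rows for this vacuum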
v2-0012-Remove-xl_heap_visible-entirely.patch (text/x-patch; charset=US-ASCII)
From a1bbff2e42b771bbd8a4b8e2b0719e4582bfcf1f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:30:42 -0400
Subject: [PATCH v2 12/12] Remove xl_heap_visible entirely
There are now no users of this, so eliminate it entirely.
ci-os-only:
---
src/backend/access/common/bufmask.c | 3 +-
src/backend/access/heap/heapam.c | 113 +----------------
src/backend/access/heap/heapam_xlog.c | 150 +----------------------
src/backend/access/heap/vacuumlazy.c | 4 +-
src/backend/access/heap/visibilitymap.c | 84 +------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/include/access/heapam.h | 3 -
src/include/access/heapam_xlog.h | 6 -
src/include/access/visibilitymap.h | 10 +-
10 files changed, 14 insertions(+), 370 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..1fff01383b3 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,7 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d2cf8aa9fb8..6f134dfd535 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -40,6 +40,7 @@
#include "access/valid.h"
#include "access/visibilitymap.h"
#include "access/xloginsert.h"
+#include "access/xlogutils.h"
#include "catalog/pg_database.h"
#include "catalog/pg_database_d.h"
#include "commands/vacuum.h"
@@ -7845,73 +7846,6 @@ heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple)
return false;
}
-/*
- * Make the heap and VM page changes needed to set a page all-visible.
- * Do not call in recovery.
- */
-uint8
-heap_page_set_vm_and_log(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
- Buffer vmbuf, TransactionId cutoff_xid,
- uint8 vmflags)
-{
- Page heap_page = BufferGetPage(heap_buf);
- bool set_heap_lsn = false;
- XLogRecPtr recptr = InvalidXLogRecPtr;
- uint8 old_vmbits = 0;
-
- Assert(BufferIsValid(heap_buf));
-
- START_CRIT_SECTION();
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferGetBlockNumber(heap_buf) != heap_blk)
- elog(ERROR, "wrong heap buffer passed to heap_page_set_vm_and_log");
-
- /*
- * We must never end up with the VM bit set and the page-level
- * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
- * modification would fail to clear the VM bit. Though it is possible for
- * the page-level bit to be set and the VM bit to be clear if checksums
- * and wal_log_hints are not enabled.
- */
- if (!PageIsAllVisible(heap_page))
- {
- PageSetAllVisible(heap_page);
-
- /*
- * Buffer will usually be dirty from other changes, so it is worth the
- * extra check
- */
- if (!BufferIsDirty(heap_buf))
- {
- if (XLogHintBitIsNeeded())
- MarkBufferDirty(heap_buf);
- else
- MarkBufferDirtyHint(heap_buf, true);
- }
-
- set_heap_lsn = XLogHintBitIsNeeded();
- }
-
- old_vmbits = visibilitymap_set(rel, heap_blk, heap_buf,
- &recptr, vmbuf, cutoff_xid, vmflags);
-
- /*
- * If we modified the heap page and data checksums are enabled (or
- * wal_log_hints=on), we need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In this case,
- * the FPI for the heap page was omitted from the WAL record inserted in
- * the VM record, so it would be incorrect to update the heap page's LSN.
- */
- if (set_heap_lsn)
- PageSetLSN(heap_page, recptr);
-
- END_CRIT_SECTION();
-
- return old_vmbits;
-}
-
/*
* Ensure the provided heap page is marked PD_ALL_VISIBLE and then set the
* provided vmflags in the provided vmbuf.
@@ -7953,7 +7887,7 @@ heap_page_set_vm(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
MarkBufferDirty(heap_buf);
}
- return visibilitymap_set_vmbyte(rel, heap_blk, vmbuf, vmflags);
+ return visibilitymap_set(rel, heap_blk, vmbuf, vmflags);
}
/*
@@ -8895,49 +8829,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
/*
* Perform XLogInsert for a heap-update operation. Caller must already
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 70a46a37357..975a59d717e 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -273,7 +273,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Relation reln = CreateFakeRelcacheEntry(rlocator);
visibilitymap_pin(reln, blkno, &vmbuffer);
- old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+ old_vmbits = visibilitymap_set(reln, blkno, vmbuffer, vmflags);
/* Only set VM page LSN if we modified the page */
if (old_vmbits != vmflags)
PageSetLSN(BufferGetPage(vmbuffer), lsn);
@@ -284,143 +284,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
UnlockReleaseBuffer(vmbuffer);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
- visibilitymap_pin(reln, blkno, &vmbuffer);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, &lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -797,10 +660,10 @@ heap_xlog_multi_insert(XLogReaderState *record)
Relation reln = CreateFakeRelcacheEntry(rlocator);
visibilitymap_pin(reln, blkno, &vmbuffer);
- visibilitymap_set_vmbyte(reln, blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
PageSetLSN(BufferGetPage(vmbuffer), lsn);
FreeFakeRelcacheEntry(reln);
}
@@ -1380,9 +1243,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 246ba07db9c..9371d6f37c1 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1878,14 +1878,14 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
- uint8 new_vmbits = VISIBILITYMAP_ALL_VISIBLE |
+ uint8 new_vmbits = VISIBILITYMAP_ALL_VISIBLE |
VISIBILITYMAP_ALL_FROZEN;
START_CRIT_SECTION();
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
heap_page_set_vm(vacrel->rel, blkno, buf,
- vmbuffer, new_vmbits);
+ vmbuffer, new_vmbits);
/* Should have set PD_ALL_VISIBLE and marked buf dirty */
Assert(BufferIsDirty(buf));
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index cabd0fa0880..a24554fe191 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -219,86 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap buffer.
- * When checksums are enabled and we're not in recovery, we must add the heap
- * buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * cutoff_xid is the largest xmin on the page being marked all-visible; it is
- * needed for Hot Standby, and can be InvalidTransactionId if the page
- * contains no tuples. It can also be set to InvalidTransactionId when a page
- * that is already all-visible is being marked all-frozen.
- *
- * If we're in recovery, recptr points to the LSN of the XLOG record we're
- * replaying and the VM page LSN is advanced to this LSN. During normal
- * running, we'll generate a new XLOG record for the changes to the VM and set
- * the VM page LSN. We will return this LSN in recptr, and the caller may use
- * this to set the heap page LSN.
- *
- * Returns the state of the page's VM bits before setting flags and sets.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr *recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || XLogRecPtrIsInvalid(*recptr));
- Assert(InRecovery || PageIsAllVisible((Page) BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (XLogRecPtrIsInvalid(*recptr))
- {
- Assert(!InRecovery);
- *recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
- }
- PageSetLSN(page, *recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
-}
/*
* Set flags in the VM block contained in the passed in vmBuf.
@@ -308,8 +228,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* making any changes needed to the associated heap page.
*/
uint8
-visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags)
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index d6c86ccac20..f7880a4ed81 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -351,13 +351,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -462,9 +455,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e35b4adf38d..c404b794fda 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -365,9 +365,6 @@ extern bool heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple);
extern uint8 heap_page_set_vm(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
Buffer vmbuf, uint8 vmflags);
-extern uint8 heap_page_set_vm_and_log(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
- Buffer vmbuf, TransactionId cutoff_xid,
- uint8 vmflags);
extern void simple_heap_insert(Relation relation, HeapTuple tup);
extern void simple_heap_delete(Relation relation, ItemPointer tid);
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ceae9c083ff..9a61434b881 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -495,11 +494,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 91ef3705e84..20141e3e805 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -31,14 +31,8 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr *recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags);
+extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
--
2.34.1
On Thu, Jun 26, 2025 at 6:04 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
Rebased in light of recent changes on master:
This needed another rebase, and, in light of the discussion in [1],
I've also removed the patch to add heap wrappers for setting pages
all-visible.
More notably, the final patch (0012) in attached v3 allows on-access
pruning to set the VM.
To do this, it plumbs some information down from the executor to the
table scan about whether or not the table is modified by the query. We
don't want to set the VM only to clear it while scanning pages for an
UPDATE or while locking rows in a SELECT FOR UPDATE.
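For example, these are the sorts of statements (on some hypothetical
table t, just to illustrate the cases being avoided) where setting the
VM during the scan would be wasted work, since the same statement goes
on to clear it:
update t set b = b + 1 where a < 100;
select * from t where a < 100 for update;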
Because we only do on-access pruning when pd_prune_xid is valid, we
shouldn't need much of a heuristic for deciding when to set the VM
on-access -- but I've included one anyway: we only do it if we are
actually pruning or if the page is already dirty and no FPI would be
emitted.
You can see it in action with the following:
create extension pg_visibility;
create table foo (a int, b int) with (autovacuum_enabled=false, fillfactor=90);
insert into foo select generate_series(1,300), generate_series(1,300);
create index on foo (a);
update foo set b = 51 where b = 50;
select * from foo where a = 50;
select * from pg_visibility_map_summary('foo');
The SELECT will set a page all-visible in the VM.
In this patch set, on-access pruning is enabled for sequential scans
and the underlying heap relation in index scans and bitmap heap scans.
This example can exercise any of the three if you toggle
enable_indexscan and enable_bitmapscan appropriately.
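For instance, to force the bitmap heap scan path for the SELECT above
(just an illustrative session snippet, not part of the patch set):
set enable_indexscan = off;
set enable_seqscan = off;
select * from foo where a = 50;
reset enable_indexscan;
reset enable_seqscan;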
From a performance perspective, if you run a trivial pgbench, you can
see far more all-visible pages set in the pgbench_[x] relations with
no noticeable overhead. But I'm planning to do some performance
experiments to show how this affects our ability to choose index-only
scan plans in realistic workloads.
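As a quick sanity check after such a pgbench run, the pg_visibility
query from the example above works on the pgbench tables too (assuming
the extension is installed):
select * from pg_visibility_map_summary('pgbench_accounts');
select * from pg_visibility_map_summary('pgbench_tellers');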
- Melanie
[1]: /messages/by-id/CAAKRu_Yj=yrL+gGGsqfYVQcYn7rDp6hDeoF1vN453JDp8dEY+w@mail.gmail.com
Attachments:
v3-0001-Add-assert-to-heap_prune_record_unchanged_lp_norm.patch
From a5b11c191f34ca5fefc1c81d0a882a43df308060 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 7 Jul 2025 17:33:26 -0400
Subject: [PATCH v3 01/13] Add assert to heap_prune_record_unchanged_lp_normal
Not all callers provide VacuumCutoffs to heap_page_prune_and_freeze(),
so assert those are provided before passing them along to
heap_prepare_freeze_tuple().
---
src/backend/access/heap/pruneheap.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a8025889be0..dd00931f179 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1480,6 +1480,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
{
bool totally_frozen;
+ Assert(prstate->cutoffs);
if ((heap_prepare_freeze_tuple(htup,
prstate->cutoffs,
&prstate->pagefrz,
--
2.43.0
v3-0005-Use-xl_heap_prune-record-for-setting-empty-pages-.patch
From 4165b85aeebd9f7fd84d21b8f1cb7ead7597ba27 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:42:19 -0400
Subject: [PATCH v3 05/13] Use xl_heap_prune record for setting empty pages
all-visible
As part of a project to eliminate xl_heap_visible records, eliminate
their usage in phase I vacuum of empty pages.
---
src/backend/access/heap/pruneheap.c | 14 +++++--
src/backend/access/heap/vacuumlazy.c | 55 ++++++++++++++++++----------
src/include/access/heapam.h | 1 +
3 files changed, 47 insertions(+), 23 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 68ecf50848b..2724cf7f64f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -836,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
+ false,
InvalidBuffer, 0, false,
conflict_xid,
true, reason,
@@ -2052,6 +2053,9 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* case, vmbuffer should already have been updated and marked dirty and should
* still be pinned and locked.
*
+ * force_heap_fpi indicates that a full page image of the heap block should be
+ * forced.
+ *
* set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
* the page LSN when checksums/wal_log_hints are enabled even if we did not
* prune or freeze tuples on the page.
@@ -2062,6 +2066,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ bool force_heap_fpi,
Buffer vmbuffer,
uint8 vmflags,
bool set_pd_all_vis,
@@ -2090,13 +2095,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xlrec.flags = 0;
regbuf_flags = REGBUF_STANDARD;
+ if (force_heap_fpi)
+ regbuf_flags |= REGBUF_FORCE_IMAGE;
+
/*
* We can avoid an FPI if the only modification we are making to the heap
* page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
*/
- if (!do_prune &&
- nfrozen == 0 &&
- (!set_pd_all_vis || !XLogHintBitIsNeeded()))
+ else if (!do_prune &&
+ nfrozen == 0 &&
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
regbuf_flags |= REGBUF_NO_IMAGE;
/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 460cdbd8417..d9e195269d2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1878,33 +1878,47 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ uint8 new_vmbits = VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN;
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ PageSetAllVisible(page);
MarkBufferDirty(buf);
- /*
- * It's possible that another backend has extended the heap,
- * initialized the page, and then failed to WAL-log the page due
- * to an ERROR. Since heap extension is not WAL-logged, recovery
- * might try to replay our record setting the page all-visible and
- * find that the page isn't initialized, which will cause a PANIC.
- * To prevent that, check whether the page has been previously
- * WAL-logged, and if not, do that now.
- */
- if (RelationNeedsWAL(vacrel->rel) &&
- PageGetLSN(page) == InvalidXLogRecPtr)
- log_newpage_buffer(buf, true);
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ visibilitymap_set_vmbyte(vacrel->rel, blkno,
+ vmbuffer, new_vmbits);
+
+ if (RelationNeedsWAL(vacrel->rel))
+ {
+ /*
+ * It's possible that another backend has extended the heap,
+ * initialized the page, and then failed to WAL-log the page
+ * due to an ERROR. Since heap extension is not WAL-logged,
+ * recovery might try to replay our record setting the page
+ * all-visible and find that the page isn't initialized, which
+ * will cause a PANIC. To prevent that, if the page hasn't
+ * been previously WAL-logged, force a heap FPI.
+ */
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ PageGetLSN(page) == InvalidXLogRecPtr,
+ vmbuffer,
+ new_vmbits,
+ true,
+ InvalidTransactionId,
+ false, PRUNE_VACUUM_SCAN,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+ }
- PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
END_CRIT_SECTION();
- /* Count the newly all-frozen pages for logging */
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+ /* Count the newly all-frozen pages for logging. */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
}
@@ -2918,6 +2932,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
+ false,
vmbuffer,
vmflags,
set_pd_all_vis,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8b47295efa2..e7129a644a1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,6 +394,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ bool force_heap_fpi,
Buffer vmbuffer,
uint8 vmflags,
bool vm_modified_heap_page,
--
2.43.0
v3-0002-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch
From f6a10188136d22a3ea73cf86d85dbb947ec87238 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v3 02/13] Eliminate xl_heap_visible in COPY FREEZE
Instead of emitting a separate WAL record for setting the VM bits in
xl_heap_visible, include the required update in the xl_heap_multi_insert
record.
---
src/backend/access/heap/heapam.c | 47 +++++++++++---------
src/backend/access/heap/heapam_xlog.c | 39 +++++++++++++++-
src/backend/access/heap/visibilitymap.c | 59 +++++++++++++++++++++++++
src/backend/access/rmgrdesc/heapdesc.c | 5 +++
src/include/access/visibilitymap.h | 2 +
5 files changed, 130 insertions(+), 22 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0dcd6ee817e..68db4325285 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2493,9 +2493,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/*
* If the page is all visible, need to clear that, unless we're only
* going to add further frozen rows to it.
- *
- * If we're only adding already frozen rows to a previously empty
- * page, mark it as all-visible.
*/
if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
@@ -2505,8 +2502,22 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
BufferGetBlockNumber(buffer),
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
+
+ /*
+ * If we're only adding already frozen rows to a previously empty
+ * page, mark it as all-frozen and update the visibility map. We're
+ * already holding a pin on the vmbuffer.
+ */
else if (all_frozen_set)
+ {
PageSetAllVisible(page);
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ visibilitymap_set_vmbyte(relation,
+ BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+ }
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2554,6 +2565,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
xlrec->flags = 0;
if (all_visible_cleared)
xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+ /*
+ * We don't have to worry about including a conflict xid in the
+ * WAL record as HEAP_INSERT_FROZEN intentionally violates
+ * visibility rules.
+ */
if (all_frozen_set)
xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
@@ -2616,7 +2633,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
XLogBeginInsert();
XLogRegisterData(xlrec, tupledata - scratch.data);
+
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+ if (all_frozen_set)
+ XLogRegisterBuffer(1, vmbuffer, 0);
XLogRegisterBufData(0, tupledata, totaldatalen);
@@ -2626,29 +2646,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
recptr = XLogInsert(RM_HEAP2_ID, info);
PageSetLSN(page, recptr);
+ if (all_frozen_set)
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
}
END_CRIT_SECTION();
- /*
- * If we've frozen everything on the page, update the visibilitymap.
- * We're already holding pin on the vmbuffer.
- */
if (all_frozen_set)
- {
- Assert(PageIsAllVisible(page));
- Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
-
- /*
- * It's fine to use InvalidTransactionId here - this is only used
- * when HEAP_INSERT_FROZEN is specified, which intentionally
- * violates visibility rules.
- */
- visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
- InvalidXLogRecPtr, vmbuffer,
- InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
- }
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
UnlockReleaseBuffer(buffer);
ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index eb4bd3d6ae3..2485c344191 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -552,6 +552,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
int i;
bool isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
/*
* Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -572,11 +573,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(rlocator);
- Buffer vmbuffer = InvalidBuffer;
visibilitymap_pin(reln, blkno, &vmbuffer);
visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
ReleaseBuffer(vmbuffer);
+ vmbuffer = InvalidBuffer;
FreeFakeRelcacheEntry(reln);
}
@@ -663,6 +664,42 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (BufferIsValid(buffer))
UnlockReleaseBuffer(buffer);
+ buffer = InvalidBuffer;
+
+ /*
+ * Now read and update the VM block. Even if we skipped updating the heap
+ * page due to the file being dropped or truncated later in recovery, it's
+ * still safe to update the visibility map. Any WAL record that clears
+ * the visibility map bit does so before checking the page LSN, so any
+ * bits that need to be cleared will still be cleared.
+ *
+ * It is only okay to set the VM bits without holding the heap page lock
+ * because we can expect no other writers of this page.
+ */
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+ XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
+
+ visibilitymap_pin(reln, blkno, &vmbuffer);
+ visibilitymap_set_vmbyte(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+
+ /*
+ * It is not possible that the VM was already set for this heap page,
+ * so the vmbuffer must have been modified and marked dirty.
+ */
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
+ FreeFakeRelcacheEntry(reln);
+ }
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
+
/*
* If the page is running low on free space, update the FSM as well.
* Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 745a04ef26e..573df6f6891 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -318,6 +318,65 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
return status;
}
+/*
+ * Set flags in the VM block contained in the passed in vmBuf.
+ *
+ * This function is for callers which include the VM changes in the same WAL
+ * record as the modifications of the heap page which rendered it all-visible.
+ * Callers separately logging the VM changes should invoke visibilitymap_set()
+ * instead.
+ *
+ * Caller must have pinned and exclusive locked the correct block of the VM in
+ * vmBuf.
+ *
+ * During normal operation (i.e. not recovery), this should be called in a
+ * critical section which also makes any necessary changes to the heap page
+ * and, if relevant, emits WAL.
+ *
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page.
+ */
+uint8
+visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags)
+{
+ BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+ uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+ uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+ Page page;
+ uint8 *map;
+ uint8 status;
+
+#ifdef TRACE_VISIBILITYMAP
+ elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
+#endif
+
+ /* Call in same critical section where WAL is emitted. */
+ Assert(InRecovery || CritSectionCount > 0);
+
+ /* Flags should be valid. Also never clear bits with this function */
+ Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+ /* Must never set all_frozen bit without also setting all_visible bit */
+ Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+ /* Check that we have the right VM page pinned */
+ if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+ elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
+
+ page = BufferGetPage(vmBuf);
+ map = (uint8 *) PageGetContents(page);
+
+ status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+ if (flags != status)
+ {
+ map[mapByte] |= (flags << mapOffset);
+ MarkBufferDirty(vmBuf);
+ }
+
+ return status;
+}
+
/*
* visibilitymap_get_status - get status of bits
*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
#include "access/heapam_xlog.h"
#include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
#include "storage/standbydefs.h"
/*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
xlrec->flags);
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+ appendStringInfo(buf, ", vm_flags: 0x%02X",
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
if (XLogRecHasBlockData(record, 0) && !isinit)
{
appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..977566f6b98 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,8 @@ extern uint8 visibilitymap_set(Relation rel,
Buffer vmBuf,
TransactionId cutoff_xid,
uint8 flags);
+extern uint8 visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
--
2.43.0
v3-0003-Make-heap_page_is_all_visible-independent-of-LVRe.patch
From 2959249115b654dbc7be6e79c914a8d163b765f1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 15:48:51 -0400
Subject: [PATCH v3 03/13] Make heap_page_is_all_visible independent of
LVRelState
Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need two parameters from the
LVRelState, so just pass those in explicitly.
---
src/backend/access/heap/vacuumlazy.c | 45 ++++++++++++++++++----------
1 file changed, 29 insertions(+), 16 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 14036c27e87..8a62a93eee5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,8 +464,11 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
- TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2010,8 +2013,9 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(presult.lpdead_items == 0);
- if (!heap_page_is_all_visible(vacrel, buf,
- &debug_cutoff, &debug_all_frozen))
+ if (!heap_page_is_all_visible(vacrel->rel, buf,
+ vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+ &debug_cutoff, &vacrel->offnum))
Assert(false);
Assert(presult.all_frozen == debug_all_frozen);
@@ -2907,8 +2911,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* emitted.
*/
Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
- &all_frozen))
+ if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
+ &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -3592,9 +3596,16 @@ dead_items_cleanup(LVRelState *vacrel)
/*
* Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples. Set *all_frozen to true if every tuple
- * on this page is frozen.
+ * transactions.
+ *
+ * OldestXmin is used to determine visibility.
+ *
+ * *logging_offnum will have the OffsetNumber of the current tuple being
+ * processed for vacuum's error callback system.
+ *
+ * Return the visibility_cutoff_xid which is the highest xmin amongst the
+ * visible tuples. Sets *all_frozen to true if every tuple on this page is
+ * frozen.
*
* This is a stripped down version of lazy_scan_prune(). If you change
* anything here, make sure that everything stays in sync. Note that an
@@ -3602,9 +3613,11 @@ dead_items_cleanup(LVRelState *vacrel)
* introducing new side-effects here.
*/
static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
TransactionId *visibility_cutoff_xid,
- bool *all_frozen)
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3627,7 +3640,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* Set the offset number so that we can display it along with any
* error that occurred while processing this tuple.
*/
- vacrel->offnum = offnum;
+ *logging_offnum = offnum;
itemid = PageGetItemId(page, offnum);
/* Unused or redirect line pointers are of no interest */
@@ -3651,9 +3664,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
tuple.t_len = ItemIdGetLength(itemid);
- tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+ tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
+ switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
buf))
{
case HEAPTUPLE_LIVE:
@@ -3674,7 +3687,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
+ OldestXmin))
{
all_visible = false;
*all_frozen = false;
@@ -3709,7 +3722,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
} /* scan along page */
/* Clear the offset information once we have processed the given page. */
- vacrel->offnum = InvalidOffsetNumber;
+ *logging_offnum = InvalidOffsetNumber;
return all_visible;
}
--
2.43.0
v3-0004-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch
From 5bcb28fcc409d80ef62c855b15a2a45f006d40f1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:42:13 -0400
Subject: [PATCH v3 04/13] Eliminate xl_heap_visible from vacuum phase III
Instead of emitting a separate xl_heap_visible record for each page that
is rendered all-visible by vacuum's third phase, include the updates to
the VM in the already emitted xl_heap_prune record.
---
src/backend/access/heap/heapam_xlog.c | 142 ++++++++++++++++++++---
src/backend/access/heap/pruneheap.c | 48 +++++++-
src/backend/access/heap/vacuumlazy.c | 149 +++++++++++++++++--------
src/backend/access/rmgrdesc/heapdesc.c | 13 ++-
src/include/access/heapam.h | 9 ++
src/include/access/heapam_xlog.h | 3 +
6 files changed, 296 insertions(+), 68 deletions(-)
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 2485c344191..14541e2e94f 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Buffer buffer;
RelFileLocator rlocator;
BlockNumber blkno;
- XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
+ uint8 vmflags = 0;
XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -51,10 +52,15 @@ heap_xlog_prune_freeze(XLogReaderState *record)
(xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
/*
- * We are about to remove and/or freeze tuples. In Hot Standby mode,
- * ensure that there are no queries running for which the removed tuples
- * are still visible or which still consider the frozen xids as running.
- * The conflict horizon XID comes after xl_heap_prune.
+ * After xl_heap_prune is the optional snapshot conflict horizon.
+ *
+ * In Hot Standby mode, we must ensure that there are no running queries
+ * which would conflict with the changes in this record. If pruning, that
+ * means we cannot remove tuples still visible to transactions on the
+ * standby. If freezing, that means we cannot freeze tuples with xids that
+ * are still considered running on the standby. And for setting the VM, we
+ * cannot do so if the page isn't all-visible to all transactions on the
+ * standby.
*/
if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
{
@@ -70,13 +76,28 @@ heap_xlog_prune_freeze(XLogReaderState *record)
rlocator);
}
+ /* Next are the optionally included vmflags. Copy them out for later use. */
+ if ((xlrec.flags & XLHP_HAS_VMFLAGS) != 0)
+ {
+ memcpy(&vmflags, maindataptr, sizeof(uint8));
+ maindataptr += sizeof(uint8);
+
+ /*
+ * We don't set VISIBILITYMAP_XLOG_CATALOG_REL in the combined record
+ * because we already have XLHP_IS_CATALOG_REL.
+ */
+ Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+ /* Must never set all_frozen bit without also setting all_visible bit */
+ Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
+ }
+
/*
- * If we have a full-page image, restore it and we're done.
+ * If we have a full-page image of the heap block, restore it and we're
+ * done with the heap block.
*/
- action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
- (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
- &buffer);
- if (action == BLK_NEEDS_REDO)
+ if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+ (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+ &buffer) == BLK_NEEDS_REDO)
{
Page page = (Page) BufferGetPage(buffer);
OffsetNumber *redirected;
@@ -89,6 +110,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Size datalen;
xlhp_freeze_plan *plans;
OffsetNumber *frz_offsets;
+ bool do_prune;
+ bool mark_buffer_dirty;
+ bool set_heap_lsn;
char *dataptr = XLogRecGetBlockData(record, 0, &datalen);
heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
@@ -97,11 +121,18 @@ heap_xlog_prune_freeze(XLogReaderState *record)
&ndead, &nowdead,
&nunused, &nowunused);
+ do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+ set_heap_lsn = mark_buffer_dirty = do_prune || nplans > 0;
+
+ /* Ensure the record does something */
+ Assert(do_prune || nplans > 0 ||
+ vmflags & VISIBILITYMAP_VALID_BITS);
+
/*
* Update all line pointers per the record, and repair fragmentation
* if needed.
*/
- if (nredirected > 0 || ndead > 0 || nunused > 0)
+ if (do_prune)
heap_page_prune_execute(buffer,
(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
redirected, nredirected,
@@ -138,26 +169,72 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* There should be no more data */
Assert((char *) frz_offsets == dataptr + datalen);
+ Assert(BufferIsValid(buffer) &&
+ BufferGetBlockNumber(buffer) == blkno);
+
+ /*
+ * Now set PD_ALL_VISIBLE, if required. We'll only do this if we are
+ * also going to set bits in the VM later.
+ *
+ * We must never end up with the VM bit set and the page-level
+ * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+ * modification would fail to clear the VM bit.
+ */
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+ {
+ PageSetAllVisible(page);
+
+ /*
+ * If the only change to the heap page is setting PD_ALL_VISIBLE,
+ * we can avoid setting the page LSN unless checksums or
+ * wal_log_hints are enabled.
+ */
+ set_heap_lsn = XLogHintBitIsNeeded() ? true : set_heap_lsn;
+ mark_buffer_dirty = true;
+ }
+
/*
* Note: we don't worry about updating the page's prunability hints.
* At worst this will cause an extra prune cycle to occur soon.
*/
- PageSetLSN(page, lsn);
- MarkBufferDirty(buffer);
+ if (mark_buffer_dirty)
+ MarkBufferDirty(buffer);
+ if (set_heap_lsn)
+ PageSetLSN(page, lsn);
}
/*
- * If we released any space or line pointers, update the free space map.
+ * If we released any space or line pointers or will be setting a page in
+ * the visibility map, update the free space map.
+ *
+ * Even if we are just updating the VM (and thus not freeing up any
+ * space), we'll still update the FSM for this page. Since FSM is not
+ * WAL-logged and only updated heuristically, it easily becomes stale in
+ * standbys. If the standby is later promoted and runs VACUUM, it will
+ * skip updating individual free space figures for pages that became
+ * all-visible (or all-frozen, depending on the vacuum mode,) which is
+ * troublesome when FreeSpaceMapVacuum propagates too optimistic free
+ * space values to upper FSM layers; later inserters try to use such pages
+ * only to find out that they are unusable. This can cause long stalls
+ * when there are many such pages.
+ *
+ * Forestall those problems by updating FSM's idea about a page that is
+ * becoming all-visible or all-frozen.
*
* Do this regardless of a full-page image being applied, since the FSM
* data is not in the page anyway.
+ *
+ * We want to avoid holding an exclusive lock on the heap buffer while
+ * doing IO (either of the FSM or the VM), so we'll release the lock on
+ * the heap buffer before doing either.
*/
if (BufferIsValid(buffer))
{
- if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
- XLHP_HAS_DEAD_ITEMS |
- XLHP_HAS_NOW_UNUSED_ITEMS))
+ if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
+ XLHP_HAS_DEAD_ITEMS |
+ XLHP_HAS_NOW_UNUSED_ITEMS)) ||
+ vmflags & VISIBILITYMAP_VALID_BITS)
{
Size freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
@@ -168,6 +245,37 @@ heap_xlog_prune_freeze(XLogReaderState *record)
else
UnlockReleaseBuffer(buffer);
}
+
+ /*
+ * Read and update the VM block. Even if we skipped updating the heap page
+ * due to the file being dropped or truncated later in recovery, it's
+ * still safe to update the visibility map. Any WAL record that clears
+ * the visibility map bit does so before checking the page LSN, so any
+ * bits that need to be cleared will still be cleared.
+ *
+ * Note that it is *only* okay that we do not hold a lock on the heap page
+ * because we are in recovery and can expect no other writers to clear
+ * PD_ALL_VISIBLE before we are able to update the VM.
+ */
+ if (vmflags & VISIBILITYMAP_VALID_BITS &&
+ XLogReadBufferForRedoExtended(record, 1,
+ RBM_ZERO_ON_ERROR,
+ false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ uint8 old_vmbits = 0;
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
+
+ visibilitymap_pin(reln, blkno, &vmbuffer);
+ old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+ /* Only set VM page LSN if we modified the page */
+ if (old_vmbits != vmflags)
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
+ FreeFakeRelcacheEntry(reln);
+ }
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index dd00931f179..68ecf50848b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,6 +21,7 @@
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
+#include "access/visibilitymapdefs.h"
#include "commands/vacuum.h"
#include "executor/instrument.h"
#include "miscadmin.h"
@@ -835,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
+ InvalidBuffer, 0, false,
conflict_xid,
true, reason,
prstate.frozen, prstate.nfrozen,
@@ -2046,12 +2048,23 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* replaying 'unused' items depends on whether they were all previously marked
* as dead.
*
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
* Note: This function scribbles on the 'frozen' array.
*
* Note: This is called in a critical section, so careful what you do here.
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer,
+ uint8 vmflags,
+ bool set_pd_all_vis,
TransactionId conflict_xid,
bool cleanup_lock,
PruneReason reason,
@@ -2063,6 +2076,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xl_heap_prune xlrec;
XLogRecPtr recptr;
uint8 info;
+ uint8 regbuf_flags;
/* The following local variables hold data registered in the WAL record: */
xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2071,8 +2085,19 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xlhp_prune_items dead_items;
xlhp_prune_items unused_items;
OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+ bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
xlrec.flags = 0;
+ regbuf_flags = REGBUF_STANDARD;
+
+ /*
+ * We can avoid an FPI if the only modification we are making to the heap
+ * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+ */
+ if (!do_prune &&
+ nfrozen == 0 &&
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
+ regbuf_flags |= REGBUF_NO_IMAGE;
/*
* Prepare data for the buffer. The arrays are not actually in the
@@ -2080,7 +2105,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* page image, the arrays can be omitted.
*/
XLogBeginInsert();
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+ XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ XLogRegisterBuffer(1, vmbuffer, 0);
+
if (nfrozen > 0)
{
int nplans;
@@ -2137,6 +2166,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* Prepare the main xl_heap_prune record. We already set the XLHP_HAS_*
* flag above.
*/
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ xlrec.flags |= XLHP_HAS_VMFLAGS;
if (RelationIsAccessibleInLogicalDecoding(relation))
xlrec.flags |= XLHP_IS_CATALOG_REL;
if (TransactionIdIsValid(conflict_xid))
@@ -2151,6 +2182,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
XLogRegisterData(&xlrec, SizeOfHeapPrune);
if (TransactionIdIsValid(conflict_xid))
XLogRegisterData(&conflict_xid, sizeof(TransactionId));
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ XLogRegisterData(&vmflags, sizeof(uint8));
switch (reason)
{
@@ -2169,5 +2202,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
}
recptr = XLogInsert(RM_HEAP2_ID, info);
- PageSetLSN(BufferGetPage(buffer), recptr);
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
+
+ /*
+ * If pruning or freezing tuples or setting the page all-visible when
+ * checksums or wal_log_hints are enabled, we must bump the LSN. Torn
+ * pages are possible if we update PD_ALL_VISIBLE without bumping the LSN,
+ * but this is deemed okay for page hint updates.
+ */
+ if (do_prune || nfrozen > 0 ||
+ (set_pd_all_vis && XLogHintBitIsNeeded()))
+ PageSetLSN(BufferGetPage(buffer), recptr);
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8a62a93eee5..460cdbd8417 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,11 +464,13 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
+static bool heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int allowed_num_offsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2847,8 +2849,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
OffsetNumber unused[MaxHeapTuplesPerPage];
int nunused = 0;
TransactionId visibility_cutoff_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
bool all_frozen;
LVSavedErrInfo saved_err_info;
+ uint8 vmflags = 0;
+ bool set_pd_all_vis = false;
Assert(vacrel->do_index_vacuuming);
@@ -2859,6 +2864,20 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
InvalidOffsetNumber);
+ if (heap_page_is_all_visible_except_lpdead(vacrel->rel, buffer,
+ vacrel->cutoffs.OldestXmin,
+ deadoffsets, num_offsets,
+ &all_frozen, &visibility_cutoff_xid,
+ &vacrel->offnum))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (all_frozen)
+ {
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ }
+ }
+
START_CRIT_SECTION();
for (int i = 0; i < num_offsets; i++)
@@ -2878,6 +2897,18 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
/* Attempt to truncate line pointer array now */
PageTruncateLinePointerArray(page);
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+ {
+ Assert(!PageIsAllVisible(page));
+ set_pd_all_vis = true;
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ PageSetAllVisible(page);
+ visibilitymap_set_vmbyte(vacrel->rel,
+ blkno,
+ vmbuffer, vmflags);
+ conflict_xid = visibility_cutoff_xid;
+ }
+
/*
* Mark buffer dirty before we write WAL.
*/
@@ -2887,7 +2918,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
- InvalidTransactionId,
+ vmbuffer,
+ vmflags,
+ set_pd_all_vis,
+ conflict_xid,
false, /* no cleanup lock required */
PRUNE_VACUUM_CLEANUP,
NULL, 0, /* frozen */
@@ -2896,39 +2930,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
unused, nunused);
}
- /*
- * End critical section, so we safely can do visibility tests (which
- * possibly need to perform IO and allocate memory!). If we crash now the
- * page (including the corresponding vm bit) might not be marked all
- * visible, but that's fine. A later vacuum will fix that.
- */
END_CRIT_SECTION();
- /*
- * Now that we have removed the LP_DEAD items from the page, once again
- * check if the page has become all-visible. The page is already marked
- * dirty, exclusively locked, and, if needed, a full page image has been
- * emitted.
- */
- Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
- &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
+ if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (all_frozen)
- {
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buffer,
- InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
- flags);
-
/* Count the newly set VM page for logging */
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
vacrel->vm_new_visible_pages++;
if (all_frozen)
vacrel->vm_new_visible_frozen_pages++;
@@ -3594,6 +3601,25 @@ dead_items_cleanup(LVRelState *vacrel)
vacrel->pvs = NULL;
}
+/*
+ * Wrapper for heap_page_is_all_visible_except_lpdead() which can be used for
+ * callers that expect no LP_DEAD on the page.
+ */
+bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
+{
+
+ return heap_page_is_all_visible_except_lpdead(rel, buf, OldestXmin,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid,
+ logging_offnum);
+}
+
/*
* Check if every tuple in the given page is visible to all current and future
* transactions.
@@ -3607,23 +3633,35 @@ dead_items_cleanup(LVRelState *vacrel)
* visible tuples. Sets *all_frozen to true if every tuple on this page is
* frozen.
*
- * This is a stripped down version of lazy_scan_prune(). If you change
- * anything here, make sure that everything stays in sync. Note that an
- * assertion calls us to verify that everybody still agrees. Be sure to avoid
- * introducing new side-effects here.
+ * deadoffsets are the offsets we know about and are about to set LP_UNUSED.
+ * allowed_num_offsets is the number of those. As long as the LP_DEAD items we
+ * encounter on the page match those exactly, we can set the page all-visible
+ * in the VM.
+ *
+ * Callers looking to verify that the page is all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This is similar logic to that in heap_prune_record_unchanged_lp_normal(). If
+ * you change anything here, make sure that everything stays in sync. Note
+ * that an assertion calls us to verify that everybody still agrees. Be sure
+ * to avoid introducing new side-effects here.
*/
static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum)
+heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int allowed_num_offsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
OffsetNumber offnum,
maxoff;
bool all_visible = true;
+ OffsetNumber current_dead_offsets[MaxHeapTuplesPerPage];
+ size_t current_num_offsets = 0;
*visibility_cutoff_xid = InvalidTransactionId;
*all_frozen = true;
@@ -3655,9 +3693,8 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
if (ItemIdIsDead(itemid))
{
- all_visible = false;
- *all_frozen = false;
- break;
+ current_dead_offsets[current_num_offsets++] = offnum;
+ continue;
}
Assert(ItemIdIsNormal(itemid));
@@ -3724,7 +3761,23 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
/* Clear the offset information once we have processed the given page. */
*logging_offnum = InvalidOffsetNumber;
- return all_visible;
+ /* If we already know it's not all-visible, return false */
+ if (!all_visible)
+ return false;
+
+ /* If we weren't allowed any dead offsets, we're done */
+ if (allowed_num_offsets == 0)
+ return current_num_offsets == 0;
+
+ /* If the number of dead offsets has changed, that's wrong */
+ if (current_num_offsets != allowed_num_offsets)
+ return false;
+
+ Assert(deadoffsets);
+
+ /* The dead offsets must be the same dead offsets */
+ return memcmp(current_dead_offsets, deadoffsets,
+ allowed_num_offsets * sizeof(OffsetNumber)) == 0;
}
/*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..d6c86ccac20 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -266,6 +266,7 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
{
char *rec = XLogRecGetData(record);
uint8 info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+ char *maindataptr = rec + SizeOfHeapPrune;
info &= XLOG_HEAP_OPMASK;
if (info == XLOG_HEAP2_PRUNE_ON_ACCESS ||
@@ -278,7 +279,8 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
{
TransactionId conflict_xid;
- memcpy(&conflict_xid, rec + SizeOfHeapPrune, sizeof(TransactionId));
+ memcpy(&conflict_xid, maindataptr, sizeof(TransactionId));
+ maindataptr += sizeof(TransactionId);
appendStringInfo(buf, "snapshotConflictHorizon: %u",
conflict_xid);
@@ -287,6 +289,15 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, ", isCatalogRel: %c",
xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
+ if (xlrec->flags & XLHP_HAS_VMFLAGS)
+ {
+ uint8 vmflags;
+
+ memcpy(&vmflags, maindataptr, sizeof(uint8));
+ maindataptr += sizeof(uint8);
+ appendStringInfo(buf, ", vm_flags: 0x%02X", vmflags);
+ }
+
if (XLogRecHasBlockData(record, 0))
{
Size datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..8b47295efa2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -344,6 +344,12 @@ extern void heap_inplace_update_and_unlock(Relation relation,
Buffer buffer);
extern void heap_inplace_unlock(Relation relation,
HeapTuple oldtup, Buffer buffer);
+
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
@@ -388,6 +394,9 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer,
+ uint8 vmflags,
+ bool vm_modified_heap_page,
TransactionId conflict_xid,
bool cleanup_lock,
PruneReason reason,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 277df6b3cf0..ceae9c083ff 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -295,6 +295,9 @@ typedef struct xl_heap_prune
#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+/* If set, the record includes a uint8 of VM flags to apply */
+#define XLHP_HAS_VMFLAGS (1 << 0)
+
/* to handle recovery conflict during logical decoding on standby */
#define XLHP_IS_CATALOG_REL (1 << 1)
--
2.43.0
v3-0007-Combine-vacuum-phase-I-VM-update-cases.patch
From 11ffcf801d1f1ef112d4f7a67cdcb626d1557221 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:35:36 -0400
Subject: [PATCH v3 07/13] Combine vacuum phase I VM update cases
After phase I of vacuum we update the VM, either setting the VM bits
when all bits are currently unset or setting just the frozen bit when
the all-visible bit is already set. Those cases had a lot of duplicated
code. Combine them. This is simpler to understand and also makes
the code compact enough to start using to update the VM while pruning
and freezing.
---
src/backend/access/heap/vacuumlazy.c | 100 +++++++++------------------
1 file changed, 31 insertions(+), 69 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 04a7b6c4181..da71f095da9 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2152,11 +2152,26 @@ lazy_scan_prune(LVRelState *vacrel,
{
/* Don't update the VM if we just cleared corruption in it */
}
- else if (!all_visible_according_to_vm && presult.all_visible)
+
+ /*
+ * If the page isn't yet marked all-visible in the VM, or it is and needs
+ * to be marked all-frozen, update the VM. Note that all_frozen is only
+ * valid if all_visible is true, so we must check both all_visible and
+ * all_frozen.
+ */
+ else if (presult.all_visible &&
+ (!all_visible_according_to_vm ||
+ (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
+ /*
+ * If the page is all-frozen, we can pass InvalidTransactionId as our
+ * cutoff_xid, since a snapshotConflictHorizon sufficient to make
+ * everything safe for REDO was logged when the page's tuples were
+ * frozen.
+ */
if (presult.all_frozen)
{
Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
@@ -2169,21 +2184,28 @@ lazy_scan_prune(LVRelState *vacrel,
* checksums are not enabled). Regardless, set both bits so that we
* get back in sync.
*
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
+ * If the heap page is all-visible but the VM bit is not set, we don't
+ * need to dirty the heap page. However, if checksums are enabled, we
+ * do need to make sure that the heap page is dirtied before passing
+ * it to visibilitymap_set(), because it may be logged.
*/
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
+ if (!PageIsAllVisible(page))
+ {
+ PageSetAllVisible(page);
+ MarkBufferDirty(buf);
+ }
old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
InvalidXLogRecPtr,
vmbuffer, presult.vm_conflict_horizon,
flags);
/*
+ * Even if we are only setting the all-frozen bit, there is a small
+ * chance that the VM was modified sometime between setting
+ * all_visible_according_to_vm and checking the visibility during
+ * pruning. Check the return value of old_vmbits to ensure the
+ * visibility map counters used for logging are accurate.
+ *
* If the page wasn't already set all-visible and/or all-frozen in the
* VM, count it as newly set for logging.
*/
@@ -2204,66 +2226,6 @@ lazy_scan_prune(LVRelState *vacrel,
}
}
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen. Note that all_frozen is only valid if all_visible is
- * true, so we must check both all_visible and all_frozen.
- */
- else if (all_visible_according_to_vm && presult.all_visible &&
- presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
-
return presult.ndeleted;
}
--
2.43.0
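To make the VM-update condition in the hunk above easier to follow, here is a
minimal standalone sketch of the decision. The function and variable names
(vm_bits_to_set, vm_had_all_visible, etc.) are mine for illustration only; the
flag values just mirror visibilitymapdefs.h.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* These values mirror VISIBILITYMAP_ALL_VISIBLE / VISIBILITYMAP_ALL_FROZEN. */
#define VM_ALL_VISIBLE 0x01
#define VM_ALL_FROZEN  0x02

/*
 * Decide which VM bits to set for a page after pruning/freezing, given the
 * page's computed state and what the VM said when the block was chosen.
 * Returns 0 when no update is needed. all_frozen is only meaningful when
 * all_visible is true.
 */
static uint8_t
vm_bits_to_set(bool all_visible, bool all_frozen,
               bool vm_had_all_visible, bool vm_had_all_frozen)
{
    uint8_t flags = 0;

    if (all_visible &&
        (!vm_had_all_visible || (all_frozen && !vm_had_all_frozen)))
    {
        flags = VM_ALL_VISIBLE;
        if (all_frozen)
            flags |= VM_ALL_FROZEN;
    }
    return flags;
}

int
main(void)
{
    /* Page became all-visible and all-frozen; VM only had all-visible. */
    printf("bits to set: 0x%x\n", vm_bits_to_set(true, true, true, false));
    return 0;
}

The point being that all_frozen can only ever add to an all-visible update; it
never drives one on its own, which is why both conditions are checked together.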
v3-0010-Eliminate-xl_heap_visible-from-vacuum-phase-I-pru.patch (text/x-patch; charset=US-ASCII)
From 09533fb483d74f78a4378c3fe79091a9572b9b36 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:41:00 -0400
Subject: [PATCH v3 10/13] Eliminate xl_heap_visible from vacuum phase I
prune/freeze
Instead of emitting a separate WAL record for every block rendered
all-visible/frozen by vacuum's phase I, include the changes to the VM in
the xl_heap_prune record already emitted.
This is only enabled for vacuum's prune/freeze work, not for on-access
pruning.
---
src/backend/access/heap/pruneheap.c | 402 ++++++++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 30 --
src/include/access/heapam.h | 15 +-
3 files changed, 237 insertions(+), 210 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 797b3710862..6208f55176f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,13 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool freeze;
+
+ /*
+ * Whether or not to consider updating the VM. There is some bookkeeping
+ * that must be maintained if we would like to update the VM.
+ */
+ bool update_vm;
+
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -108,8 +115,9 @@ typedef struct
*
* These fields are not used by pruning itself for the most part, but are
* used to collect information about what was pruned and what state the
- * page is in after pruning, for the benefit of the caller. They are
- * copied to the caller's PruneFreezeResult at the end.
+ * page is in after pruning to use when updating the visibility map and
+ * for the benefit of the caller. They are copied to the caller's
+ * PruneFreezeResult at the end.
* -------------------------------------------------------
*/
@@ -138,11 +146,10 @@ typedef struct
* bits. It is only valid if we froze some tuples, and all_frozen is
* true.
*
- * NOTE: all_visible and all_frozen don't include LP_DEAD items. That's
- * convenient for heap_page_prune_and_freeze(), to use them to decide
- * whether to freeze the page or not. The all_visible and all_frozen
- * values returned to the caller are adjusted to include LP_DEAD items at
- * the end.
+ * NOTE: all_visible and all_frozen don't include LP_DEAD items until
+ * directly before updating the VM. We ignore LP_DEAD items when deciding
+ * whether or not to opportunistically freeze and when determining the
+ * snapshot conflict horizon required when freezing tuples.
*
* all_frozen should only be considered valid if all_visible is also set;
* we don't bother to clear the all_frozen flag every time we clear the
@@ -377,11 +384,15 @@ identify_and_fix_vm_corruption(Relation relation,
* considered advantageous for overall system performance to do so now. The
* 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
* are required when freezing. When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set. They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
+ * set presult->all_visible and presult->all_frozen on exit, for use when
+ * validating the changes made to the VM. They are always set to false when
+ * the HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
* that also freeze need that information.
*
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set and the visibility status of the page
+ * has changed, we will update the VM at the same time as pruning and freezing
+ * the heap page.
+ *
* blk_known_av is the visibility status of the heap block as of the last call
* to find_next_unskippable_block(). vmbuffer is the buffer that may already
* contain the required block of the visibility map.
@@ -396,6 +407,8 @@ identify_and_fix_vm_corruption(Relation relation,
* FREEZE indicates that we will also freeze tuples, and will return
* 'all_visible', 'all_frozen' flags to the caller.
*
+ * UPDATE_VM indicates that we will set the page's status in the VM.
+ *
* cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
* of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
* cutoffs->OldestXmin is also used to determine if dead tuples are
@@ -441,15 +454,19 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_freeze;
bool do_prune;
bool do_hint;
+ bool do_set_vm;
uint8 vmflags = 0;
uint8 old_vmbits = 0;
bool hint_bit_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ bool all_frozen_except_lp_dead = false;
+ bool set_pd_all_visible = false;
/* Copy parameters to prstate */
prstate.vistest = vistest;
prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate.cutoffs = cutoffs;
/*
@@ -496,50 +513,53 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.deadoffsets = presult->deadoffsets;
/*
- * Caller may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Keep track of whether or not the page will be all-visible and
+ * all-frozen for use in opportunistic freezing and to update the VM if
+ * the caller requests it.
+ *
+ * Currently, only VACUUM attempts freezing and setting the VM bits. But
+ * other callers could do either one. The visibility bookkeeping is
+ * required for opportunistic freezing (in addition to setting the VM
+ * bits) because we only consider opportunistically freezing tuples if the
+ * whole page would become all-frozen or if the whole page will be frozen
+ * except for dead tuples that will be removed by vacuum.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * Dead tuples which will be removed by the end of vacuuming should not
+ * preclude us from opportunistically freezing, so we do not clear
+ * all_visible when we see LP_DEAD items. We fix that after determining
+ * whether or not to freeze but before deciding whether or not to update
+ * the VM so that we don't set the VM bit incorrectly.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not clear
- * all_visible when we see LP_DEAD items. We fix that at the end of the
- * function, when we return the value to the caller, so that the caller
- * doesn't set the VM bit incorrectly.
+ * If we are neither freezing nor updating the VM, we avoid this extra
+ * bookkeeping. Initializing all_visible to false allows skipping the work
+ * to update these fields in heap_prune_record_unchanged_lp_normal().
*/
if (prstate.freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
}
+ else if (prstate.update_vm)
+ {
+ prstate.all_visible = true;
+ prstate.all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate.all_visible = false;
prstate.all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is no longer maintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon.
+ * This is most likely to happen when updating the VM and/or freezing all
+ * live tuples on the page. It is updated before returning to the caller
+ * because vacuum does assert-build-only validation on the page using this
+ * field.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -827,6 +847,68 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
}
+ /*
+ * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+ * make the choice of whether or not to freeze the page unaffected by the
+ * short-term presence of LP_DEAD items. These LP_DEAD items were
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that freezing has been finalized, unset all_visible if there are
+ * any LP_DEAD items on the page. It needs to reflect the present state of
+ * the page when using it to determine whether or not to update the VM.
+ *
+ * Keep track of whether or not the page was all-frozen except LP_DEAD
+ * items for the purposes of calculating the snapshot conflict horizon,
+ * though.
+ */
+ all_frozen_except_lp_dead = prstate.all_frozen;
+ if (prstate.lpdead_items > 0)
+ {
+ prstate.all_visible = false;
+ prstate.all_frozen = false;
+ }
+
+ /*
+ * Handle setting visibility map bit based on information from the VM (as
+ * of last heap_vac_scan_next_block() call), and from all_visible and
+ * all_frozen variables.
+ */
+ if (prstate.update_vm)
+ {
+ /*
+ * Clear any VM corruption. This does not need to be in a critical
+ * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
+ * we may mark the heap page buffer dirty here and could end up doing
+ * so again later. This is not a correctness issue, and since it only
+ * happens when the VM is corrupt, we don't have to worry about the extra
+ * performance overhead.
+ */
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av, prstate.lpdead_items, vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /* Determine if we actually need to set the VM and which bits to set. */
+ else if (prstate.all_visible &&
+ (!blk_known_av ||
+ (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate.all_frozen)
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+ }
+
+ do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
+ set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+
+ /* Save this for the caller in case we later zero out vmflags */
+ presult->new_vmbits = vmflags;
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -848,13 +930,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* If that's all we had to do to the page, this is a non-WAL-logged
* hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * the buffer dirty and emit WAL below.
*/
- if (!do_freeze && !do_prune)
+ if (!do_prune && !do_freeze && !do_set_vm)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -868,12 +950,34 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
- MarkBufferDirty(buffer);
+ if (set_pd_all_visible)
+ PageSetAllVisible(page);
+
+ if (do_prune || do_freeze || set_pd_all_visible)
+ MarkBufferDirty(buffer);
+
+ if (do_set_vm)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ old_vmbits = visibilitymap_set_vmbyte(relation, blockno,
+ vmbuffer, vmflags);
+
+ if (old_vmbits == vmflags)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+ do_set_vm = false;
+ /* 0 out vmflags so we don't emit VM update WAL */
+ vmflags = 0;
+ }
+ }
/*
- * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
+ * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did. If
+ * we were only updating the VM and it turns out it was already set,
+ * we will have unset do_set_vm above. As such, check it again before
+ * emitting the record.
*/
- if (RelationNeedsWAL(relation))
+ if (RelationNeedsWAL(relation) && (do_set_vm || do_prune || do_freeze))
{
/*
* The snapshotConflictHorizon for the whole record should be the
@@ -885,35 +989,57 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* on the standby with xids older than the youngest tuple this
* record will freeze will conflict.
*/
- TransactionId frz_conflict_horizon = InvalidTransactionId;
- TransactionId conflict_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
/*
- * We can use the visibility_cutoff_xid as our cutoff for
- * conflicts when the whole page is eligible to become all-frozen
- * in the VM once we're done with it. Otherwise we generate a
- * conservative cutoff by stepping back from OldestXmin.
+ * If we are updating the VM, the conflict horizon is almost
+ * always the visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization,
+ * we can use the visibility_cutoff_xid as the conflict horizon if
+ * the page will be all-frozen. This is true even if there are
+ * LP_DEAD line pointers because we ignored those when maintaining
+ * the visibility_cutoff_xid.
*/
- if (do_freeze)
+ if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
+ conflict_xid = prstate.visibility_cutoff_xid;
+
+ /*
+ * Otherwise, if we are freezing but the page would not be
+ * all-frozen, we have to use the more pessimistic horizon of
+ * OldestXmin, which may be newer than the newest tuple we froze.
+ * That's because we won't have maintained the
+ * visibility_cutoff_xid.
+ */
+ else if (do_freeze)
{
- if (prstate.all_visible && prstate.all_frozen)
- frz_conflict_horizon = prstate.visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = prstate.cutoffs->OldestXmin;
- TransactionIdRetreat(frz_conflict_horizon);
- }
+ conflict_xid = prstate.cutoffs->OldestXmin;
+ TransactionIdRetreat(conflict_xid);
}
- if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
- conflict_xid = frz_conflict_horizon;
- else
+ /*
+ * If we are removing tuples whose xmax is newer than the conflict_xid
+ * calculated so far, we must use that xmax as our horizon.
+ */
+ if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
conflict_xid = prstate.latest_xid_removed;
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning
+ * or freezing any tuples and are setting an already all-visible
+ * page all-frozen in the VM. In this case, all of the tuples on
+ * the page must already be visible to all MVCC snapshots on the
+ * standby.
+ */
+ if (!do_prune && !do_freeze && do_set_vm &&
+ blk_known_av && (vmflags & VISIBILITYMAP_ALL_FROZEN))
+ conflict_xid = InvalidTransactionId;
+
log_heap_prune_and_freeze(relation, buffer,
false,
- InvalidBuffer, 0, false,
+ vmbuffer,
+ vmflags,
+ set_pd_all_visible,
conflict_xid,
true, reason,
prstate.frozen, prstate.nfrozen,
@@ -925,124 +1051,55 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
END_CRIT_SECTION();
- /* Copy information back for caller */
- presult->ndeleted = prstate.ndeleted;
- presult->nnewlpdead = prstate.ndead;
- presult->nfrozen = prstate.nfrozen;
- presult->live_tuples = prstate.live_tuples;
- presult->recently_dead_tuples = prstate.recently_dead_tuples;
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected for updating the visibility map.
- */
- if (prstate.all_visible && prstate.lpdead_items == 0)
- {
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
- }
- else
- {
- presult->all_visible = false;
- presult->all_frozen = false;
- }
-
- presult->hastup = prstate.hastup;
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
/*
- * If updating the visibility map, the conflict horizon for that record
- * must be the newest xmin on the page. However, if the page is
- * completely frozen, there can be no conflict and the vm_conflict_horizon
- * should remain InvalidTransactionId. This includes the case that we
- * just froze all the tuples; the prune-freeze record included the
- * conflict XID already so the VM update record doesn't need it.
+ * VACUUM will call heap_page_is_all_visible() during the second pass over
+ * the heap to determine all_visible and all_frozen for the page -- this
+ * is a specialized version of the logic from this function. Now that
+ * we've finished pruning and freezing, make sure that we're in total
+ * agreement with heap_page_is_all_visible() using an assertion. We will
+ * have already set the page in the VM, so this assertion will only let
+ * you know that you've already done something wrong.
*/
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
-
- /*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
- */
- if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
{
- if (identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av,
- prstate.lpdead_items, vmbuffer))
- {
- /* If we fix corruption, don't update the VM further */
- }
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
- /*
- * If the page isn't yet marked all-visible in the VM or it is and
- * needs to be marked all-frozen, update the VM. Note that all_frozen
- * is only valid if all_visible is true, so we must check both
- * all_visible and all_frozen.
- */
- else if (presult->all_visible &&
- (!blk_known_av ||
- (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
- {
- Assert(prstate.lpdead_items == 0);
- vmflags = VISIBILITYMAP_ALL_VISIBLE;
+ Assert(cutoffs);
- /*
- * If the page is all-frozen, we can pass InvalidTransactionId as
- * our cutoff_xid, since a snapshotConflictHorizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- if (presult->all_frozen)
- {
- Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
- vmflags |= VISIBILITYMAP_ALL_FROZEN;
- }
+ Assert(prstate.lpdead_items == 0);
- /*
- * It's possible for the VM bit to be clear and the page-level bit
- * to be set if checksums are not enabled.
- *
- * And even if we are just planning to update the frozen bit in
- * the VM, we shouldn't rely on all_visible_according_to_vm as a
- * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
- * might have become stale.
- *
- * If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be
- * logged.
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buffer);
- }
+ if (!heap_page_is_all_visible(relation, buffer,
+ cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc))
+ Assert(false);
- old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
- vmbuffer, presult->vm_conflict_horizon,
- vmflags);
- }
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == prstate.visibility_cutoff_xid);
}
+#endif
+ /* Copy information back for caller */
+ presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+ presult->old_vmbits = old_vmbits;
+ /* new_vmbits was set above */
+ presult->hastup = prstate.hastup;
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
- presult->old_vmbits = old_vmbits;
- presult->new_vmbits = vmflags;
-
if (prstate.freeze)
{
if (presult->nfrozen > 0)
@@ -1624,7 +1681,12 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
break;
}
- /* Consider freezing any normal tuples which will not be removed */
+ /*
+ * Consider freezing any normal tuples which will not be removed.
+ * Regardless of whether or not we want to freeze the tuples, if we want
+ * to update the VM, we have to call heap_prepare_freeze_tuple() on every
+ * tuple to know whether or not the page will be totally frozen.
+ */
if (prstate->freeze)
{
bool totally_frozen;
@@ -2238,6 +2300,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+ Assert(do_prune || nfrozen > 0 || vmflags & VISIBILITYMAP_VALID_BITS);
+
xlrec.flags = 0;
regbuf_flags = REGBUF_STANDARD;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c49e81bc5dd..91e209901b8 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2014,34 +2014,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- /* Note that all_frozen value does not matter when !all_visible */
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- if (!heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum))
- Assert(false);
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -2075,8 +2047,6 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
-
/*
* For the purposes of logging, count whether or not the page was newly
* set all-visible and, potentially, all-frozen.
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b85648456e9..0b9bb1c9b13 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,19 +235,12 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate the status of the page as reflected
- * in the visibility map after pruning, freezing, and setting any pages
- * all-visible in the visibility map.
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
*
- * vm_conflict_horizon is the newest xmin of live tuples on the page
- * (older than OldestXmin). It will only be valid if we did not set the
- * page all-frozen in the VM.
- *
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
*/
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
uint8 old_vmbits;
uint8 new_vmbits;
--
2.43.0
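Since the snapshot conflict horizon selection in heap_page_prune_and_freeze()
now has to cover pruning, freezing, and the VM update in a single record, here
is a small standalone sketch of that selection logic as it stands after this
patch. Everything here is simplified: xid_follows() ignores wraparound and
oldest_xmin - 1 stands in for TransactionIdRetreat(), so treat it as a model
rather than the real code.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Naive stand-in for TransactionIdFollows(); real XIDs wrap around. */
static bool
xid_follows(TransactionId a, TransactionId b)
{
    return a > b;
}

static TransactionId
choose_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
                    bool all_frozen_except_lp_dead,
                    bool blk_known_av, bool setting_all_frozen,
                    TransactionId visibility_cutoff_xid,
                    TransactionId oldest_xmin,
                    TransactionId latest_xid_removed)
{
    TransactionId conflict_xid = InvalidTransactionId;

    /* VM update or fully-frozen page: the visibility cutoff is the horizon. */
    if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
        conflict_xid = visibility_cutoff_xid;
    /* Freezing a page that won't be all-frozen: fall back to OldestXmin - 1. */
    else if (do_freeze)
        conflict_xid = oldest_xmin - 1;

    /* Removed tuples may push the horizon forward. */
    if (xid_follows(latest_xid_removed, conflict_xid))
        conflict_xid = latest_xid_removed;

    /* Only marking an already all-visible page all-frozen: no conflict. */
    if (!do_prune && !do_freeze && do_set_vm &&
        blk_known_av && setting_all_frozen)
        conflict_xid = InvalidTransactionId;

    return conflict_xid;
}

int
main(void)
{
    printf("horizon: %u\n",
           choose_conflict_xid(true, true, true, true, false, true,
                               1000, 1100, 900));
    return 0;
}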
v3-0006-Combine-lazy_scan_prune-VM-corruption-cases.patch (text/x-patch; charset=US-ASCII)
From 6bd70652d1a0c9d219ade5001c6b7f79298c4a5f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:04:03 -0400
Subject: [PATCH v3 06/13] Combine lazy_scan_prune VM corruption cases
lazy_scan_prune() updates the visibility map after phase I of heap
vacuuming. It also checks and fixes corruption in the VM. The corruption
cases were mixed in with the normal visibility map update cases.
Careful study of the ordering of the current logic reveals that the
corruption cases can be reordered and extracted into a separate
function. This should result in no additional overhead compared to the
previous code.
This reordering makes it clear which cases are about corruption and
which cases are normal VM updates. Separating them also makes it
possible to combine the normal cases in a future commit. This will make
the logic easier to understand and allow for further separation of the
logic to allow updating the VM in the same record as pruning and
freezing in phase I.
---
src/backend/access/heap/vacuumlazy.c | 114 +++++++++++++++++----------
1 file changed, 73 insertions(+), 41 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d9e195269d2..04a7b6c4181 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -431,6 +431,12 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
+static bool identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1933,6 +1939,66 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer)
+{
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
+
+
/* qsort comparator for sorting OffsetNumbers */
static int
cmpOffsetNumbers(const void *a, const void *b)
@@ -2079,9 +2145,14 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
+ * all_frozen variables. Start by looking for any VM corruption.
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
+ all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+ {
+ /* Don't update the VM if we just cleared corruption in it */
+ }
+ else if (!all_visible_according_to_vm && presult.all_visible)
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -2133,45 +2204,6 @@ lazy_scan_prune(LVRelState *vacrel,
}
}
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
- {
- elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
- {
- elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno);
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
/*
* If the all-visible page is all-frozen but not marked as such yet, mark
* it as all-frozen. Note that all_frozen is only valid if all_visible is
--
2.43.0
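For reference, this is roughly how the extracted function behaves, boiled down
to a standalone model. The names here (fix_vm_corruption_model and its
parameters) are invented for the sketch; the real function also clears the VM
and page-level bits rather than just reporting.

#include <stdbool.h>
#include <stdio.h>

/*
 * Simplified model of the two corruption cases that
 * identify_and_fix_vm_corruption() handles. Returning true means "we
 * repaired something, so skip the normal VM update for this page".
 */
static bool
fix_vm_corruption_model(bool vm_bit_set, bool pd_all_visible, long lpdead_items)
{
    /* Case 1: VM bit set while the page-level bit is clear. */
    if (vm_bit_set && !pd_all_visible)
    {
        fprintf(stderr, "clearing VM: bit set but PD_ALL_VISIBLE is clear\n");
        return true;
    }

    /* Case 2: LP_DEAD items on a page marked PD_ALL_VISIBLE. */
    if (lpdead_items > 0 && pd_all_visible)
    {
        fprintf(stderr, "clearing VM and PD_ALL_VISIBLE: page has LP_DEAD items\n");
        return true;
    }

    return false;
}

int
main(void)
{
    /* Healthy page: nothing to repair, caller may update the VM. */
    printf("repaired: %d\n", fix_vm_corruption_model(false, true, 0));
    return 0;
}

Ordering this check ahead of the normal update paths is what lets the later
patches short-circuit with a plain "if we fixed corruption, do nothing further".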
v3-0008-Find-and-fix-VM-corruption-in-heap_page_prune_and.patch (text/x-patch; charset=US-ASCII)
From d59b1a5ac1eca7c2315effc06701ae4cc1703513 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:45:59 -0400
Subject: [PATCH v3 08/13] Find and fix VM corruption in
heap_page_prune_and_freeze
Future commits will update the VM in the same critical section and WAL
record as pruning and freezing. For ease of review, this commit makes
one step toward doing this. It moves the VM corruption handling case to
heap_page_prune_and_freeze().
---
src/backend/access/heap/pruneheap.c | 87 +++++++++++++++++++++++++++-
src/backend/access/heap/vacuumlazy.c | 77 +++---------------------
src/include/access/heapam.h | 4 ++
3 files changed, 96 insertions(+), 72 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 2724cf7f64f..070d64fa9c3 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,7 +21,7 @@
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "commands/vacuum.h"
#include "executor/instrument.h"
#include "miscadmin.h"
@@ -177,6 +177,13 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
static void page_verify_redirects(Page page);
+static bool identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer);
+
/*
* Optionally prune and repair fragmentation in the specified page.
@@ -261,7 +268,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, vistest, 0,
+ heap_page_prune_and_freeze(relation, buffer, false,
+ InvalidBuffer,
+ vistest, 0,
NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
/*
@@ -294,6 +303,64 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
}
}
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer)
+{
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
/*
* Prune and repair fragmentation and potentially freeze tuples on the
@@ -314,6 +381,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
* that also freeze need that information.
*
+ * blk_known_av is the visibility status of the heap block as of the last call
+ * to find_next_unskippable_block(). vmbuffer is the buffer that may already
+ * contain the required block of the visibility map.
+ *
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
* (see heap_prune_satisfies_vacuum).
*
@@ -349,6 +420,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ bool blk_known_av,
+ Buffer vmbuffer,
GlobalVisState *vistest,
int options,
struct VacuumCutoffs *cutoffs,
@@ -897,6 +970,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
+ /*
+ * Clear any VM corruption. This does not need to be done in a critical
+ * section.
+ */
+ presult->vm_corruption = false;
+ if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+ presult->vm_corruption = identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av,
+ prstate.lpdead_items, vmbuffer);
if (prstate.freeze)
{
if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index da71f095da9..6110b7f80ce 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -431,12 +431,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
-static bool identify_and_fix_vm_corruption(Relation relation,
- BlockNumber heap_blk,
- Buffer heap_buffer, Page heap_page,
- bool heap_blk_known_av,
- int64 nlpdead_items,
- Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1939,65 +1933,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
-/*
- * When updating the visibility map after phase I heap vacuuming, we take the
- * opportunity to identify and fix any VM corruption.
- *
- * heap_blk_known_av is the visibility status of the heap page collected
- * while finding the next unskippable block in heap_vac_scan_next_block().
- */
-static bool
-identify_and_fix_vm_corruption(Relation relation,
- BlockNumber heap_blk,
- Buffer heap_buffer, Page heap_page,
- bool heap_blk_known_av,
- int64 nlpdead_items,
- Buffer vmbuffer)
-{
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
- visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
- {
- elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk);
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
- {
- elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk);
- PageClearAllVisible(heap_page);
- MarkBufferDirty(heap_buffer);
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- return false;
-}
-
/* qsort comparator for sorting OffsetNumbers */
static int
@@ -2056,11 +1991,14 @@ lazy_scan_prune(LVRelState *vacrel,
* tuples. Pruning will have determined whether or not the page is
* all-visible.
*/
- prune_options = HEAP_PAGE_PRUNE_FREEZE;
+ prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
+ heap_page_prune_and_freeze(rel, buf,
+ all_visible_according_to_vm,
+ vmbuffer,
+ vacrel->vistest, prune_options,
&vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
@@ -2145,10 +2083,9 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables. Start by looking for any VM corruption.
+ * all_frozen variables.
*/
- if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
- all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+ if (presult.vm_corruption)
{
/* Don't update the VM if we just cleared corruption in it */
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e7129a644a1..0c7eb5e46f4 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
struct TupleTableSlot;
@@ -247,6 +248,7 @@ typedef struct PruneFreezeResult
bool all_visible;
bool all_frozen;
TransactionId vm_conflict_horizon;
+ bool vm_corruption;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
@@ -380,6 +382,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ bool blk_known_av,
+ Buffer vmbuffer,
struct GlobalVisState *vistest,
int options,
struct VacuumCutoffs *cutoffs,
--
2.43.0
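To show what the new interface looks like from the caller's side, here is a
tiny standalone sketch of how the option bits combine. The PRUNE_* constants
are illustrative copies of the HEAP_PAGE_PRUNE_* bits added in heapam.h; the
real call additionally passes blk_known_av and vmbuffer as shown in the
vacuumlazy.c hunk above.

#include <stdbool.h>
#include <stdio.h>

/* Illustrative copies of the HEAP_PAGE_PRUNE_* option bits */
#define PRUNE_MARK_UNUSED_NOW  (1 << 0)
#define PRUNE_FREEZE           (1 << 1)
#define PRUNE_UPDATE_VM        (1 << 2)   /* added by this patch */

int
main(void)
{
    bool rel_has_indexes = false;
    int  options = PRUNE_FREEZE | PRUNE_UPDATE_VM;  /* what vacuum now requests */

    /* With no indexes, would-be dead items can be marked unused immediately. */
    if (!rel_has_indexes)
        options |= PRUNE_MARK_UNUSED_NOW;

    printf("prune options: 0x%x\n", options);
    return 0;
}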
v3-0009-Update-VM-in-pruneheap.c.patch (text/x-patch; charset=US-ASCII)
From 6181180ec46fa558b9938a484d2feaa973b07dee Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 2 Jun 2025 11:04:14 -0400
Subject: [PATCH v3 09/13] Update VM in pruneheap.c
As a step toward updating the VM in the same critical section and WAL
record as pruning and freezing (during phase I of vacuuming), first move
the VM update (still in its own critical section and WAL record) into
heap_page_prune_and_freeze(). This makes review easier.
---
src/backend/access/heap/pruneheap.c | 99 +++++++++++++++++++++++-----
src/backend/access/heap/vacuumlazy.c | 98 +++++----------------------
src/include/access/heapam.h | 15 +++--
3 files changed, 106 insertions(+), 106 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 070d64fa9c3..797b3710862 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -364,7 +364,8 @@ identify_and_fix_vm_corruption(Relation relation,
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -440,6 +441,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_freeze;
bool do_prune;
bool do_hint;
+ uint8 vmflags = 0;
+ uint8 old_vmbits = 0;
bool hint_bit_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
@@ -939,7 +942,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*
* Now that freezing has been finalized, unset all_visible if there are
* any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected by our caller.
+ * of the page, as expected for updating the visibility map.
*/
if (prstate.all_visible && prstate.lpdead_items == 0)
{
@@ -955,31 +958,91 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->hastup = prstate.hastup;
/*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
+ * If updating the visibility map, the conflict horizon for that record
+ * must be the newest xmin on the page. However, if the page is
+ * completely frozen, there can be no conflict and the vm_conflict_horizon
+ * should remain InvalidTransactionId. This includes the case that we
+ * just froze all the tuples; the prune-freeze record included the
+ * conflict XID already so the VM update record doesn't need it.
*/
if (presult->all_frozen)
presult->vm_conflict_horizon = InvalidTransactionId;
else
presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
- presult->lpdead_items = prstate.lpdead_items;
- /* the presult->deadoffsets array was already filled in */
-
/*
- * Clear any VM corruption. This does not need to be done in a critical
- * section.
+ * Handle setting visibility map bit based on information from the VM (as
+ * of last heap_vac_scan_next_block() call), and from all_visible and
+ * all_frozen variables.
*/
- presult->vm_corruption = false;
if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
- presult->vm_corruption = identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av,
- prstate.lpdead_items, vmbuffer);
+ {
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av,
+ prstate.lpdead_items, vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /*
+ * If the page isn't yet marked all-visible in the VM or it is and
+ * needs to be marked all-frozen, update the VM. Note that all_frozen
+ * is only valid if all_visible is true, so we must check both
+ * all_visible and all_frozen.
+ */
+ else if (presult->all_visible &&
+ (!blk_known_av ||
+ (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+ {
+ Assert(prstate.lpdead_items == 0);
+ vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+ /*
+ * If the page is all-frozen, we can pass InvalidTransactionId as
+ * our cutoff_xid, since a snapshotConflictHorizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ if (presult->all_frozen)
+ {
+ Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
+ /*
+ * It's possible for the VM bit to be clear and the page-level bit
+ * to be set if checksums are not enabled.
+ *
+ * And even if we are just planning to update the frozen bit in
+ * the VM, we shouldn't rely on all_visible_according_to_vm as a
+ * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
+ * might have become stale.
+ *
+ * If the heap page is all-visible but the VM bit is not set, we
+ * don't need to dirty the heap page. However, if checksums are
+ * enabled, we do need to make sure that the heap page is dirtied
+ * before passing it to visibilitymap_set(), because it may be
+ * logged.
+ */
+ if (!PageIsAllVisible(page))
+ {
+ PageSetAllVisible(page);
+ MarkBufferDirty(buffer);
+ }
+
+ old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
+ vmbuffer, presult->vm_conflict_horizon,
+ vmflags);
+ }
+ }
+
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
+
+ presult->old_vmbits = old_vmbits;
+ presult->new_vmbits = vmflags;
+
if (prstate.freeze)
{
if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6110b7f80ce..c49e81bc5dd 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1933,7 +1933,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
-
/* qsort comparator for sorting OffsetNumbers */
static int
cmpOffsetNumbers(const void *a, const void *b)
@@ -1949,7 +1948,8 @@ cmpOffsetNumbers(const void *a, const void *b)
* vmbuffer is the buffer containing the VM block with visibility information
* for the heap block, blkno. all_visible_according_to_vm is the saved
* visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * won't rely entirely on this status, as it may be out of date. These will be
+ * passed on to heap_page_prune_and_freeze() to use while setting the VM.
*
* *has_lpdead_items is set to true or false depending on whether, upon return
* from this function, any LP_DEAD items are still present on the page.
@@ -1978,6 +1978,7 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
+ * Then, if the page's visibility status has changed, update the VM.
*
* If the relation has no indexes, we can immediately mark would-be dead
* items LP_UNUSED.
@@ -1986,10 +1987,6 @@ lazy_scan_prune(LVRelState *vacrel,
* presult.ndeleted. It should not be confused with presult.lpdead_items;
* presult.lpdead_items's final value can be thought of as the number of
* tuples that were deleted from indexes.
- *
- * We will update the VM after collecting LP_DEAD items and freezing
- * tuples. Pruning will have determined whether or not the page is
- * all-visible.
*/
prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
if (vacrel->nindexes == 0)
@@ -2081,87 +2078,26 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(!presult.all_visible || !(*has_lpdead_items));
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * For the purposes of logging, count whether or not the page was newly
+ * set all-visible and, potentially, all-frozen.
*/
- if (presult.vm_corruption)
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- /* Don't update the VM if we just cleared corruption in it */
- }
-
- /*
- * If the page isn't yet marked all-visible in the VM or it is and needs
- * to be marked all-frozen, update the VM. Note that all_frozen is only
- * valid if all_visible is true, so we must check both all_visible and
- * all_frozen.
- */
- else if (presult.all_visible &&
- (!all_visible_according_to_vm ||
- (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
- {
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- /*
- * If the page is all-frozen, we can pass InvalidTransactionId as our
- * cutoff_xid, since a snapshotConflictHorizon sufficient to make
- * everything safe for REDO was logged when the page's tuples were
- * frozen.
- */
- if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * If the heap page is all-visible but the VM bit is not set, we don't
- * need to dirty the heap page. However, if checksums are enabled, we
- * do need to make sure that the heap page is dirtied before passing
- * it to visibilitymap_set(), because it may be logged.
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * Even if we are only setting the all-frozen bit, there is a small
- * chance that the VM was modified sometime between setting
- * all_visible_according_to_vm and checking the visibility during
- * pruning. Check the return value of old_vmbits to ensure the
- * visibility map counters used for logging are accurate.
- *
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ vacrel->vm_new_visible_pages++;
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- vacrel->vm_new_frozen_pages++;
+ vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ {
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
+ }
return presult.ndeleted;
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0c7eb5e46f4..b85648456e9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,20 +235,21 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
+ * all_visible and all_frozen indicate the status of the page as reflected
+ * in the visibility map after pruning, freezing, and setting any pages
+ * all-visible in the visibility map.
*
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
+ * vm_conflict_horizon is the newest xmin of live tuples on the page
+ * (older than OldestXmin). It will only be valid if we did not set the
+ * page all-frozen in the VM.
*
* These are only set if the HEAP_PRUNE_FREEZE option is set.
*/
bool all_visible;
bool all_frozen;
TransactionId vm_conflict_horizon;
- bool vm_corruption;
+ uint8 old_vmbits;
+ uint8 new_vmbits;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
--
2.43.0
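Since lazy_scan_prune() now derives its logging counters from the old/new VM
bits instead of setting the VM itself, here is a standalone sketch of that
bookkeeping. count_vm_changes() and the counter names are mine for the sketch;
the flag values mirror VISIBILITYMAP_ALL_VISIBLE / VISIBILITYMAP_ALL_FROZEN.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define VM_ALL_VISIBLE 0x01
#define VM_ALL_FROZEN  0x02

/*
 * Model of how lazy_scan_prune() counts newly all-visible and newly
 * all-frozen pages from the VM bits before and after phase I.
 */
static void
count_vm_changes(uint8_t old_vmbits, uint8_t new_vmbits,
                 long *new_visible, long *new_visible_frozen, long *new_frozen)
{
    if ((old_vmbits & VM_ALL_VISIBLE) == 0 &&
        (new_vmbits & VM_ALL_VISIBLE) != 0)
    {
        (*new_visible)++;
        if (new_vmbits & VM_ALL_FROZEN)
            (*new_visible_frozen)++;
    }
    else if ((old_vmbits & VM_ALL_FROZEN) == 0 &&
             (new_vmbits & VM_ALL_FROZEN) != 0)
        (*new_frozen)++;
}

int
main(void)
{
    long nv = 0, nvf = 0, nf = 0;

    /* Page newly set all-visible and all-frozen. */
    count_vm_changes(0, VM_ALL_VISIBLE | VM_ALL_FROZEN, &nv, &nvf, &nf);
    /* Page already all-visible, newly all-frozen. */
    count_vm_changes(VM_ALL_VISIBLE, VM_ALL_VISIBLE | VM_ALL_FROZEN,
                     &nv, &nvf, &nf);
    printf("new visible=%ld visible+frozen=%ld frozen-only=%ld\n", nv, nvf, nf);
    return 0;
}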
v3-0011-Remove-xl_heap_visible-entirely.patch (text/x-patch; charset=US-ASCII)
From 121ba45c82d9a3798243d65f9838b1875484cb49 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:30:42 -0400
Subject: [PATCH v3 11/13] Remove xl_heap_visible entirely
There are now no users of this, so eliminate it entirely.
---
src/backend/access/common/bufmask.c | 3 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 150 +----------------------
src/backend/access/heap/pruneheap.c | 4 +-
src/backend/access/heap/vacuumlazy.c | 10 +-
src/backend/access/heap/visibilitymap.c | 103 +---------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/include/access/heapam_xlog.h | 6 -
src/include/access/visibilitymap.h | 10 +-
10 files changed, 23 insertions(+), 328 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..1fff01383b3 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,7 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 68db4325285..48f7b84156a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -40,6 +40,7 @@
#include "access/valid.h"
#include "access/visibilitymap.h"
#include "access/xloginsert.h"
+#include "access/xlogutils.h"
#include "catalog/pg_database.h"
#include "catalog/pg_database_d.h"
#include "commands/vacuum.h"
@@ -2512,11 +2513,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
{
PageSetAllVisible(page);
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
- visibilitymap_set_vmbyte(relation,
- BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set(relation,
+ BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
}
/*
@@ -8784,49 +8785,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
/*
* Perform XLogInsert for a heap-update operation. Caller must already
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 14541e2e94f..51c158af89e 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -267,7 +267,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Relation reln = CreateFakeRelcacheEntry(rlocator);
visibilitymap_pin(reln, blkno, &vmbuffer);
- old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+ old_vmbits = visibilitymap_set(reln, blkno, vmbuffer, vmflags);
/* Only set VM page LSN if we modified the page */
if (old_vmbits != vmflags)
PageSetLSN(BufferGetPage(vmbuffer), lsn);
@@ -278,143 +278,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
UnlockReleaseBuffer(vmbuffer);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
- visibilitymap_pin(reln, blkno, &vmbuffer);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -791,16 +654,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
Relation reln = CreateFakeRelcacheEntry(rlocator);
visibilitymap_pin(reln, blkno, &vmbuffer);
- visibilitymap_set_vmbyte(reln, blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
/*
* It is not possible that the VM was already set for this heap page,
* so the vmbuffer must have been modified and marked dirty.
*/
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(BufferGetPage(vmbuffer), lsn);
FreeFakeRelcacheEntry(reln);
}
@@ -1380,9 +1243,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6208f55176f..f6509695e3a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -959,8 +959,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_set_vm)
{
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
- old_vmbits = visibilitymap_set_vmbyte(relation, blockno,
- vmbuffer, vmflags);
+ old_vmbits = visibilitymap_set(relation, blockno,
+ vmbuffer, vmflags);
if (old_vmbits == vmflags)
{
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 91e209901b8..6a0fa371a06 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1887,8 +1887,8 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
MarkBufferDirty(buf);
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
- visibilitymap_set_vmbyte(vacrel->rel, blkno,
- vmbuffer, new_vmbits);
+ visibilitymap_set(vacrel->rel, blkno,
+ vmbuffer, new_vmbits);
if (RelationNeedsWAL(vacrel->rel))
{
@@ -2754,9 +2754,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
set_pd_all_vis = true;
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
PageSetAllVisible(page);
- visibilitymap_set_vmbyte(vacrel->rel,
- blkno,
- vmbuffer, vmflags);
+ visibilitymap_set(vacrel->rel,
+ blkno,
+ vmbuffer, vmflags);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 573df6f6891..478b08fa520 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -219,105 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
- Assert(InRecovery || PageIsAllVisible((Page) BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (XLogRecPtrIsInvalid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
-}
-
/*
* Set flags in the VM block contained in the passed in vmBuf.
*
@@ -337,8 +238,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* making any changes needed to the associated heap page.
*/
uint8
-visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags)
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index d6c86ccac20..f7880a4ed81 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -351,13 +351,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -462,9 +455,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ceae9c083ff..9a61434b881 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -495,11 +494,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 977566f6b98..20141e3e805 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -31,14 +31,8 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags);
+extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
--
2.43.0
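For readers skimming the diff above, this is roughly the calling convention that remains once xl_heap_visible is gone and visibilitymap_set() is reduced to (rel, blkno, vmBuf, flags): the caller pins and exclusively locks the VM page, flips the byte inside the same critical section that modifies and WAL-logs the heap page, and stamps the VM page's LSN itself. This is an illustrative sketch only, not code from the patch; emit_heap_record is a hypothetical stand-in for whatever XLogInsert() call the heap-modifying operation already makes (with vmbuffer registered as an extra block).

#include "postgres.h"
#include "access/visibilitymap.h"
#include "access/xlogdefs.h"
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "storage/bufpage.h"
#include "utils/rel.h"

/* Sketch only: assumes the caller already did visibilitymap_pin(). */
static void
set_vm_with_heap_record(Relation rel, Buffer heapbuf, Buffer vmbuffer,
						XLogRecPtr (*emit_heap_record) (void))
{
	Page		page = BufferGetPage(heapbuf);
	BlockNumber blkno = BufferGetBlockNumber(heapbuf);
	uint8		vmflags = VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN;
	uint8		old_vmbits;

	LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);

	START_CRIT_SECTION();

	/* Heap page change and VM byte change are covered by one WAL record. */
	PageSetAllVisible(page);
	MarkBufferDirty(heapbuf);
	old_vmbits = visibilitymap_set(rel, blkno, vmbuffer, vmflags);

	if (RelationNeedsWAL(rel))
	{
		/* The heap operation's own record carries the VM information. */
		XLogRecPtr	recptr = emit_heap_record();

		PageSetLSN(page, recptr);
		if (old_vmbits != vmflags)
			PageSetLSN(BufferGetPage(vmbuffer), recptr);
	}

	END_CRIT_SECTION();

	LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
}

The heap_multi_insert() and heap_page_prune_and_freeze() hunks above follow this shape; the only per-site differences are which flags are set and whether the VM page LSN bump is conditioned on the old bits.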
v3-0012-Allow-on-access-pruning-to-set-pages-all-visible.patch (text/x-patch, charset=US-ASCII)
From 2c200ceb43899301fd0a6ad079aa9d4d48c24afb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 7 Jul 2025 17:30:14 -0400
Subject: [PATCH v3 12/13] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or all-frozen.
Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
---
src/backend/access/heap/heapam.c | 15 +++++-
src/backend/access/heap/heapam_handler.c | 17 ++++++-
src/backend/access/heap/pruneheap.c | 59 +++++++++++++++++------
src/backend/access/index/indexam.c | 46 ++++++++++++++++++
src/backend/access/table/tableam.c | 39 +++++++++++++--
src/backend/executor/execMain.c | 4 ++
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeBitmapHeapscan.c | 6 ++-
src/backend/executor/nodeIndexscan.c | 17 ++++---
src/backend/executor/nodeSeqscan.c | 17 +++++--
src/backend/storage/ipc/procarray.c | 12 +++++
src/include/access/genam.h | 11 +++++
src/include/access/heapam.h | 24 +++++++--
src/include/access/relscan.h | 6 +++
src/include/access/tableam.h | 30 +++++++++++-
src/include/nodes/execnodes.h | 17 +++++++
src/include/utils/snapmgr.h | 1 +
17 files changed, 285 insertions(+), 38 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 48f7b84156a..50b0d169d54 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -560,6 +560,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
int lines;
bool all_visible;
bool check_serializable;
+ bool allow_vmset;
Assert(BufferGetBlockNumber(buffer) == block);
@@ -570,7 +571,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ allow_vmset = sscan->rs_flags & SO_ALLOW_VM_SET;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer,
+ allow_vmset ? &scan->rs_vmbuffer : NULL, allow_vmset);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1236,6 +1239,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1274,6 +1278,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1306,6 +1316,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index cb4bc35c93e..fb450c5a84f 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
return &hscan->xs_base;
}
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -138,7 +145,9 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer,
+ !scan->modifies_base_rel);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2480,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ bool allow_vmset = false;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2517,7 +2527,10 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ allow_vmset = scan->rs_flags & SO_ALLOW_VM_SET;
+ heap_page_prune_opt(scan->rs_rd, buffer,
+ allow_vmset ? &hscan->rs_vmbuffer : NULL,
+ allow_vmset);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f6509695e3a..af23008ddf7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -158,6 +158,7 @@ typedef struct
bool all_visible;
bool all_frozen;
TransactionId visibility_cutoff_xid;
+ TransactionId oldest_xmin;
} PruneState;
/* Local functions */
@@ -203,9 +204,13 @@ static bool identify_and_fix_vm_corruption(Relation relation,
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If allow_vmset is true, it is okay for pruning to set the visibility map if
+ * the page is all visible.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer, bool allow_vmset)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -260,6 +265,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
if (!ConditionalLockBufferForCleanup(buffer))
return;
+ /* Caller should not pass a vmbuffer if allow_vmset is false. */
+ Assert(allow_vmset || vmbuffer == NULL);
+
/*
* Now that we have buffer lock, get accurate information about the
* page's free space, and recheck the heuristic about whether to
@@ -269,6 +277,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
{
OffsetNumber dummy_off_loc;
PruneFreezeResult presult;
+ int options = 0;
+
+ if (allow_vmset)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ options = HEAP_PAGE_PRUNE_UPDATE_VM;
+ }
/*
* For now, pass mark_unused_now as false regardless of whether or
@@ -276,8 +291,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* that during on-access pruning with the current implementation.
*/
heap_page_prune_and_freeze(relation, buffer, false,
- InvalidBuffer,
- vistest, 0,
+ vmbuffer ? *vmbuffer : InvalidBuffer,
+ vistest, options,
NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
/*
@@ -467,6 +482,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
prstate.update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
+ if (cutoffs)
+ prstate.oldest_xmin = cutoffs->OldestXmin;
+ else
+ prstate.oldest_xmin = OldestXminFromGlobalVisState(vistest);
prstate.cutoffs = cutoffs;
/*
@@ -877,6 +896,20 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (prstate.update_vm)
{
+ /*
+ * If this is on-access and we aren't actually pruning, don't set the
+ * VM if doing so would newly dirty the heap page or, if the page is
+ * already dirty, if the WAL record emitted would have to contain an
+ * FPI of the heap page. This should rarely happen, as we only attempt
+ * on-access pruning when pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+ {
+ /* Don't update the VM */
+ }
+
/*
* Clear any VM corruption. This does not need to be in a critical
* section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
@@ -885,9 +918,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* of VM corruption, so we don't have to worry about the extra
* performance overhead.
*/
- if (identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av, prstate.lpdead_items, vmbuffer))
+ else if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av, prstate.lpdead_items, vmbuffer))
{
/* If we fix corruption, don't update the VM further */
}
@@ -1013,7 +1046,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
else if (do_freeze)
{
- conflict_xid = prstate.cutoffs->OldestXmin;
+ conflict_xid = prstate.oldest_xmin;
TransactionIdRetreat(conflict_xid);
}
@@ -1071,12 +1104,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(cutoffs);
-
Assert(prstate.lpdead_items == 0);
if (!heap_page_is_all_visible(relation, buffer,
- cutoffs->OldestXmin,
+ prstate.oldest_xmin,
&debug_all_frozen,
&debug_cutoff, off_loc))
Assert(false);
@@ -1136,9 +1167,8 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* vacuuming the relation. OldestXmin is used for freezing determination
* and we cannot freeze dead tuples' xmaxes.
*/
- if (prstate->cutoffs &&
- TransactionIdIsValid(prstate->cutoffs->OldestXmin) &&
- NormalTransactionIdPrecedes(dead_after, prstate->cutoffs->OldestXmin))
+ if (TransactionIdIsValid(prstate->oldest_xmin) &&
+ NormalTransactionIdPrecedes(dead_after, prstate->oldest_xmin))
return HEAPTUPLE_DEAD;
/*
@@ -1607,8 +1637,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* could use GlobalVisTestIsRemovableXid instead, if a
* non-freezing caller wanted to set the VM bit.
*/
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
+ if (!TransactionIdPrecedes(xmin, prstate->oldest_xmin))
{
prstate->all_visible = false;
break;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 219df1971da..d803c307517 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -279,6 +279,32 @@ index_beginscan(Relation heapRelation,
return scan;
}
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan(heapRelation,
+ indexRelation,
+ snapshot,
+ instrument,
+ nkeys, norderbys);
+
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+ return scan;
+}
+
/*
* index_beginscan_bitmap - start a scan of an index with amgetbitmap
*
@@ -610,6 +636,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
return scan;
}
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan_parallel(heaprel, indexrel,
+ instrument,
+ nkeys, norderbys,
+ pscan);
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+ return scan;
+}
+
/* ----------------
* index_getnext_tid - get the next TID from a scan
*
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index a56c5eceb14..67dbf99f5b5 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
char *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
bool synchronize_seqscans = true;
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags);
+
/* ----------------------------------------------------------------------------
* Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
}
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
- SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
pscan, flags);
}
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+ bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
/* ----------------------------------------------------------------------------
* Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0391798dd2c..065676eb7cf 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -917,6 +917,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation is modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..2c57bc7ac49 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -109,7 +109,8 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ node->modifies_rel);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
@@ -360,6 +361,9 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
scanstate->initialized = false;
scanstate->pstate = NULL;
scanstate->recheck = true;
+ scanstate->modifies_rel =
+ bms_is_member(node->scan.scanrelid,
+ estate->es_modified_relids);
/*
* Miscellaneous initialization
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..f91c6b17620 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -106,12 +106,13 @@ IndexNext(IndexScanState *node)
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
*/
- scandesc = index_beginscan(node->ss.ss_currentRelation,
- node->iss_RelationDesc,
- estate->es_snapshot,
- &node->iss_Instrument,
- node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+ node->iss_RelationDesc,
+ estate->es_snapshot,
+ &node->iss_Instrument,
+ node->iss_NumScanKeys,
+ node->iss_NumOrderByKeys,
+ node->iss_ModifiesBaseRel);
node->iss_ScanDesc = scandesc;
@@ -935,6 +936,10 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
+ indexstate->iss_ModifiesBaseRel =
+ bms_is_member(node->scan.scanrelid,
+ estate->es_modified_relids);
+
/*
* get the scan type from the relation descriptor.
*/
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index ed35c58c2c3..cded7f15703 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -69,9 +69,9 @@ SeqNext(SeqScanState *node)
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
- scandesc = table_beginscan(node->ss.ss_currentRelation,
- estate->es_snapshot,
- 0, NULL);
+ scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+ estate->es_snapshot,
+ 0, NULL, node->modifies_rel);
node->ss.ss_currentScanDesc = scandesc;
}
@@ -237,6 +237,10 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
node->scan.scanrelid,
eflags);
+ scanstate->modifies_rel =
+ bms_is_member(node->scan.scanrelid,
+ estate->es_modified_relids);
+
/* and create slot with the appropriate rowtype */
ExecInitScanTupleSlot(estate, &scanstate->ss,
RelationGetDescr(scanstate->ss.ss_currentRelation),
@@ -370,7 +374,8 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+ node->modifies_rel);
}
/* ----------------------------------------------------------------
@@ -403,5 +408,7 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+ pscan,
+ node->modifies_rel);
}
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index e5b945a9ee3..01d2bda3f72 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4133,6 +4133,18 @@ GlobalVisTestFor(Relation rel)
return state;
}
+/*
+ * Returns maybe_needed as a 32-bit TransactionId. Can be used in callers that
+ * need to compare transaction IDs to a single value and are okay with using
+ * the more conservative boundary.
+ */
+TransactionId
+OldestXminFromGlobalVisState(GlobalVisState *state)
+{
+ return XidFromFullTransactionId(state->maybe_needed);
+}
+
+
/*
* Return true if it's worth updating the accurate maybe_needed boundary.
*
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 5b2ab181b5f..bf272c2c37f 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_heap_rel);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
@@ -206,6 +211,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys,
ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_rel);
extern ItemPointer index_getnext_tid(IndexScanDesc scan,
ScanDirection direction);
struct TupleTableSlot;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0b9bb1c9b13..46ea8b8455c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read in the current heap page's
+ * corresponding VM block to this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -116,8 +123,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read in the current heap page's corresponding VM block to
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
@@ -374,7 +391,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer, bool allow_vmset);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool blk_known_av,
Buffer vmbuffer,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
typedef struct IndexFetchTableData
{
Relation rel;
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchTableData;
struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 1c9e802a6b1..0e986d8ef72 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -62,6 +62,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* whether or not scan should attempt to set the VM */
+ SO_ALLOW_VM_SET = 1 << 10,
} ScanOptions;
/*
@@ -876,6 +878,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
}
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
/*
* Like table_beginscan(), but for scanning catalog. It'll automatically use a
* snapshot appropriate for scanning catalog relations.
@@ -913,10 +934,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, struct ScanKeyData *key)
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
{
uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
}
@@ -1125,6 +1149,10 @@ extern void table_parallelscan_initialize(Relation rel,
extern TableScanDesc table_beginscan_parallel(Relation relation,
ParallelTableScanDesc pscan);
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+ ParallelTableScanDesc pscan,
+ bool modifies_rel);
+
/*
* Restart a parallel scan. Call this in the leader process. Caller is
* responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e107d6e5f81..1d0b374b652 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -680,6 +680,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query either through
+ * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
@@ -1631,6 +1637,13 @@ typedef struct SeqScanState
{
ScanState ss; /* its first field is NodeTag */
Size pscan_len; /* size of parallel heap scan descriptor */
+
+ /*
+ * Whether or not the query modifies the relation scanned by this node.
+ * This is used to avoid the overhead of optimizations that are only
+ * effective for tables not modified by the query.
+ */
+ bool modifies_rel;
} SeqScanState;
/* ----------------
@@ -1702,6 +1715,7 @@ typedef struct
* OrderByTypByVals is the datatype of order by expression pass-by-value?
* OrderByTypLens typlens of the datatypes of order by expressions
* PscanLen size of parallel index scan descriptor
+ * ModifiesBaseRel true if query modifies base relation
* ----------------
*/
typedef struct IndexScanState
@@ -1731,6 +1745,7 @@ typedef struct IndexScanState
bool *iss_OrderByTypByVals;
int16 *iss_OrderByTypLens;
Size iss_PscanLen;
+ bool iss_ModifiesBaseRel;
} IndexScanState;
/* ----------------
@@ -1888,6 +1903,7 @@ typedef struct SharedBitmapHeapInstrumentation
* pstate shared state for parallel bitmap scan
* sinstrument statistics for parallel workers
* recheck do current page's tuples need recheck
+ * modifies_rel does the query modify the base relation
* ----------------
*/
typedef struct BitmapHeapScanState
@@ -1900,6 +1916,7 @@ typedef struct BitmapHeapScanState
ParallelBitmapHeapState *pstate;
SharedBitmapHeapInstrumentation *sinstrument;
bool recheck;
+ bool modifies_rel;
} BitmapHeapScanState;
/* ----------------
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index d346be71642..fcb10b8d136 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -101,6 +101,7 @@ extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid
extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
+extern TransactionId OldestXminFromGlobalVisState(GlobalVisState *state);
/*
* Utility functions for implementing visibility routines in table AMs.
--
2.43.0
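To make the control flow in 0012 easier to follow, here is a condensed sketch of how a read-only scan opts in (illustrative only, using the names the patch introduces): the executor records modified RT indexes in es_modified_relids, the scan is opened with SO_ALLOW_VM_SET when its relation is not in that set, and each page visit hands the scan's pinned VM buffer to on-access pruning.

#include "postgres.h"
#include "access/heapam.h"
#include "access/tableam.h"

/* Sketch only: mirrors heap_prepare_pagescan() as modified by 0012. */
static void
prune_page_maybe_set_vm(HeapScanDesc scan, Buffer buffer)
{
	bool		allow_vmset = (scan->rs_base.rs_flags & SO_ALLOW_VM_SET) != 0;

	/*
	 * Pass the scan's VM buffer only when the query cannot modify this
	 * relation; otherwise pruning behaves exactly as before.
	 */
	heap_page_prune_opt(scan->rs_base.rs_rd, buffer,
						allow_vmset ? &scan->rs_vmbuffer : NULL,
						allow_vmset);
}

The same shape appears in BitmapHeapScanNextBlock() and, via modifies_base_rel, in heapam_index_fetch_tuple(), so all three heap access paths share one opt-in mechanism.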
On Wed, Jul 9, 2025 at 5:59 PM Melanie Plageman <melanieplageman@gmail.com> wrote:
> On Thu, Jun 26, 2025 at 6:04 PM Melanie Plageman <melanieplageman@gmail.com> wrote:
>> Rebased in light of recent changes on master:
This needed another rebase, and, in light of the discussion in [1],
I've also removed the patch to add heap wrappers for setting pages
all-visible.
Andrey Borodin made the excellent point off-list that I forgot to
remove the xl_heap_visible struct itself -- which is rather important
to a patch set purporting to eliminate xl_heap_visible! New version
attached.
- Melanie
Attachments:
v4-0002-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch (application/x-patch)
From 68f26df83f6ac0f8ce9a8c73894d2298c5273996 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v4 02/13] Eliminate xl_heap_visible in COPY FREEZE
Instead of emitting a separate xl_heap_visible WAL record when setting the VM
bits, include the required update in the xl_heap_multi_insert record.
---
src/backend/access/heap/heapam.c | 47 +++++++++++---------
src/backend/access/heap/heapam_xlog.c | 39 +++++++++++++++-
src/backend/access/heap/visibilitymap.c | 59 +++++++++++++++++++++++++
src/backend/access/rmgrdesc/heapdesc.c | 5 +++
src/include/access/visibilitymap.h | 2 +
5 files changed, 130 insertions(+), 22 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0dcd6ee817e..68db4325285 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2493,9 +2493,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/*
* If the page is all visible, need to clear that, unless we're only
* going to add further frozen rows to it.
- *
- * If we're only adding already frozen rows to a previously empty
- * page, mark it as all-visible.
*/
if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
@@ -2505,8 +2502,22 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
BufferGetBlockNumber(buffer),
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
+
+ /*
+ * If we're only adding already frozen rows to a previously empty
+ * page, mark it as all-frozen and update the visibility map. We're
+ * already holding a pin on the vmbuffer.
+ */
else if (all_frozen_set)
+ {
PageSetAllVisible(page);
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ visibilitymap_set_vmbyte(relation,
+ BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+ }
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2554,6 +2565,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
xlrec->flags = 0;
if (all_visible_cleared)
xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+ /*
+ * We don't have to worry about including a conflict xid in the
+ * WAL record as HEAP_INSERT_FROZEN intentionally violates
+ * visibility rules.
+ */
if (all_frozen_set)
xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
@@ -2616,7 +2633,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
XLogBeginInsert();
XLogRegisterData(xlrec, tupledata - scratch.data);
+
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+ if (all_frozen_set)
+ XLogRegisterBuffer(1, vmbuffer, 0);
XLogRegisterBufData(0, tupledata, totaldatalen);
@@ -2626,29 +2646,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
recptr = XLogInsert(RM_HEAP2_ID, info);
PageSetLSN(page, recptr);
+ if (all_frozen_set)
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
}
END_CRIT_SECTION();
- /*
- * If we've frozen everything on the page, update the visibilitymap.
- * We're already holding pin on the vmbuffer.
- */
if (all_frozen_set)
- {
- Assert(PageIsAllVisible(page));
- Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
-
- /*
- * It's fine to use InvalidTransactionId here - this is only used
- * when HEAP_INSERT_FROZEN is specified, which intentionally
- * violates visibility rules.
- */
- visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
- InvalidXLogRecPtr, vmbuffer,
- InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
- }
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
UnlockReleaseBuffer(buffer);
ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index eb4bd3d6ae3..2485c344191 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -552,6 +552,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
int i;
bool isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
/*
* Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -572,11 +573,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(rlocator);
- Buffer vmbuffer = InvalidBuffer;
visibilitymap_pin(reln, blkno, &vmbuffer);
visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
ReleaseBuffer(vmbuffer);
+ vmbuffer = InvalidBuffer;
FreeFakeRelcacheEntry(reln);
}
@@ -663,6 +664,42 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (BufferIsValid(buffer))
UnlockReleaseBuffer(buffer);
+ buffer = InvalidBuffer;
+
+ /*
+ * Now read and update the VM block. Even if we skipped updating the heap
+ * page due to the file being dropped or truncated later in recovery, it's
+ * still safe to update the visibility map. Any WAL record that clears
+ * the visibility map bit does so before checking the page LSN, so any
+ * bits that need to be cleared will still be cleared.
+ *
+ * It is only okay to set the VM bits without holding the heap page lock
+ * because we can expect no other writers of this page.
+ */
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+ XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
+
+ visibilitymap_pin(reln, blkno, &vmbuffer);
+ visibilitymap_set_vmbyte(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+
+ /*
+ * It is not possible that the VM was already set for this heap page,
+ * so the vmbuffer must have been modified and marked dirty.
+ */
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
+ FreeFakeRelcacheEntry(reln);
+ }
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
+
/*
* If the page is running low on free space, update the FSM as well.
* Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 745a04ef26e..573df6f6891 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -318,6 +318,65 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
return status;
}
+/*
+ * Set flags in the VM block contained in the passed in vmBuf.
+ *
+ * This function is for callers which include the VM changes in the same WAL
+ * record as the modifications of the heap page which rendered it all-visible.
+ * Callers separately logging the VM changes should invoke visibilitymap_set()
+ * instead.
+ *
+ * Caller must have pinned and exclusive locked the correct block of the VM in
+ * vmBuf.
+ *
+ * During normal operation (i.e. not recovery), this should be called in a
+ * critical section which also makes any necessary changes to the heap page
+ * and, if relevant, emits WAL.
+ *
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page.
+ */
+uint8
+visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags)
+{
+ BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+ uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+ uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+ Page page;
+ uint8 *map;
+ uint8 status;
+
+#ifdef TRACE_VISIBILITYMAP
+ elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
+#endif
+
+ /* Call in same critical section where WAL is emitted. */
+ Assert(InRecovery || CritSectionCount > 0);
+
+ /* Flags should be valid. Also never clear bits with this function */
+ Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+ /* Must never set all_frozen bit without also setting all_visible bit */
+ Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+ /* Check that we have the right VM page pinned */
+ if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+ elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
+
+ page = BufferGetPage(vmBuf);
+ map = (uint8 *) PageGetContents(page);
+
+ status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+ if (flags != status)
+ {
+ map[mapByte] |= (flags << mapOffset);
+ MarkBufferDirty(vmBuf);
+ }
+
+ return status;
+}
+
/*
* visibilitymap_get_status - get status of bits
*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
#include "access/heapam_xlog.h"
#include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
#include "storage/standbydefs.h"
/*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
xlrec->flags);
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+ appendStringInfo(buf, ", vm_flags: 0x%02X",
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
if (XLogRecHasBlockData(record, 0) && !isinit)
{
appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..977566f6b98 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,8 @@ extern uint8 visibilitymap_set(Relation rel,
Buffer vmBuf,
TransactionId cutoff_xid,
uint8 flags);
+extern uint8 visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
--
2.43.0
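For readers less familiar with the VM layout, here is a minimal standalone sketch (not part of the patches) of the block/byte/bit arithmetic that visibilitymap_set_vmbyte() relies on when it ORs the new flag bits into the map byte. The SKETCH_-prefixed constants are stand-ins chosen for illustration, not the definitions from visibilitymap.c; the real values come from the page size and header size there.

#include <stdint.h>
#include <stdio.h>

#define SKETCH_BITS_PER_HEAPBLOCK	2		/* all-visible + all-frozen */
#define SKETCH_HEAPBLOCKS_PER_BYTE	(8 / SKETCH_BITS_PER_HEAPBLOCK)
#define SKETCH_MAPSIZE				8168	/* assumed usable bytes per VM page */
#define SKETCH_HEAPBLOCKS_PER_PAGE	(SKETCH_MAPSIZE * SKETCH_HEAPBLOCKS_PER_BYTE)

int
main(void)
{
	uint32_t	heap_blk = 123456;
	uint32_t	map_block = heap_blk / SKETCH_HEAPBLOCKS_PER_PAGE;
	uint32_t	map_byte = (heap_blk % SKETCH_HEAPBLOCKS_PER_PAGE) / SKETCH_HEAPBLOCKS_PER_BYTE;
	uint32_t	map_offset = (heap_blk % SKETCH_HEAPBLOCKS_PER_BYTE) * SKETCH_BITS_PER_HEAPBLOCK;
	uint8_t		map_bytes[SKETCH_MAPSIZE] = {0};
	uint8_t		flags = 0x03;	/* both bits for this heap block */

	/* Setting the bits is a plain OR into the byte, as in the patch. */
	map_bytes[map_byte] |= (uint8_t) (flags << map_offset);

	printf("heap block %u -> VM block %u, byte %u, bit offset %u, byte value 0x%02X\n",
		   heap_blk, map_block, map_byte, map_offset, map_bytes[map_byte]);
	return 0;
}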
v4-0003-Make-heap_page_is_all_visible-independent-of-LVRe.patch (application/x-patch)
From 0846b7106d72c6ade04eccebe51dcc1e1cedd39a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 15:48:51 -0400
Subject: [PATCH v4 03/13] Make heap_page_is_all_visible independent of
LVRelState
Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need two parameters from the
LVRelState, so just pass those in explicitly.
---
src/backend/access/heap/vacuumlazy.c | 45 ++++++++++++++++++----------
1 file changed, 29 insertions(+), 16 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 14036c27e87..8a62a93eee5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,8 +464,11 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
- TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2010,8 +2013,9 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(presult.lpdead_items == 0);
- if (!heap_page_is_all_visible(vacrel, buf,
- &debug_cutoff, &debug_all_frozen))
+ if (!heap_page_is_all_visible(vacrel->rel, buf,
+ vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+ &debug_cutoff, &vacrel->offnum))
Assert(false);
Assert(presult.all_frozen == debug_all_frozen);
@@ -2907,8 +2911,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* emitted.
*/
Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
- &all_frozen))
+ if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
+ &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -3592,9 +3596,16 @@ dead_items_cleanup(LVRelState *vacrel)
/*
* Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples. Set *all_frozen to true if every tuple
- * on this page is frozen.
+ * transactions.
+ *
+ * OldestXmin is used to determine visibility.
+ *
+ * *logging_offnum will have the OffsetNumber of the current tuple being
+ * processed for vacuum's error callback system.
+ *
+ * Return the visibility_cutoff_xid which is the highest xmin amongst the
+ * visible tuples. Sets *all_frozen to true if every tuple on this page is
+ * frozen.
*
* This is a stripped down version of lazy_scan_prune(). If you change
* anything here, make sure that everything stays in sync. Note that an
@@ -3602,9 +3613,11 @@ dead_items_cleanup(LVRelState *vacrel)
* introducing new side-effects here.
*/
static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
TransactionId *visibility_cutoff_xid,
- bool *all_frozen)
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3627,7 +3640,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* Set the offset number so that we can display it along with any
* error that occurred while processing this tuple.
*/
- vacrel->offnum = offnum;
+ *logging_offnum = offnum;
itemid = PageGetItemId(page, offnum);
/* Unused or redirect line pointers are of no interest */
@@ -3651,9 +3664,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
tuple.t_len = ItemIdGetLength(itemid);
- tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+ tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
+ switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
buf))
{
case HEAPTUPLE_LIVE:
@@ -3674,7 +3687,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
+ OldestXmin))
{
all_visible = false;
*all_frozen = false;
@@ -3709,7 +3722,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
} /* scan along page */
/* Clear the offset information once we have processed the given page. */
- vacrel->offnum = InvalidOffsetNumber;
+ *logging_offnum = InvalidOffsetNumber;
return all_visible;
}
--
2.43.0
v4-0005-Use-xl_heap_prune-record-for-setting-empty-pages-.patch (application/x-patch)
From 60b36cd8b4d9e2de125690a7fcfbab7330c12287 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:42:19 -0400
Subject: [PATCH v4 05/13] Use xl_heap_prune record for setting empty pages
all-visible
As part of a project to eliminate xl_heap_visible records, eliminate
their usage in phase I vacuum of empty pages.
---
src/backend/access/heap/pruneheap.c | 14 +++++--
src/backend/access/heap/vacuumlazy.c | 55 ++++++++++++++++++----------
src/include/access/heapam.h | 1 +
3 files changed, 47 insertions(+), 23 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 68ecf50848b..2724cf7f64f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -836,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
+ false,
InvalidBuffer, 0, false,
conflict_xid,
true, reason,
@@ -2052,6 +2053,9 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* case, vmbuffer should already have been updated and marked dirty and should
* still be pinned and locked.
*
+ * force_heap_fpi indicates that a full page image of the heap block should be
+ * forced.
+ *
* set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
* the page LSN when checksums/wal_log_hints are enabled even if we did not
* prune or freeze tuples on the page.
@@ -2062,6 +2066,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ bool force_heap_fpi,
Buffer vmbuffer,
uint8 vmflags,
bool set_pd_all_vis,
@@ -2090,13 +2095,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xlrec.flags = 0;
regbuf_flags = REGBUF_STANDARD;
+ if (force_heap_fpi)
+ regbuf_flags |= REGBUF_FORCE_IMAGE;
+
/*
* We can avoid an FPI if the only modification we are making to the heap
* page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
*/
- if (!do_prune &&
- nfrozen == 0 &&
- (!set_pd_all_vis || !XLogHintBitIsNeeded()))
+ else if (!do_prune &&
+ nfrozen == 0 &&
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
regbuf_flags |= REGBUF_NO_IMAGE;
/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 460cdbd8417..d9e195269d2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1878,33 +1878,47 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ uint8 new_vmbits = VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN;
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ PageSetAllVisible(page);
MarkBufferDirty(buf);
- /*
- * It's possible that another backend has extended the heap,
- * initialized the page, and then failed to WAL-log the page due
- * to an ERROR. Since heap extension is not WAL-logged, recovery
- * might try to replay our record setting the page all-visible and
- * find that the page isn't initialized, which will cause a PANIC.
- * To prevent that, check whether the page has been previously
- * WAL-logged, and if not, do that now.
- */
- if (RelationNeedsWAL(vacrel->rel) &&
- PageGetLSN(page) == InvalidXLogRecPtr)
- log_newpage_buffer(buf, true);
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ visibilitymap_set_vmbyte(vacrel->rel, blkno,
+ vmbuffer, new_vmbits);
+
+ if (RelationNeedsWAL(vacrel->rel))
+ {
+ /*
+ * It's possible that another backend has extended the heap,
+ * initialized the page, and then failed to WAL-log the page
+ * due to an ERROR. Since heap extension is not WAL-logged,
+ * recovery might try to replay our record setting the page
+ * all-visible and find that the page isn't initialized, which
+ * will cause a PANIC. To prevent that, if the page hasn't
+ * been previously WAL-logged, force a heap FPI.
+ */
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ PageGetLSN(page) == InvalidXLogRecPtr,
+ vmbuffer,
+ new_vmbits,
+ true,
+ InvalidTransactionId,
+ false, PRUNE_VACUUM_SCAN,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+ }
- PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
END_CRIT_SECTION();
- /* Count the newly all-frozen pages for logging */
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+ /* Count the newly all-frozen pages for logging. */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
}
@@ -2918,6 +2932,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
+ false,
vmbuffer,
vmflags,
set_pd_all_vis,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8b47295efa2..e7129a644a1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,6 +394,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ bool force_heap_fpi,
Buffer vmbuffer,
uint8 vmflags,
bool vm_modified_heap_page,
--
2.43.0
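To make the buffer-registration choice above easier to follow, here is a small standalone sketch (not part of the patches) of the decision log_heap_prune_and_freeze() makes after 0004 and 0005: force a full page image for a never-WAL-logged empty page, skip the image entirely when the only heap-page change is PD_ALL_VISIBLE and neither checksums nor wal_log_hints are enabled, and otherwise register the buffer normally. The SKETCH_ flag values and the checksums_or_wal_log_hints parameter are stand-ins for REGBUF_FORCE_IMAGE/REGBUF_NO_IMAGE and XLogHintBitIsNeeded().

#include <stdbool.h>
#include <stdio.h>

#define SKETCH_REGBUF_STANDARD		0x01
#define SKETCH_REGBUF_NO_IMAGE		0x02
#define SKETCH_REGBUF_FORCE_IMAGE	0x04

static int
sketch_heap_regbuf_flags(bool force_heap_fpi, bool do_prune, int nfrozen,
						 bool set_pd_all_vis, bool checksums_or_wal_log_hints)
{
	int			flags = SKETCH_REGBUF_STANDARD;

	if (force_heap_fpi)
	{
		/* Empty page never WAL-logged before: force a full page image. */
		flags |= SKETCH_REGBUF_FORCE_IMAGE;
	}
	else if (!do_prune && nfrozen == 0 &&
			 (!set_pd_all_vis || !checksums_or_wal_log_hints))
	{
		/*
		 * Only the VM (and perhaps PD_ALL_VISIBLE as an unlogged hint) is
		 * changing, so an FPI of the heap page can be skipped.
		 */
		flags |= SKETCH_REGBUF_NO_IMAGE;
	}
	return flags;
}

int
main(void)
{
	printf("prune-only page:       0x%02X\n",
		   sketch_heap_regbuf_flags(false, true, 0, true, true));
	printf("VM-only, no checksums: 0x%02X\n",
		   sketch_heap_regbuf_flags(false, false, 0, true, false));
	printf("empty, never logged:   0x%02X\n",
		   sketch_heap_regbuf_flags(true, false, 0, true, true));
	return 0;
}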
v4-0001-Add-assert-to-heap_prune_record_unchanged_lp_norm.patch (application/x-patch)
From d98156d3d8ac522381dc3ccf9a8608168649fdfe Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 7 Jul 2025 17:33:26 -0400
Subject: [PATCH v4 01/13] Add assert to heap_prune_record_unchanged_lp_normal
Not all callers provide VacuumCutoffs to heap_page_prune_and_freeze(),
so assert those are provided before passing them along to
heap_prepare_freeze_tuple().
---
src/backend/access/heap/pruneheap.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a8025889be0..dd00931f179 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1480,6 +1480,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
{
bool totally_frozen;
+ Assert(prstate->cutoffs);
if ((heap_prepare_freeze_tuple(htup,
prstate->cutoffs,
&prstate->pagefrz,
--
2.43.0
v4-0004-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch (application/x-patch)
From 51900f8b94a1ebfc7777a9d9a4af379be8597ceb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:42:13 -0400
Subject: [PATCH v4 04/13] Eliminate xl_heap_visible from vacuum phase III
Instead of emitting a separate xl_heap_visible record for each page that
is rendered all-visible by vacuum's third phase, include the updates to
the VM in the already emitted xl_heap_prune record.
---
src/backend/access/heap/heapam_xlog.c | 142 ++++++++++++++++++++---
src/backend/access/heap/pruneheap.c | 48 +++++++-
src/backend/access/heap/vacuumlazy.c | 149 +++++++++++++++++--------
src/backend/access/rmgrdesc/heapdesc.c | 13 ++-
src/include/access/heapam.h | 9 ++
src/include/access/heapam_xlog.h | 3 +
6 files changed, 296 insertions(+), 68 deletions(-)
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 2485c344191..14541e2e94f 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Buffer buffer;
RelFileLocator rlocator;
BlockNumber blkno;
- XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
+ uint8 vmflags = 0;
XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -51,10 +52,15 @@ heap_xlog_prune_freeze(XLogReaderState *record)
(xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
/*
- * We are about to remove and/or freeze tuples. In Hot Standby mode,
- * ensure that there are no queries running for which the removed tuples
- * are still visible or which still consider the frozen xids as running.
- * The conflict horizon XID comes after xl_heap_prune.
+ * After xl_heap_prune is the optional snapshot conflict horizon.
+ *
+ * In Hot Standby mode, we must ensure that there are no running queries
+ * which would conflict with the changes in this record. If pruning, that
+ * means we cannot remove tuples still visible to transactions on the
+ * standby. If freezing, that means we cannot freeze tuples with xids that
+ * are still considered running on the standby. And for setting the VM, we
+ * cannot do so if the page isn't all-visible to all transactions on the
+ * standby.
*/
if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
{
@@ -70,13 +76,28 @@ heap_xlog_prune_freeze(XLogReaderState *record)
rlocator);
}
+ /* Next are the optionally included vmflags. Copy them out for later use. */
+ if ((xlrec.flags & XLHP_HAS_VMFLAGS) != 0)
+ {
+ memcpy(&vmflags, maindataptr, sizeof(uint8));
+ maindataptr += sizeof(uint8);
+
+ /*
+ * We don't set VISIBILITYMAP_XLOG_CATALOG_REL in the combined record
+ * because we already have XLHP_IS_CATALOG_REL.
+ */
+ Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+ /* Must never set all_frozen bit without also setting all_visible bit */
+ Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
+ }
+
/*
- * If we have a full-page image, restore it and we're done.
+ * If we have a full-page image of the heap block, restore it and we're
+ * done with the heap block.
*/
- action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
- (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
- &buffer);
- if (action == BLK_NEEDS_REDO)
+ if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+ (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+ &buffer) == BLK_NEEDS_REDO)
{
Page page = (Page) BufferGetPage(buffer);
OffsetNumber *redirected;
@@ -89,6 +110,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Size datalen;
xlhp_freeze_plan *plans;
OffsetNumber *frz_offsets;
+ bool do_prune;
+ bool mark_buffer_dirty;
+ bool set_heap_lsn;
char *dataptr = XLogRecGetBlockData(record, 0, &datalen);
heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
@@ -97,11 +121,18 @@ heap_xlog_prune_freeze(XLogReaderState *record)
&ndead, &nowdead,
&nunused, &nowunused);
+ do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+ set_heap_lsn = mark_buffer_dirty = do_prune || nplans > 0;
+
+ /* Ensure the record does something */
+ Assert(do_prune || nplans > 0 ||
+ vmflags & VISIBILITYMAP_VALID_BITS);
+
/*
* Update all line pointers per the record, and repair fragmentation
* if needed.
*/
- if (nredirected > 0 || ndead > 0 || nunused > 0)
+ if (do_prune)
heap_page_prune_execute(buffer,
(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
redirected, nredirected,
@@ -138,26 +169,72 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* There should be no more data */
Assert((char *) frz_offsets == dataptr + datalen);
+ Assert(BufferIsValid(buffer) &&
+ BufferGetBlockNumber(buffer) == blkno);
+
+ /*
+ * Now set PD_ALL_VISIBLE, if required. We'll only do this if we are
+ * also going to set bits in the VM later.
+ *
+ * We must never end up with the VM bit set and the page-level
+ * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+ * modification would fail to clear the VM bit.
+ */
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+ {
+ PageSetAllVisible(page);
+
+ /*
+ * If the only change to the heap page is setting PD_ALL_VISIBLE,
+ * we can avoid setting the page LSN unless checksums or
+ * wal_log_hints are enabled.
+ */
+ set_heap_lsn = XLogHintBitIsNeeded() ? true : set_heap_lsn;
+ mark_buffer_dirty = true;
+ }
+
/*
* Note: we don't worry about updating the page's prunability hints.
* At worst this will cause an extra prune cycle to occur soon.
*/
- PageSetLSN(page, lsn);
- MarkBufferDirty(buffer);
+ if (mark_buffer_dirty)
+ MarkBufferDirty(buffer);
+ if (set_heap_lsn)
+ PageSetLSN(page, lsn);
}
/*
- * If we released any space or line pointers, update the free space map.
+ * If we released any space or line pointers or will be setting a page in
+ * the visibility map, update the free space map.
+ *
+ * Even if we are just updating the VM (and thus not freeing up any
+ * space), we'll still update the FSM for this page. Since FSM is not
+ * WAL-logged and only updated heuristically, it easily becomes stale in
+ * standbys. If the standby is later promoted and runs VACUUM, it will
+ * skip updating individual free space figures for pages that became
+ * all-visible (or all-frozen, depending on the vacuum mode), which is
+ * troublesome when FreeSpaceMapVacuum propagates too optimistic free
+ * space values to upper FSM layers; later inserters try to use such pages
+ * only to find out that they are unusable. This can cause long stalls
+ * when there are many such pages.
+ *
+ * Forestall those problems by updating FSM's idea about a page that is
+ * becoming all-visible or all-frozen.
*
* Do this regardless of a full-page image being applied, since the FSM
* data is not in the page anyway.
+ *
+ * We want to avoid holding an exclusive lock on the heap buffer while
+ * doing IO (either of the FSM or the VM), so we'll release the lock on
+ * the heap buffer before doing either.
*/
if (BufferIsValid(buffer))
{
- if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
- XLHP_HAS_DEAD_ITEMS |
- XLHP_HAS_NOW_UNUSED_ITEMS))
+ if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
+ XLHP_HAS_DEAD_ITEMS |
+ XLHP_HAS_NOW_UNUSED_ITEMS)) ||
+ vmflags & VISIBILITYMAP_VALID_BITS)
{
Size freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
@@ -168,6 +245,37 @@ heap_xlog_prune_freeze(XLogReaderState *record)
else
UnlockReleaseBuffer(buffer);
}
+
+ /*
+ * Read and update the VM block. Even if we skipped updating the heap page
+ * due to the file being dropped or truncated later in recovery, it's
+ * still safe to update the visibility map. Any WAL record that clears
+ * the visibility map bit does so before checking the page LSN, so any
+ * bits that need to be cleared will still be cleared.
+ *
+ * Note that it is *only* okay that we do not hold a lock on the heap page
+ * because we are in recovery and can expect no other writers to clear
+ * PD_ALL_VISIBLE before we are able to update the VM.
+ */
+ if (vmflags & VISIBILITYMAP_VALID_BITS &&
+ XLogReadBufferForRedoExtended(record, 1,
+ RBM_ZERO_ON_ERROR,
+ false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ uint8 old_vmbits = 0;
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
+
+ visibilitymap_pin(reln, blkno, &vmbuffer);
+ old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+ /* Only set VM page LSN if we modified the page */
+ if (old_vmbits != vmflags)
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
+ FreeFakeRelcacheEntry(reln);
+ }
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index dd00931f179..68ecf50848b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,6 +21,7 @@
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
+#include "access/visibilitymapdefs.h"
#include "commands/vacuum.h"
#include "executor/instrument.h"
#include "miscadmin.h"
@@ -835,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
+ InvalidBuffer, 0, false,
conflict_xid,
true, reason,
prstate.frozen, prstate.nfrozen,
@@ -2046,12 +2048,23 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* replaying 'unused' items depends on whether they were all previously marked
* as dead.
*
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
* Note: This function scribbles on the 'frozen' array.
*
* Note: This is called in a critical section, so careful what you do here.
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer,
+ uint8 vmflags,
+ bool set_pd_all_vis,
TransactionId conflict_xid,
bool cleanup_lock,
PruneReason reason,
@@ -2063,6 +2076,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xl_heap_prune xlrec;
XLogRecPtr recptr;
uint8 info;
+ uint8 regbuf_flags;
/* The following local variables hold data registered in the WAL record: */
xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2071,8 +2085,19 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xlhp_prune_items dead_items;
xlhp_prune_items unused_items;
OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+ bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
xlrec.flags = 0;
+ regbuf_flags = REGBUF_STANDARD;
+
+ /*
+ * We can avoid an FPI if the only modification we are making to the heap
+ * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+ */
+ if (!do_prune &&
+ nfrozen == 0 &&
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
+ regbuf_flags |= REGBUF_NO_IMAGE;
/*
* Prepare data for the buffer. The arrays are not actually in the
@@ -2080,7 +2105,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* page image, the arrays can be omitted.
*/
XLogBeginInsert();
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+ XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ XLogRegisterBuffer(1, vmbuffer, 0);
+
if (nfrozen > 0)
{
int nplans;
@@ -2137,6 +2166,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* Prepare the main xl_heap_prune record. We already set the XLHP_HAS_*
* flag above.
*/
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ xlrec.flags |= XLHP_HAS_VMFLAGS;
if (RelationIsAccessibleInLogicalDecoding(relation))
xlrec.flags |= XLHP_IS_CATALOG_REL;
if (TransactionIdIsValid(conflict_xid))
@@ -2151,6 +2182,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
XLogRegisterData(&xlrec, SizeOfHeapPrune);
if (TransactionIdIsValid(conflict_xid))
XLogRegisterData(&conflict_xid, sizeof(TransactionId));
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ XLogRegisterData(&vmflags, sizeof(uint8));
switch (reason)
{
@@ -2169,5 +2202,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
}
recptr = XLogInsert(RM_HEAP2_ID, info);
- PageSetLSN(BufferGetPage(buffer), recptr);
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
+
+ /*
+ * If pruning or freezing tuples or setting the page all-visible when
+ * checksums or wal_log_hints are enabled, we must bump the LSN. Torn
+ * pages are possible if we update PD_ALL_VISIBLE without bumping the LSN,
+ * but this is deemed okay for page hint updates.
+ */
+ if (do_prune || nfrozen > 0 ||
+ (set_pd_all_vis && XLogHintBitIsNeeded()))
+ PageSetLSN(BufferGetPage(buffer), recptr);
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8a62a93eee5..460cdbd8417 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,11 +464,13 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
+static bool heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int allowed_num_offsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2847,8 +2849,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
OffsetNumber unused[MaxHeapTuplesPerPage];
int nunused = 0;
TransactionId visibility_cutoff_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
bool all_frozen;
LVSavedErrInfo saved_err_info;
+ uint8 vmflags = 0;
+ bool set_pd_all_vis = false;
Assert(vacrel->do_index_vacuuming);
@@ -2859,6 +2864,20 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
InvalidOffsetNumber);
+ if (heap_page_is_all_visible_except_lpdead(vacrel->rel, buffer,
+ vacrel->cutoffs.OldestXmin,
+ deadoffsets, num_offsets,
+ &all_frozen, &visibility_cutoff_xid,
+ &vacrel->offnum))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (all_frozen)
+ {
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ }
+ }
+
START_CRIT_SECTION();
for (int i = 0; i < num_offsets; i++)
@@ -2878,6 +2897,18 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
/* Attempt to truncate line pointer array now */
PageTruncateLinePointerArray(page);
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+ {
+ Assert(!PageIsAllVisible(page));
+ set_pd_all_vis = true;
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ PageSetAllVisible(page);
+ visibilitymap_set_vmbyte(vacrel->rel,
+ blkno,
+ vmbuffer, vmflags);
+ conflict_xid = visibility_cutoff_xid;
+ }
+
/*
* Mark buffer dirty before we write WAL.
*/
@@ -2887,7 +2918,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
- InvalidTransactionId,
+ vmbuffer,
+ vmflags,
+ set_pd_all_vis,
+ conflict_xid,
false, /* no cleanup lock required */
PRUNE_VACUUM_CLEANUP,
NULL, 0, /* frozen */
@@ -2896,39 +2930,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
unused, nunused);
}
- /*
- * End critical section, so we safely can do visibility tests (which
- * possibly need to perform IO and allocate memory!). If we crash now the
- * page (including the corresponding vm bit) might not be marked all
- * visible, but that's fine. A later vacuum will fix that.
- */
END_CRIT_SECTION();
- /*
- * Now that we have removed the LP_DEAD items from the page, once again
- * check if the page has become all-visible. The page is already marked
- * dirty, exclusively locked, and, if needed, a full page image has been
- * emitted.
- */
- Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
- &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
+ if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (all_frozen)
- {
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buffer,
- InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
- flags);
-
/* Count the newly set VM page for logging */
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
vacrel->vm_new_visible_pages++;
if (all_frozen)
vacrel->vm_new_visible_frozen_pages++;
@@ -3594,6 +3601,25 @@ dead_items_cleanup(LVRelState *vacrel)
vacrel->pvs = NULL;
}
+/*
+ * Wrapper for heap_page_is_all_visible_except_lpdead() which can be used for
+ * callers that expect no LP_DEAD on the page.
+ */
+bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
+{
+
+ return heap_page_is_all_visible_except_lpdead(rel, buf, OldestXmin,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid,
+ logging_offnum);
+}
+
/*
* Check if every tuple in the given page is visible to all current and future
* transactions.
@@ -3607,23 +3633,35 @@ dead_items_cleanup(LVRelState *vacrel)
* visible tuples. Sets *all_frozen to true if every tuple on this page is
* frozen.
*
- * This is a stripped down version of lazy_scan_prune(). If you change
- * anything here, make sure that everything stays in sync. Note that an
- * assertion calls us to verify that everybody still agrees. Be sure to avoid
- * introducing new side-effects here.
+ * deadoffsets are the offsets we know about and are about to set LP_UNUSED.
+ * allowed_num_offsets is the number of those. As long as the LP_DEAD items we
+ * encounter on the page match those exactly, we can set the page all-visible
+ * in the VM.
+ *
+ * Callers looking to verify that the page is all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This is similar logic to that in heap_prune_record_unchanged_lp_normal(). If
+ * you change anything here, make sure that everything stays in sync. Note
+ * that an assertion calls us to verify that everybody still agrees. Be sure
+ * to avoid introducing new side-effects here.
*/
static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum)
+heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int allowed_num_offsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
OffsetNumber offnum,
maxoff;
bool all_visible = true;
+ OffsetNumber current_dead_offsets[MaxHeapTuplesPerPage];
+ size_t current_num_offsets = 0;
*visibility_cutoff_xid = InvalidTransactionId;
*all_frozen = true;
@@ -3655,9 +3693,8 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
if (ItemIdIsDead(itemid))
{
- all_visible = false;
- *all_frozen = false;
- break;
+ current_dead_offsets[current_num_offsets++] = offnum;
+ continue;
}
Assert(ItemIdIsNormal(itemid));
@@ -3724,7 +3761,23 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
/* Clear the offset information once we have processed the given page. */
*logging_offnum = InvalidOffsetNumber;
- return all_visible;
+ /* If we already know it's not all-visible, return false */
+ if (!all_visible)
+ return false;
+
+ /* If we weren't allowed any dead offsets, we're done */
+ if (allowed_num_offsets == 0)
+ return current_num_offsets == 0;
+
+ /* If the number of dead offsets has changed, that's wrong */
+ if (current_num_offsets != allowed_num_offsets)
+ return false;
+
+ Assert(deadoffsets);
+
+ /* The dead offsets must be the same dead offsets */
+ return memcmp(current_dead_offsets, deadoffsets,
+ allowed_num_offsets * sizeof(OffsetNumber)) == 0;
}
/*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..d6c86ccac20 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -266,6 +266,7 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
{
char *rec = XLogRecGetData(record);
uint8 info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+ char *maindataptr = rec + SizeOfHeapPrune;
info &= XLOG_HEAP_OPMASK;
if (info == XLOG_HEAP2_PRUNE_ON_ACCESS ||
@@ -278,7 +279,8 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
{
TransactionId conflict_xid;
- memcpy(&conflict_xid, rec + SizeOfHeapPrune, sizeof(TransactionId));
+ memcpy(&conflict_xid, maindataptr, sizeof(TransactionId));
+ maindataptr += sizeof(TransactionId);
appendStringInfo(buf, "snapshotConflictHorizon: %u",
conflict_xid);
@@ -287,6 +289,15 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, ", isCatalogRel: %c",
xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
+ if (xlrec->flags & XLHP_HAS_VMFLAGS)
+ {
+ uint8 vmflags;
+
+ memcpy(&vmflags, maindataptr, sizeof(uint8));
+ maindataptr += sizeof(uint8);
+ appendStringInfo(buf, ", vm_flags: 0x%02X", vmflags);
+ }
+
if (XLogRecHasBlockData(record, 0))
{
Size datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..8b47295efa2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -344,6 +344,12 @@ extern void heap_inplace_update_and_unlock(Relation relation,
Buffer buffer);
extern void heap_inplace_unlock(Relation relation,
HeapTuple oldtup, Buffer buffer);
+
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
@@ -388,6 +394,9 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer,
+ uint8 vmflags,
+ bool vm_modified_heap_page,
TransactionId conflict_xid,
bool cleanup_lock,
PruneReason reason,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 277df6b3cf0..ceae9c083ff 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -295,6 +295,9 @@ typedef struct xl_heap_prune
#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+/* If the record should update the VM, this is the new value */
+#define XLHP_HAS_VMFLAGS (1 << 0)
+
/* to handle recovery conflict during logical decoding on standby */
#define XLHP_IS_CATALOG_REL (1 << 1)
--
2.43.0
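Since 0004 threads an optional vmflags byte into the xl_heap_prune main data, here is a minimal standalone sketch (not part of the patches) of the payload order that the redo and desc code above consume: the fixed record header, then the optional snapshot conflict horizon, then the optional vmflags byte. The SKETCH_ flag values and the simplified one-byte header are stand-ins, not the real xl_heap_prune definition.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_XLHP_HAS_CONFLICT_HORIZON	(1 << 4)
#define SKETCH_XLHP_HAS_VMFLAGS				(1 << 0)

int
main(void)
{
	uint8_t		payload[16];
	uint8_t	   *p = payload;
	uint8_t		flags = SKETCH_XLHP_HAS_CONFLICT_HORIZON | SKETCH_XLHP_HAS_VMFLAGS;
	uint32_t	conflict_xid = 742;	/* stand-in for a 32-bit TransactionId */
	uint8_t		vmflags = 0x03;		/* all-visible | all-frozen */
	uint8_t		r_flags;

	/* Insert side: fixed header byte, then the optional fields in order. */
	memcpy(p, &flags, sizeof(flags));
	p += sizeof(flags);
	memcpy(p, &conflict_xid, sizeof(conflict_xid));
	p += sizeof(conflict_xid);
	memcpy(p, &vmflags, sizeof(vmflags));
	p += sizeof(vmflags);

	/* Redo/desc side: consume the same optional fields in the same order. */
	p = payload;
	memcpy(&r_flags, p, sizeof(r_flags));
	p += sizeof(r_flags);

	if (r_flags & SKETCH_XLHP_HAS_CONFLICT_HORIZON)
	{
		uint32_t	r_xid;

		memcpy(&r_xid, p, sizeof(r_xid));
		p += sizeof(r_xid);
		printf("snapshotConflictHorizon: %u\n", r_xid);
	}
	if (r_flags & SKETCH_XLHP_HAS_VMFLAGS)
	{
		uint8_t		r_vmflags;

		memcpy(&r_vmflags, p, sizeof(r_vmflags));
		printf("vm_flags: 0x%02X\n", r_vmflags);
	}
	return 0;
}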
v4-0010-Eliminate-xl_heap_visible-from-vacuum-phase-I-pru.patch (application/x-patch)
From bf9f5caacfb0f1f12bff35e8cd004519deea6e11 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:41:00 -0400
Subject: [PATCH v4 10/13] Eliminate xl_heap_visible from vacuum phase I
prune/freeze
Instead of emitting a separate WAL record for every block rendered
all-visible/frozen by vacuum's phase I, include the changes to the VM in
the xl_heap_prune record already emitted.
This is only enabled for vacuum's prune/freeze work, not for on-access
pruning.
---
src/backend/access/heap/pruneheap.c | 402 ++++++++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 30 --
src/include/access/heapam.h | 15 +-
3 files changed, 237 insertions(+), 210 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 797b3710862..6208f55176f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,13 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool freeze;
+
+ /*
+ * Whether or not to consider updating the VM. There is some bookkeeping
+ * that must be maintained if we would like to update the VM.
+ */
+ bool update_vm;
+
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -108,8 +115,9 @@ typedef struct
*
* These fields are not used by pruning itself for the most part, but are
* used to collect information about what was pruned and what state the
- * page is in after pruning, for the benefit of the caller. They are
- * copied to the caller's PruneFreezeResult at the end.
+ * page is in after pruning to use when updating the visibility map and
+ * for the benefit of the caller. They are copied to the caller's
+ * PruneFreezeResult at the end.
* -------------------------------------------------------
*/
@@ -138,11 +146,10 @@ typedef struct
* bits. It is only valid if we froze some tuples, and all_frozen is
* true.
*
- * NOTE: all_visible and all_frozen don't include LP_DEAD items. That's
- * convenient for heap_page_prune_and_freeze(), to use them to decide
- * whether to freeze the page or not. The all_visible and all_frozen
- * values returned to the caller are adjusted to include LP_DEAD items at
- * the end.
+ * NOTE: all_visible and all_frozen don't include LP_DEAD items until
+ * directly before updating the VM. We ignore LP_DEAD items when deciding
+ * whether or not to opportunistically freeze and when determining the
+ * snapshot conflict horizon required when freezing tuples.
*
* all_frozen should only be considered valid if all_visible is also set;
* we don't bother to clear the all_frozen flag every time we clear the
@@ -377,11 +384,15 @@ identify_and_fix_vm_corruption(Relation relation,
* considered advantageous for overall system performance to do so now. The
* 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
* are required when freezing. When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set. They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
+ * set presult->all_visible and presult->all_frozen on exit, for use when
+ * validating the changes made to the VM. They are always set to false when
+ * the HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
* that also freeze need that information.
*
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set and the visibility status of the page
+ * has changed, we will update the VM at the same time as pruning and freezing
+ * the heap page.
+ *
* blk_known_av is the visibility status of the heap block as of the last call
* to find_next_unskippable_block(). vmbuffer is the buffer that may already
* contain the required block of the visibility map.
@@ -396,6 +407,8 @@ identify_and_fix_vm_corruption(Relation relation,
* FREEZE indicates that we will also freeze tuples, and will return
* 'all_visible', 'all_frozen' flags to the caller.
*
+ * UPDATE_VM indicates that we will set the page's status in the VM.
+ *
* cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
* of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
* cutoffs->OldestXmin is also used to determine if dead tuples are
@@ -441,15 +454,19 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_freeze;
bool do_prune;
bool do_hint;
+ bool do_set_vm;
uint8 vmflags = 0;
uint8 old_vmbits = 0;
bool hint_bit_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ bool all_frozen_except_lp_dead = false;
+ bool set_pd_all_visible = false;
/* Copy parameters to prstate */
prstate.vistest = vistest;
prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate.cutoffs = cutoffs;
/*
@@ -496,50 +513,53 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.deadoffsets = presult->deadoffsets;
/*
- * Caller may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Keep track of whether or not the page will be all-visible and
+ * all-frozen for use in opportunistic freezing and to update the VM if
+ * the caller requests it.
+ *
+ * Currently, only VACUUM attempts freezing and setting the VM bits. But
+ * other callers could do either one. The visibility bookkeeping is
+ * required for opportunistic freezing (in addition to setting the VM
+ * bits) because we only consider opportunistically freezing tuples if the
+ * whole page would become all-frozen or if the whole page will be frozen
+ * except for dead tuples that will be removed by vacuum.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * Dead tuples which will be removed by the end of vacuuming should not
+ * preclude us from opportunistically freezing, so we do not clear
+ * all_visible when we see LP_DEAD items. We fix that after determining
+ * whether or not to freeze but before deciding whether or not to update
+ * the VM so that we don't set the VM bit incorrectly.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not clear
- * all_visible when we see LP_DEAD items. We fix that at the end of the
- * function, when we return the value to the caller, so that the caller
- * doesn't set the VM bit incorrectly.
+ * If we are neither freezing nor updating the VM, avoid the extra
+ * bookkeeping. Initializing all_visible and all_frozen to false allows
+ * skipping the work to update them in heap_prune_record_unchanged_lp_normal().
*/
if (prstate.freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
}
+ else if (prstate.update_vm)
+ {
+ prstate.all_visible = true;
+ prstate.all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate.all_visible = false;
prstate.all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is unmaintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon.
+ * This is most likely to happen when updating the VM and/or freezing all
+ * live tuples on the page. It is updated before returning to the caller
+ * because vacuum does assert-build only validation on the page using this
+ * field.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -827,6 +847,68 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
}
+ /*
+ * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+ * make the choice of whether or not to freeze the page unaffected by the
+ * short-term presence of LP_DEAD items. These LP_DEAD items were
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that freezing has been finalized, unset all_visible if there are
+ * any LP_DEAD items on the page. It needs to reflect the present state of
+ * the page when using it to determine whether or not to update the VM.
+ *
+ * Keep track of whether or not the page was all-frozen except LP_DEAD
+ * items for the purposes of calculating the snapshot conflict horizon,
+ * though.
+ */
+ all_frozen_except_lp_dead = prstate.all_frozen;
+ if (prstate.lpdead_items > 0)
+ {
+ prstate.all_visible = false;
+ prstate.all_frozen = false;
+ }
+
+ /*
+ * Handle setting visibility map bit based on information from the VM (as
+ * of last heap_vac_scan_next_block() call), and from all_visible and
+ * all_frozen variables.
+ */
+ if (prstate.update_vm)
+ {
+ /*
+ * Clear any VM corruption. This does not need to be in a critical
+ * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
+ * we may mark the heap page buffer dirty here and could end up doing
+ * so again later. This is not a correctness issue and is in the path
+ * of VM corruption, so we don't have to worry about the extra
+ * performance overhead.
+ */
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av, prstate.lpdead_items, vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /* Determine if we actually need to set the VM and which bits to set. */
+ else if (prstate.all_visible &&
+ (!blk_known_av ||
+ (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate.all_frozen)
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+ }
+
+ do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
+ set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+
+ /* Save these for the caller in case we later zero out vmflags */
+ presult->new_vmbits = vmflags;
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -848,13 +930,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* If that's all we had to do to the page, this is a non-WAL-logged
* hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * the buffer dirty and emit WAL below.
*/
- if (!do_freeze && !do_prune)
+ if (!do_prune && !do_freeze && !do_set_vm)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -868,12 +950,34 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
- MarkBufferDirty(buffer);
+ if (set_pd_all_visible)
+ PageSetAllVisible(page);
+
+ if (do_prune || do_freeze || set_pd_all_visible)
+ MarkBufferDirty(buffer);
+
+ if (do_set_vm)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ old_vmbits = visibilitymap_set_vmbyte(relation, blockno,
+ vmbuffer, vmflags);
+
+ if (old_vmbits == vmflags)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+ do_set_vm = false;
+ /* 0 out vmflags so we don't emit VM update WAL */
+ vmflags = 0;
+ }
+ }
/*
- * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
+ * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did. If
+ * we were only updating the VM and it turns out it was already set,
+ * we will have unset do_set_vm above. As such, check it again before
+ * emitting the record.
*/
- if (RelationNeedsWAL(relation))
+ if (RelationNeedsWAL(relation) && (do_set_vm || do_prune || do_freeze))
{
/*
* The snapshotConflictHorizon for the whole record should be the
@@ -885,35 +989,57 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* on the standby with xids older than the youngest tuple this
* record will freeze will conflict.
*/
- TransactionId frz_conflict_horizon = InvalidTransactionId;
- TransactionId conflict_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
/*
- * We can use the visibility_cutoff_xid as our cutoff for
- * conflicts when the whole page is eligible to become all-frozen
- * in the VM once we're done with it. Otherwise we generate a
- * conservative cutoff by stepping back from OldestXmin.
+ * If we are updating the VM, the conflict horizon is almost
+ * always the visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization,
+ * we can use the visibility_cutoff_xid as the conflict horizon if
+ * the page will be all-frozen. This is true even if there are
+ * LP_DEAD line pointers because we ignored those when maintaining
+ * the visibility_cutoff_xid.
*/
- if (do_freeze)
+ if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
+ conflict_xid = prstate.visibility_cutoff_xid;
+
+ /*
+ * Otherwise, if we are freezing but the page would not be
+ * all-frozen, we have to use the more pessimistic horizon of
+ * OldestXmin, which may be newer than the newest tuple we froze.
+ * That's because we won't have maintained the
+ * visibility_cutoff_xid.
+ */
+ else if (do_freeze)
{
- if (prstate.all_visible && prstate.all_frozen)
- frz_conflict_horizon = prstate.visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = prstate.cutoffs->OldestXmin;
- TransactionIdRetreat(frz_conflict_horizon);
- }
+ conflict_xid = prstate.cutoffs->OldestXmin;
+ TransactionIdRetreat(conflict_xid);
}
- if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
- conflict_xid = frz_conflict_horizon;
- else
+ /*
+ * If we are removing tuples with a younger xmax than our so far
+ * calculated conflict_xid, we must use this as our horizon.
+ */
+ if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
conflict_xid = prstate.latest_xid_removed;
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning
+ * or freezing any tuples and are setting an already all-visible
+ * page all-frozen in the VM. In this case, all of the tuples on
+ * the page must already be visible to all MVCC snapshots on the
+ * standby.
+ */
+ if (!do_prune && !do_freeze && do_set_vm &&
+ blk_known_av && (vmflags & VISIBILITYMAP_ALL_FROZEN))
+ conflict_xid = InvalidTransactionId;
+
log_heap_prune_and_freeze(relation, buffer,
false,
- InvalidBuffer, 0, false,
+ vmbuffer,
+ vmflags,
+ set_pd_all_visible,
conflict_xid,
true, reason,
prstate.frozen, prstate.nfrozen,
@@ -925,124 +1051,55 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
END_CRIT_SECTION();
- /* Copy information back for caller */
- presult->ndeleted = prstate.ndeleted;
- presult->nnewlpdead = prstate.ndead;
- presult->nfrozen = prstate.nfrozen;
- presult->live_tuples = prstate.live_tuples;
- presult->recently_dead_tuples = prstate.recently_dead_tuples;
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected for updating the visibility map.
- */
- if (prstate.all_visible && prstate.lpdead_items == 0)
- {
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
- }
- else
- {
- presult->all_visible = false;
- presult->all_frozen = false;
- }
-
- presult->hastup = prstate.hastup;
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
/*
- * If updating the visibility map, the conflict horizon for that record
- * must be the newest xmin on the page. However, if the page is
- * completely frozen, there can be no conflict and the vm_conflict_horizon
- * should remain InvalidTransactionId. This includes the case that we
- * just froze all the tuples; the prune-freeze record included the
- * conflict XID already so the VM update record doesn't need it.
+ * VACUUM will call heap_page_is_all_visible() during the second pass over
+ * the heap to determine all_visible and all_frozen for the page -- this
+ * is a specialized version of the logic from this function. Now that
+ * we've finished pruning and freezing, make sure that we're in total
+ * agreement with heap_page_is_all_visible() using an assertion. We will
+ * have already set the page in the VM, so this assertion will only let
+ * you know that you've already done something wrong.
*/
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
-
- /*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
- */
- if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
{
- if (identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av,
- prstate.lpdead_items, vmbuffer))
- {
- /* If we fix corruption, don't update the VM further */
- }
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
- /*
- * If the page isn't yet marked all-visible in the VM or it is and
- * needs to me marked all-frozen, update the VM Note that all_frozen
- * is only valid if all_visible is true, so we must check both
- * all_visible and all_frozen.
- */
- else if (presult->all_visible &&
- (!blk_known_av ||
- (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
- {
- Assert(prstate.lpdead_items == 0);
- vmflags = VISIBILITYMAP_ALL_VISIBLE;
+ Assert(cutoffs);
- /*
- * If the page is all-frozen, we can pass InvalidTransactionId as
- * our cutoff_xid, since a snapshotConflictHorizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- if (presult->all_frozen)
- {
- Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
- vmflags |= VISIBILITYMAP_ALL_FROZEN;
- }
+ Assert(prstate.lpdead_items == 0);
- /*
- * It's possible for the VM bit to be clear and the page-level bit
- * to be set if checksums are not enabled.
- *
- * And even if we are just planning to update the frozen bit in
- * the VM, we shouldn't rely on all_visible_according_to_vm as a
- * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
- * might have become stale.
- *
- * If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be
- * logged.
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buffer);
- }
+ if (!heap_page_is_all_visible(relation, buffer,
+ cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc))
+ Assert(false);
- old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
- vmbuffer, presult->vm_conflict_horizon,
- vmflags);
- }
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == prstate.visibility_cutoff_xid);
}
+#endif
+ /* Copy information back for caller */
+ presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+ presult->old_vmbits = old_vmbits;
+ /* new_vmbits was set above */
+ presult->hastup = prstate.hastup;
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
- presult->old_vmbits = old_vmbits;
- presult->new_vmbits = vmflags;
-
if (prstate.freeze)
{
if (presult->nfrozen > 0)
@@ -1624,7 +1681,12 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
break;
}
- /* Consider freezing any normal tuples which will not be removed */
+ /*
+ * Consider freezing any normal tuples which will not be removed.
+ * Regardless of whether or not we want to freeze the tuples, if we want
+ * to update the VM, we have to call heap_prepare_freeze_tuple() on every
+ * tuple to know whether or not the page will be totally frozen.
+ */
if (prstate->freeze)
{
bool totally_frozen;
@@ -2238,6 +2300,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+ Assert(do_prune || nfrozen > 0 || vmflags & VISIBILITYMAP_VALID_BITS);
+
xlrec.flags = 0;
regbuf_flags = REGBUF_STANDARD;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c49e81bc5dd..91e209901b8 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2014,34 +2014,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- /* Note that all_frozen value does not matter when !all_visible */
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- if (!heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum))
- Assert(false);
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -2075,8 +2047,6 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
-
/*
* For the purposes of logging, count whether or not the page was newly
* set all-visible and, potentially, all-frozen.
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b85648456e9..0b9bb1c9b13 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,19 +235,12 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate the status of the page as reflected
- * in the visibility map after pruning, freezing, and setting any pages
- * all-visible in the visibility map.
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
*
- * vm_conflict_horizon is the newest xmin of live tuples on the page
- * (older than OldestXmin). It will only be valid if we did not set the
- * page all-frozen in the VM.
- *
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
*/
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
uint8 old_vmbits;
uint8 new_vmbits;
--
2.43.0
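
(For reference while reading the vmflags handling in these patches: the
visibility map stores two bits per heap page, and several assertions check
that all-frozen is never set without all-visible. Below is a standalone
sketch of those bit values and that invariant -- the helper name is made up
for illustration, but the constants match visibilitymapdefs.h.)

#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in src/include/access/visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE   0x01
#define VISIBILITYMAP_ALL_FROZEN    0x02
#define VISIBILITYMAP_VALID_BITS    0x03    /* OR of the two bits above */

/* Illustrative helper: the invariant the vmflags assertions rely on */
static bool
vmflags_are_valid(uint8_t vmflags)
{
    if ((vmflags & ~VISIBILITYMAP_VALID_BITS) != 0)
        return false;           /* unknown bits set */
    if ((vmflags & VISIBILITYMAP_ALL_FROZEN) != 0 &&
        (vmflags & VISIBILITYMAP_ALL_VISIBLE) == 0)
        return false;           /* all-frozen requires all-visible */
    return true;
}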
v4-0009-Update-VM-in-pruneheap.c.patch (application/x-patch)
From 4aae350102d197fb511b45b478fe887e1900c3a7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 2 Jun 2025 11:04:14 -0400
Subject: [PATCH v4 09/13] Update VM in pruneheap.c
As a step toward updating the VM in the same critical section and WAL
record as pruning and freezing (during phase I of vacuuming), first move
the VM update (still in its own critical section and WAL record) into
heap_page_prune_and_freeze(). This makes review easier.
---
src/backend/access/heap/pruneheap.c | 99 +++++++++++++++++++++++-----
src/backend/access/heap/vacuumlazy.c | 98 +++++----------------------
src/include/access/heapam.h | 15 +++--
3 files changed, 106 insertions(+), 106 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 070d64fa9c3..797b3710862 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -364,7 +364,8 @@ identify_and_fix_vm_corruption(Relation relation,
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -440,6 +441,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_freeze;
bool do_prune;
bool do_hint;
+ uint8 vmflags = 0;
+ uint8 old_vmbits = 0;
bool hint_bit_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
@@ -939,7 +942,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*
* Now that freezing has been finalized, unset all_visible if there are
* any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected by our caller.
+ * of the page, as expected for updating the visibility map.
*/
if (prstate.all_visible && prstate.lpdead_items == 0)
{
@@ -955,31 +958,91 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->hastup = prstate.hastup;
/*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
+ * If updating the visibility map, the conflict horizon for that record
+ * must be the newest xmin on the page. However, if the page is
+ * completely frozen, there can be no conflict and the vm_conflict_horizon
+ * should remain InvalidTransactionId. This includes the case that we
+ * just froze all the tuples; the prune-freeze record included the
+ * conflict XID already so the VM update record doesn't need it.
*/
if (presult->all_frozen)
presult->vm_conflict_horizon = InvalidTransactionId;
else
presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
- presult->lpdead_items = prstate.lpdead_items;
- /* the presult->deadoffsets array was already filled in */
-
/*
- * Clear any VM corruption. This does not need to be done in a critical
- * section.
+ * Handle setting visibility map bit based on information from the VM (as
+ * of last heap_vac_scan_next_block() call), and from all_visible and
+ * all_frozen variables.
*/
- presult->vm_corruption = false;
if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
- presult->vm_corruption = identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av,
- prstate.lpdead_items, vmbuffer);
+ {
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av,
+ prstate.lpdead_items, vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /*
+ * If the page isn't yet marked all-visible in the VM or it is and
+ * needs to be marked all-frozen, update the VM. Note that all_frozen
+ * is only valid if all_visible is true, so we must check both
+ * all_visible and all_frozen.
+ */
+ else if (presult->all_visible &&
+ (!blk_known_av ||
+ (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+ {
+ Assert(prstate.lpdead_items == 0);
+ vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+ /*
+ * If the page is all-frozen, we can pass InvalidTransactionId as
+ * our cutoff_xid, since a snapshotConflictHorizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ if (presult->all_frozen)
+ {
+ Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
+ /*
+ * It's possible for the VM bit to be clear and the page-level bit
+ * to be set if checksums are not enabled.
+ *
+ * And even if we are just planning to update the frozen bit in
+ * the VM, we shouldn't rely on all_visible_according_to_vm as a
+ * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
+ * might have become stale.
+ *
+ * If the heap page is all-visible but the VM bit is not set, we
+ * don't need to dirty the heap page. However, if checksums are
+ * enabled, we do need to make sure that the heap page is dirtied
+ * before passing it to visibilitymap_set(), because it may be
+ * logged.
+ */
+ if (!PageIsAllVisible(page))
+ {
+ PageSetAllVisible(page);
+ MarkBufferDirty(buffer);
+ }
+
+ old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
+ vmbuffer, presult->vm_conflict_horizon,
+ vmflags);
+ }
+ }
+
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
+
+ presult->old_vmbits = old_vmbits;
+ presult->new_vmbits = vmflags;
+
if (prstate.freeze)
{
if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6110b7f80ce..c49e81bc5dd 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1933,7 +1933,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
-
/* qsort comparator for sorting OffsetNumbers */
static int
cmpOffsetNumbers(const void *a, const void *b)
@@ -1949,7 +1948,8 @@ cmpOffsetNumbers(const void *a, const void *b)
* vmbuffer is the buffer containing the VM block with visibility information
* for the heap block, blkno. all_visible_according_to_vm is the saved
* visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * won't rely entirely on this status, as it may be out of date. These will be
+ * passed on to heap_page_prune_and_freeze() to use while setting the VM.
*
* *has_lpdead_items is set to true or false depending on whether, upon return
* from this function, any LP_DEAD items are still present on the page.
@@ -1978,6 +1978,7 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
+ * Then, if the page's visibility status has changed, update the VM.
*
* If the relation has no indexes, we can immediately mark would-be dead
* items LP_UNUSED.
@@ -1986,10 +1987,6 @@ lazy_scan_prune(LVRelState *vacrel,
* presult.ndeleted. It should not be confused with presult.lpdead_items;
* presult.lpdead_items's final value can be thought of as the number of
* tuples that were deleted from indexes.
- *
- * We will update the VM after collecting LP_DEAD items and freezing
- * tuples. Pruning will have determined whether or not the page is
- * all-visible.
*/
prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
if (vacrel->nindexes == 0)
@@ -2081,87 +2078,26 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(!presult.all_visible || !(*has_lpdead_items));
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * For the purposes of logging, count whether or not the page was newly
+ * set all-visible and, potentially, all-frozen.
*/
- if (presult.vm_corruption)
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- /* Don't update the VM if we just cleared corruption in it */
- }
-
- /*
- * If the page isn't yet marked all-visible in the VM or it is and needs
- * to be marked all-frozen, update the VM. Note that all_frozen is only
- * valid if all_visible is true, so we must check both all_visible and
- * all_frozen.
- */
- else if (presult.all_visible &&
- (!all_visible_according_to_vm ||
- (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
- {
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- /*
- * If the page is all-frozen, we can pass InvalidTransactionId as our
- * cutoff_xid, since a snapshotConflictHorizon sufficient to make
- * everything safe for REDO was logged when the page's tuples were
- * frozen.
- */
- if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * If the heap page is all-visible but the VM bit is not set, we don't
- * need to dirty the heap page. However, if checksums are enabled, we
- * do need to make sure that the heap page is dirtied before passing
- * it to visibilitymap_set(), because it may be logged.
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * Even if we are only setting the all-frozen bit, there is a small
- * chance that the VM was modified sometime between setting
- * all_visible_according_to_vm and checking the visibility during
- * pruning. Check the return value of old_vmbits to ensure the
- * visibility map counters used for logging are accurate.
- *
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ vacrel->vm_new_visible_pages++;
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- vacrel->vm_new_frozen_pages++;
+ vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ {
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
+ }
return presult.ndeleted;
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0c7eb5e46f4..b85648456e9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,20 +235,21 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
+ * all_visible and all_frozen indicate the status of the page as reflected
+ * in the visibility map after pruning, freezing, and setting any pages
+ * all-visible in the visibility map.
*
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
+ * vm_conflict_horizon is the newest xmin of live tuples on the page
+ * (older than OldestXmin). It will only be valid if we did not set the
+ * page all-frozen in the VM.
*
* These are only set if the HEAP_PRUNE_FREEZE option is set.
*/
bool all_visible;
bool all_frozen;
TransactionId vm_conflict_horizon;
- bool vm_corruption;
+ uint8 old_vmbits;
+ uint8 new_vmbits;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
--
2.43.0
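
(In the vacuumlazy.c hunk above, the per-case counter updates are replaced by
comparing the VM bits before and after the update. Restated below as a
standalone function with illustrative names; the counter parameters loosely
follow the LVRelState fields, and the bit values match visibilitymapdefs.h.
In the real code both branches also record that the page was frozen in the
VM.)

#include <stdbool.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE   0x01
#define VISIBILITYMAP_ALL_FROZEN    0x02

/* Classify a page as newly all-visible, newly all-visible-and-frozen, or
 * newly all-frozen, given the VM bits before and after the update. */
static void
count_vm_changes(uint8_t old_vmbits, uint8_t new_vmbits,
                 uint64_t *new_visible_pages,
                 uint64_t *new_visible_frozen_pages,
                 uint64_t *new_frozen_pages)
{
    bool was_visible = (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
    bool now_visible = (new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
    bool was_frozen = (old_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0;
    bool now_frozen = (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0;

    if (!was_visible && now_visible)
    {
        (*new_visible_pages)++;
        if (now_frozen)
            (*new_visible_frozen_pages)++;
    }
    else if (!was_frozen && now_frozen)
        (*new_frozen_pages)++;      /* page was already all-visible */
}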
v4-0006-Combine-lazy_scan_prune-VM-corruption-cases.patch (application/x-patch)
From 9acac2dcc61134502a305e055d2d5403c9c3d559 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:04:03 -0400
Subject: [PATCH v4 06/13] Combine lazy_scan_prune VM corruption cases
lazy_scan_prune() updates the visibility map after phase I of heap
vacuuming. It also checks and fixes corruption in the VM. The corruption
cases were mixed in with the normal visibility map update cases.
Careful study of the ordering of the current logic reveals that the
corruption cases can be reordered and extracted into a separate
function. This should result in no additional overhead compared to the
previous code.
This reordering makes it clear which cases are about corruption and
which cases are normal VM updates. Separating them also makes it
possible to combine the normal cases in a future commit. That will make
the logic easier to understand and pave the way for updating the VM in
the same WAL record as pruning and freezing in phase I.
---
src/backend/access/heap/vacuumlazy.c | 114 +++++++++++++++++----------
1 file changed, 73 insertions(+), 41 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d9e195269d2..04a7b6c4181 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -431,6 +431,12 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
+static bool identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1933,6 +1939,66 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer)
+{
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
+
+
/* qsort comparator for sorting OffsetNumbers */
static int
cmpOffsetNumbers(const void *a, const void *b)
@@ -2079,9 +2145,14 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
+ * all_frozen variables. Start by looking for any VM corruption.
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
+ all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+ {
+ /* Don't update the VM if we just cleared corruption in it */
+ }
+ else if (!all_visible_according_to_vm && presult.all_visible)
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -2133,45 +2204,6 @@ lazy_scan_prune(LVRelState *vacrel,
}
}
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
- {
- elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
- {
- elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno);
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
/*
* If the all-visible page is all-frozen but not marked as such yet, mark
* it as all-frozen. Note that all_frozen is only valid if all_visible is
--
2.43.0
v4-0008-Find-and-fix-VM-corruption-in-heap_page_prune_and.patch (application/x-patch)
From 2c08444d6ef0f22a978aa1f8b099cee7517930f0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:45:59 -0400
Subject: [PATCH v4 08/13] Find and fix VM corruption in
heap_page_prune_and_freeze
Future commits will update the VM in the same critical section and WAL
record as pruning and freezing. For ease of review, this commit takes
one step toward that goal: it moves the VM corruption handling case to
heap_page_prune_and_freeze().
---
src/backend/access/heap/pruneheap.c | 87 +++++++++++++++++++++++++++-
src/backend/access/heap/vacuumlazy.c | 77 +++---------------------
src/include/access/heapam.h | 4 ++
3 files changed, 96 insertions(+), 72 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 2724cf7f64f..070d64fa9c3 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,7 +21,7 @@
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "commands/vacuum.h"
#include "executor/instrument.h"
#include "miscadmin.h"
@@ -177,6 +177,13 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
static void page_verify_redirects(Page page);
+static bool identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer);
+
/*
* Optionally prune and repair fragmentation in the specified page.
@@ -261,7 +268,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, vistest, 0,
+ heap_page_prune_and_freeze(relation, buffer, false,
+ InvalidBuffer,
+ vistest, 0,
NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
/*
@@ -294,6 +303,64 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
}
}
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer)
+{
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
/*
* Prune and repair fragmentation and potentially freeze tuples on the
@@ -314,6 +381,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
* that also freeze need that information.
*
+ * blk_known_av is the visibility status of the heap block as of the last call
+ * to find_next_unskippable_block(). vmbuffer is the buffer that may already
+ * contain the required block of the visibility map.
+ *
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
* (see heap_prune_satisfies_vacuum).
*
@@ -349,6 +420,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ bool blk_known_av,
+ Buffer vmbuffer,
GlobalVisState *vistest,
int options,
struct VacuumCutoffs *cutoffs,
@@ -897,6 +970,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
+ /*
+ * Clear any VM corruption. This does not need to be done in a critical
+ * section.
+ */
+ presult->vm_corruption = false;
+ if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+ presult->vm_corruption = identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av,
+ prstate.lpdead_items, vmbuffer);
if (prstate.freeze)
{
if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index da71f095da9..6110b7f80ce 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -431,12 +431,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
-static bool identify_and_fix_vm_corruption(Relation relation,
- BlockNumber heap_blk,
- Buffer heap_buffer, Page heap_page,
- bool heap_blk_known_av,
- int64 nlpdead_items,
- Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1939,65 +1933,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
-/*
- * When updating the visibility map after phase I heap vacuuming, we take the
- * opportunity to identify and fix any VM corruption.
- *
- * heap_blk_known_av is the visibility status of the heap page collected
- * while finding the next unskippable block in heap_vac_scan_next_block().
- */
-static bool
-identify_and_fix_vm_corruption(Relation relation,
- BlockNumber heap_blk,
- Buffer heap_buffer, Page heap_page,
- bool heap_blk_known_av,
- int64 nlpdead_items,
- Buffer vmbuffer)
-{
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
- visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
- {
- elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk);
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
- {
- elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk);
- PageClearAllVisible(heap_page);
- MarkBufferDirty(heap_buffer);
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- return false;
-}
-
/* qsort comparator for sorting OffsetNumbers */
static int
@@ -2056,11 +1991,14 @@ lazy_scan_prune(LVRelState *vacrel,
* tuples. Pruning will have determined whether or not the page is
* all-visible.
*/
- prune_options = HEAP_PAGE_PRUNE_FREEZE;
+ prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
+ heap_page_prune_and_freeze(rel, buf,
+ all_visible_according_to_vm,
+ vmbuffer,
+ vacrel->vistest, prune_options,
&vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
@@ -2145,10 +2083,9 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables. Start by looking for any VM corruption.
+ * all_frozen variables.
*/
- if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
- all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+ if (presult.vm_corruption)
{
/* Don't update the VM if we just cleared corruption in it */
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e7129a644a1..0c7eb5e46f4 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
struct TupleTableSlot;
@@ -247,6 +248,7 @@ typedef struct PruneFreezeResult
bool all_visible;
bool all_frozen;
TransactionId vm_conflict_horizon;
+ bool vm_corruption;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
@@ -380,6 +382,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ bool blk_known_av,
+ Buffer vmbuffer,
struct GlobalVisState *vistest,
int options,
struct VacuumCutoffs *cutoffs,
--
2.43.0
v4-0007-Combine-vacuum-phase-I-VM-update-cases.patch (application/x-patch)
From 775d6c17d83095bac01b2e0b7e344d809b5ded7a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:35:36 -0400
Subject: [PATCH v4 07/13] Combine vacuum phase I VM update cases
After phase I of vacuum we update the VM, either setting the VM bits
when all bits are currently unset or setting just the frozen bit when
the all-visible bit is already set. Those cases had a lot of duplicated
code. Combine them. This is simpler to understand and also makes the
code compact enough to reuse for updating the VM while pruning and
freezing.
---
src/backend/access/heap/vacuumlazy.c | 100 +++++++++------------------
1 file changed, 31 insertions(+), 69 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 04a7b6c4181..da71f095da9 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2152,11 +2152,26 @@ lazy_scan_prune(LVRelState *vacrel,
{
/* Don't update the VM if we just cleared corruption in it */
}
- else if (!all_visible_according_to_vm && presult.all_visible)
+
+ /*
+ * If the page isn't yet marked all-visible in the VM or it is and needs
+ * to be marked all-frozen, update the VM. Note that all_frozen is only
+ * valid if all_visible is true, so we must check both all_visible and
+ * all_frozen.
+ */
+ else if (presult.all_visible &&
+ (!all_visible_according_to_vm ||
+ (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
+ /*
+ * If the page is all-frozen, we can pass InvalidTransactionId as our
+ * cutoff_xid, since a snapshotConflictHorizon sufficient to make
+ * everything safe for REDO was logged when the page's tuples were
+ * frozen.
+ */
if (presult.all_frozen)
{
Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
@@ -2169,21 +2184,28 @@ lazy_scan_prune(LVRelState *vacrel,
* checksums are not enabled). Regardless, set both bits so that we
* get back in sync.
*
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
+ * If the heap page is all-visible but the VM bit is not set, we don't
+ * need to dirty the heap page. However, if checksums are enabled, we
+ * do need to make sure that the heap page is dirtied before passing
+ * it to visibilitymap_set(), because it may be logged.
*/
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
+ if (!PageIsAllVisible(page))
+ {
+ PageSetAllVisible(page);
+ MarkBufferDirty(buf);
+ }
old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
InvalidXLogRecPtr,
vmbuffer, presult.vm_conflict_horizon,
flags);
/*
+ * Even if we are only setting the all-frozen bit, there is a small
+ * chance that the VM was modified sometime between setting
+ * all_visible_according_to_vm and checking the visibility during
+ * pruning. Check the return value of old_vmbits to ensure the
+ * visibility map counters used for logging are accurate.
+ *
* If the page wasn't already set all-visible and/or all-frozen in the
* VM, count it as newly set for logging.
*/
@@ -2204,66 +2226,6 @@ lazy_scan_prune(LVRelState *vacrel,
}
}
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen. Note that all_frozen is only valid if all_visible is
- * true, so we must check both all_visible and all_frozen.
- */
- else if (all_visible_according_to_vm && presult.all_visible &&
- presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
-
return presult.ndeleted;
}
--
2.43.0
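
(The combined test above can be read as a single predicate. Below is a
standalone restatement with illustrative names; as in the patch, all_frozen
is only meaningful when all_visible is true.)

#include <stdbool.h>

/* Should lazy_scan_prune() set or upgrade the VM bits for this page?
 * True when the page is all-visible and either the VM does not yet show it
 * as all-visible, or the page is also all-frozen but the VM does not yet
 * show that. */
static bool
need_vm_update(bool all_visible, bool all_frozen,
               bool vm_shows_all_visible, bool vm_shows_all_frozen)
{
    if (!all_visible)
        return false;
    if (!vm_shows_all_visible)
        return true;
    return all_frozen && !vm_shows_all_frozen;
}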
v4-0011-Remove-xl_heap_visible-entirely.patch (application/x-patch)
From b554be605998123bb1e57edc6669147aa8f979a6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:30:42 -0400
Subject: [PATCH v4 11/13] Remove xl_heap_visible entirely
There are now no users of this, so eliminate it entirely.
---
src/backend/access/common/bufmask.c | 3 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 154 +----------------------
src/backend/access/heap/pruneheap.c | 4 +-
src/backend/access/heap/vacuumlazy.c | 10 +-
src/backend/access/heap/visibilitymap.c | 103 +--------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/include/access/heapam_xlog.h | 20 ---
src/include/access/visibilitymap.h | 11 +-
src/include/access/visibilitymapdefs.h | 9 --
src/tools/pgindent/typedefs.list | 1 -
12 files changed, 23 insertions(+), 357 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..1fff01383b3 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,7 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 68db4325285..48f7b84156a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -40,6 +40,7 @@
#include "access/valid.h"
#include "access/visibilitymap.h"
#include "access/xloginsert.h"
+#include "access/xlogutils.h"
#include "catalog/pg_database.h"
#include "catalog/pg_database_d.h"
#include "commands/vacuum.h"
@@ -2512,11 +2513,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
{
PageSetAllVisible(page);
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
- visibilitymap_set_vmbyte(relation,
- BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set(relation,
+ BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
}
/*
@@ -8784,49 +8785,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
/*
* Perform XLogInsert for a heap-update operation. Caller must already
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 14541e2e94f..64f06d46bf1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -82,10 +82,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
memcpy(&vmflags, maindataptr, sizeof(uint8));
maindataptr += sizeof(uint8);
- /*
- * We don't set VISIBILITYMAP_XLOG_CATALOG_REL in the combined record
- * because we already have XLHP_IS_CATALOG_REL.
- */
Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
/* Must never set all_frozen bit without also setting all_visible bit */
Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
@@ -267,7 +263,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Relation reln = CreateFakeRelcacheEntry(rlocator);
visibilitymap_pin(reln, blkno, &vmbuffer);
- old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+ old_vmbits = visibilitymap_set(reln, blkno, vmbuffer, vmflags);
/* Only set VM page LSN if we modified the page */
if (old_vmbits != vmflags)
PageSetLSN(BufferGetPage(vmbuffer), lsn);
@@ -278,143 +274,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
UnlockReleaseBuffer(vmbuffer);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
- visibilitymap_pin(reln, blkno, &vmbuffer);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -791,16 +650,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
Relation reln = CreateFakeRelcacheEntry(rlocator);
visibilitymap_pin(reln, blkno, &vmbuffer);
- visibilitymap_set_vmbyte(reln, blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
/*
* It is not possible that the VM was already set for this heap page,
* so the vmbuffer must have been modified and marked dirty.
*/
Assert(BufferIsDirty(vmbuffer));
+ visibilitymap_set(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
PageSetLSN(BufferGetPage(vmbuffer), lsn);
FreeFakeRelcacheEntry(reln);
}
@@ -1380,9 +1239,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6208f55176f..f6509695e3a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -959,8 +959,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_set_vm)
{
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
- old_vmbits = visibilitymap_set_vmbyte(relation, blockno,
- vmbuffer, vmflags);
+ old_vmbits = visibilitymap_set(relation, blockno,
+ vmbuffer, vmflags);
if (old_vmbits == vmflags)
{
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 91e209901b8..6a0fa371a06 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1887,8 +1887,8 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
MarkBufferDirty(buf);
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
- visibilitymap_set_vmbyte(vacrel->rel, blkno,
- vmbuffer, new_vmbits);
+ visibilitymap_set(vacrel->rel, blkno,
+ vmbuffer, new_vmbits);
if (RelationNeedsWAL(vacrel->rel))
{
@@ -2754,9 +2754,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
set_pd_all_vis = true;
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
PageSetAllVisible(page);
- visibilitymap_set_vmbyte(vacrel->rel,
- blkno,
- vmbuffer, vmflags);
+ visibilitymap_set(vacrel->rel,
+ blkno,
+ vmbuffer, vmflags);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 573df6f6891..478b08fa520 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -219,105 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
- Assert(InRecovery || PageIsAllVisible((Page) BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (XLogRecPtrIsInvalid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
-}
-
/*
* Set flags in the VM block contained in the passed in vmBuf.
*
@@ -337,8 +238,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* making any changes needed to the associated heap page.
*/
uint8
-visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags)
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index d6c86ccac20..f7880a4ed81 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -351,13 +351,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -462,9 +455,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ceae9c083ff..a64677b7bca 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -438,20 +437,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -495,11 +480,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 977566f6b98..c5b1e1f7adb 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "utils/relcache.h"
@@ -31,14 +30,8 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags);
+extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 83192038571..e65094cb5df 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4269,7 +4269,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
v4-0012-Allow-on-access-pruning-to-set-pages-all-visible.patch (application/x-patch)
From e60710ccff532c6f6da9c470edc6eab9ecdbc37c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 7 Jul 2025 17:30:14 -0400
Subject: [PATCH v4 12/13] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum marked pages as all-visible or all-frozen.
Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
---
src/backend/access/heap/heapam.c | 15 +++++-
src/backend/access/heap/heapam_handler.c | 17 ++++++-
src/backend/access/heap/pruneheap.c | 59 +++++++++++++++++------
src/backend/access/index/indexam.c | 46 ++++++++++++++++++
src/backend/access/table/tableam.c | 39 +++++++++++++--
src/backend/executor/execMain.c | 4 ++
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeBitmapHeapscan.c | 6 ++-
src/backend/executor/nodeIndexscan.c | 17 ++++---
src/backend/executor/nodeSeqscan.c | 17 +++++--
src/backend/storage/ipc/procarray.c | 12 +++++
src/include/access/genam.h | 11 +++++
src/include/access/heapam.h | 24 +++++++--
src/include/access/relscan.h | 6 +++
src/include/access/tableam.h | 30 +++++++++++-
src/include/nodes/execnodes.h | 17 +++++++
src/include/utils/snapmgr.h | 1 +
17 files changed, 285 insertions(+), 38 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 48f7b84156a..50b0d169d54 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -560,6 +560,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
int lines;
bool all_visible;
bool check_serializable;
+ bool allow_vmset;
Assert(BufferGetBlockNumber(buffer) == block);
@@ -570,7 +571,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ allow_vmset = sscan->rs_flags & SO_ALLOW_VM_SET;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer,
+ allow_vmset ? &scan->rs_vmbuffer : NULL, allow_vmset);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1236,6 +1239,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1274,6 +1278,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1306,6 +1316,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index cb4bc35c93e..fb450c5a84f 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
return &hscan->xs_base;
}
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -138,7 +145,9 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer,
+ !scan->modifies_base_rel);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2480,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ bool allow_vmset = false;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2517,7 +2527,10 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ allow_vmset = scan->rs_flags & SO_ALLOW_VM_SET;
+ heap_page_prune_opt(scan->rs_rd, buffer,
+ allow_vmset ? &hscan->rs_vmbuffer : NULL,
+ allow_vmset);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f6509695e3a..af23008ddf7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -158,6 +158,7 @@ typedef struct
bool all_visible;
bool all_frozen;
TransactionId visibility_cutoff_xid;
+ TransactionId oldest_xmin;
} PruneState;
/* Local functions */
@@ -203,9 +204,13 @@ static bool identify_and_fix_vm_corruption(Relation relation,
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If allow_vmset is true, it is okay for pruning to set the visibility map if
+ * the page is all visible.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer, bool allow_vmset)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -260,6 +265,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
if (!ConditionalLockBufferForCleanup(buffer))
return;
+ /* Caller should not pass a vmbuffer if allow_vmset is false. */
+ Assert(allow_vmset || vmbuffer == NULL);
+
/*
* Now that we have buffer lock, get accurate information about the
* page's free space, and recheck the heuristic about whether to
@@ -269,6 +277,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
{
OffsetNumber dummy_off_loc;
PruneFreezeResult presult;
+ int options = 0;
+
+ if (allow_vmset)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ options = HEAP_PAGE_PRUNE_UPDATE_VM;
+ }
/*
* For now, pass mark_unused_now as false regardless of whether or
@@ -276,8 +291,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* that during on-access pruning with the current implementation.
*/
heap_page_prune_and_freeze(relation, buffer, false,
- InvalidBuffer,
- vistest, 0,
+ vmbuffer ? *vmbuffer : InvalidBuffer,
+ vistest, options,
NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
/*
@@ -467,6 +482,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
prstate.update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
+ if (cutoffs)
+ prstate.oldest_xmin = cutoffs->OldestXmin;
+ else
+ prstate.oldest_xmin = OldestXminFromGlobalVisState(vistest);
prstate.cutoffs = cutoffs;
/*
@@ -877,6 +896,20 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (prstate.update_vm)
{
+ /*
+ * If this is on-access and we aren't actually pruning, don't set the
+ * VM if doing so would newly dirty the heap page or, if the page is
+ * already dirty, if the WAL record emitted would have to contain an
+ * FPI of the heap page. This should rarely happen, as we only attempt
+ * on-access pruning when pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+ {
+ /* Don't update the VM */
+ }
+
/*
* Clear any VM corruption. This does not need to be in a critical
* section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
@@ -885,9 +918,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* of VM corruption, so we don't have to worry about the extra
* performance overhead.
*/
- if (identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av, prstate.lpdead_items, vmbuffer))
+ else if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av, prstate.lpdead_items, vmbuffer))
{
/* If we fix corruption, don't update the VM further */
}
@@ -1013,7 +1046,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
else if (do_freeze)
{
- conflict_xid = prstate.cutoffs->OldestXmin;
+ conflict_xid = prstate.oldest_xmin;
TransactionIdRetreat(conflict_xid);
}
@@ -1071,12 +1104,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(cutoffs);
-
Assert(prstate.lpdead_items == 0);
if (!heap_page_is_all_visible(relation, buffer,
- cutoffs->OldestXmin,
+ prstate.oldest_xmin,
&debug_all_frozen,
&debug_cutoff, off_loc))
Assert(false);
@@ -1136,9 +1167,8 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* vacuuming the relation. OldestXmin is used for freezing determination
* and we cannot freeze dead tuples' xmaxes.
*/
- if (prstate->cutoffs &&
- TransactionIdIsValid(prstate->cutoffs->OldestXmin) &&
- NormalTransactionIdPrecedes(dead_after, prstate->cutoffs->OldestXmin))
+ if (TransactionIdIsValid(prstate->oldest_xmin) &&
+ NormalTransactionIdPrecedes(dead_after, prstate->oldest_xmin))
return HEAPTUPLE_DEAD;
/*
@@ -1607,8 +1637,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* could use GlobalVisTestIsRemovableXid instead, if a
* non-freezing caller wanted to set the VM bit.
*/
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
+ if (!TransactionIdPrecedes(xmin, prstate->oldest_xmin))
{
prstate->all_visible = false;
break;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 219df1971da..d803c307517 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -279,6 +279,32 @@ index_beginscan(Relation heapRelation,
return scan;
}
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan(heapRelation,
+ indexRelation,
+ snapshot,
+ instrument,
+ nkeys, norderbys);
+
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+ return scan;
+}
+
/*
* index_beginscan_bitmap - start a scan of an index with amgetbitmap
*
@@ -610,6 +636,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
return scan;
}
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan_parallel(heaprel, indexrel,
+ instrument,
+ nkeys, norderbys,
+ pscan);
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+ return scan;
+}
+
/* ----------------
* index_getnext_tid - get the next TID from a scan
*
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index a56c5eceb14..67dbf99f5b5 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
char *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
bool synchronize_seqscans = true;
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags);
+
/* ----------------------------------------------------------------------------
* Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
}
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
- SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
pscan, flags);
}
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+ bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
/* ----------------------------------------------------------------------------
* Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0391798dd2c..065676eb7cf 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -917,6 +917,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation is modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..2c57bc7ac49 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -109,7 +109,8 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ node->modifies_rel);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
@@ -360,6 +361,9 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
scanstate->initialized = false;
scanstate->pstate = NULL;
scanstate->recheck = true;
+ scanstate->modifies_rel =
+ bms_is_member(node->scan.scanrelid,
+ estate->es_modified_relids);
/*
* Miscellaneous initialization
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..f91c6b17620 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -106,12 +106,13 @@ IndexNext(IndexScanState *node)
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
*/
- scandesc = index_beginscan(node->ss.ss_currentRelation,
- node->iss_RelationDesc,
- estate->es_snapshot,
- &node->iss_Instrument,
- node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+ node->iss_RelationDesc,
+ estate->es_snapshot,
+ &node->iss_Instrument,
+ node->iss_NumScanKeys,
+ node->iss_NumOrderByKeys,
+ node->iss_ModifiesBaseRel);
node->iss_ScanDesc = scandesc;
@@ -935,6 +936,10 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
+ indexstate->iss_ModifiesBaseRel =
+ bms_is_member(node->scan.scanrelid,
+ estate->es_modified_relids);
+
/*
* get the scan type from the relation descriptor.
*/
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index ed35c58c2c3..cded7f15703 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -69,9 +69,9 @@ SeqNext(SeqScanState *node)
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
- scandesc = table_beginscan(node->ss.ss_currentRelation,
- estate->es_snapshot,
- 0, NULL);
+ scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+ estate->es_snapshot,
+ 0, NULL, node->modifies_rel);
node->ss.ss_currentScanDesc = scandesc;
}
@@ -237,6 +237,10 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
node->scan.scanrelid,
eflags);
+ scanstate->modifies_rel =
+ bms_is_member(node->scan.scanrelid,
+ estate->es_modified_relids);
+
/* and create slot with the appropriate rowtype */
ExecInitScanTupleSlot(estate, &scanstate->ss,
RelationGetDescr(scanstate->ss.ss_currentRelation),
@@ -370,7 +374,8 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+ node->modifies_rel);
}
/* ----------------------------------------------------------------
@@ -403,5 +408,7 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+ pscan,
+ node->modifies_rel);
}
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index e5b945a9ee3..01d2bda3f72 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4133,6 +4133,18 @@ GlobalVisTestFor(Relation rel)
return state;
}
+/*
+ * Returns maybe_needed as a 32-bit TransactionId. Can be used in callers that
+ * need to compare transaction IDs to a single value and are okay with using
+ * the more conservative boundary.
+ */
+TransactionId
+OldestXminFromGlobalVisState(GlobalVisState *state)
+{
+ return XidFromFullTransactionId(state->maybe_needed);
+}
+
+
/*
* Return true if it's worth updating the accurate maybe_needed boundary.
*
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 5b2ab181b5f..bf272c2c37f 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_heap_rel);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
@@ -206,6 +211,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys,
ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_rel);
extern ItemPointer index_getnext_tid(IndexScanDesc scan,
ScanDirection direction);
struct TupleTableSlot;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0b9bb1c9b13..46ea8b8455c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read in the current heap page's
+ * corresponding VM block to this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -116,8 +123,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read in the current heap page's corresponding VM block to
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
@@ -374,7 +391,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer, bool allow_vmset);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool blk_known_av,
Buffer vmbuffer,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
typedef struct IndexFetchTableData
{
Relation rel;
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchTableData;
struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 1c9e802a6b1..0e986d8ef72 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -62,6 +62,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* whether or not scan should attempt to set the VM */
+ SO_ALLOW_VM_SET = 1 << 10,
} ScanOptions;
/*
@@ -876,6 +878,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
}
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
/*
* Like table_beginscan(), but for scanning catalog. It'll automatically use a
* snapshot appropriate for scanning catalog relations.
@@ -913,10 +934,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, struct ScanKeyData *key)
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
{
uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
}
@@ -1125,6 +1149,10 @@ extern void table_parallelscan_initialize(Relation rel,
extern TableScanDesc table_beginscan_parallel(Relation relation,
ParallelTableScanDesc pscan);
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+ ParallelTableScanDesc pscan,
+ bool modifies_rel);
+
/*
* Restart a parallel scan. Call this in the leader process. Caller is
* responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e107d6e5f81..1d0b374b652 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -680,6 +680,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query either through
+ * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
@@ -1631,6 +1637,13 @@ typedef struct SeqScanState
{
ScanState ss; /* its first field is NodeTag */
Size pscan_len; /* size of parallel heap scan descriptor */
+
+ /*
+ * Whether or not the query modifies the relation scanned by this node.
+ * This is used to avoid the overhead of optimizations that are only
+ * effective for tables not modified by the query.
+ */
+ bool modifies_rel;
} SeqScanState;
/* ----------------
@@ -1702,6 +1715,7 @@ typedef struct
* OrderByTypByVals is the datatype of order by expression pass-by-value?
* OrderByTypLens typlens of the datatypes of order by expressions
* PscanLen size of parallel index scan descriptor
+ * ModifiesBaseRel true if query modifies base relation
* ----------------
*/
typedef struct IndexScanState
@@ -1731,6 +1745,7 @@ typedef struct IndexScanState
bool *iss_OrderByTypByVals;
int16 *iss_OrderByTypLens;
Size iss_PscanLen;
+ bool iss_ModifiesBaseRel;
} IndexScanState;
/* ----------------
@@ -1888,6 +1903,7 @@ typedef struct SharedBitmapHeapInstrumentation
* pstate shared state for parallel bitmap scan
* sinstrument statistics for parallel workers
* recheck do current page's tuples need recheck
+ * modifies_rel does the query modify the base relation
* ----------------
*/
typedef struct BitmapHeapScanState
@@ -1900,6 +1916,7 @@ typedef struct BitmapHeapScanState
ParallelBitmapHeapState *pstate;
SharedBitmapHeapInstrumentation *sinstrument;
bool recheck;
+ bool modifies_rel;
} BitmapHeapScanState;
/* ----------------
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index d346be71642..fcb10b8d136 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -101,6 +101,7 @@ extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid
extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
+extern TransactionId OldestXminFromGlobalVisState(GlobalVisState *state);
/*
* Utility functions for implementing visibility routines in table AMs.
--
2.43.0
On 12 Jul 2025, at 03:19, Melanie Plageman <melanieplageman@gmail.com> wrote:
> remove the xl_heap_visible struct
The same goes for VISIBILITYMAP_XLOG_CATALOG_REL and XLOG_HEAP2_VISIBLE. But please do not rush to remove them; perhaps I will have a more exhaustive list later. For now, the patch set is expected to be unpolished.
I just need to absorb all of the effects to form a high-level evaluation of the patch set.
I'm still trying to grasp the connection between the first patch, with its Assert(prstate->cutoffs), and the other patches.
Also, I'd prefer that "page is not marked all-visible but visibility map bit is set in relation" emit XX001 for monitoring reasons, but again, this is a small note; first I need the broader picture.
So far I do not see any general problems with delegating the redo work from xl_heap_visible to other records. FWIW, I have observed several cases of VM corruption that might be connected to the fact that we log VM changes independently of the data changes that caused the VM to change. But I have no real evidence or understanding of what happened.
Best regards, Andrey Borodin.
On Sun, Jul 13, 2025 at 2:34 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> On 12 Jul 2025, at 03:19, Melanie Plageman <melanieplageman@gmail.com> wrote:
>> remove the xl_heap_visible struct
> The same goes for VISIBILITYMAP_XLOG_CATALOG_REL and XLOG_HEAP2_VISIBLE. But please do not rush to remove them; perhaps I will have a more exhaustive list later. For now, the patch set is expected to be unpolished.
> I just need to absorb all of the effects to form a high-level evaluation of the patch set.
I actually did remove those if you check the last version posted. I
did notice there is one remaining comment referring to
XLOG_HEAP2_VISIBLE I missed somehow, but the actual enums/macros were
removed already.
> I'm still trying to grasp the connection between the first patch, with its Assert(prstate->cutoffs), and the other patches.
I added this because I noticed that cutoffs was used in that location
without validating that it had been provided. The last patch in the
set, which sets the VM on access, changes where cutoffs are used, so I
noticed what I felt was a missing assert in master while developing
that patch.
> Also, I'd prefer that "page is not marked all-visible but visibility map bit is set in relation" emit XX001 for monitoring reasons, but again, this is a small note; first I need the broader picture.
Could you clarify what you mean by this? Are you talking about the
string representation of the visibility map bits in the WAL record
representations in heapdesc.c?
- Melanie
On 14 Jul 2025, at 00:15, Melanie Plageman <melanieplageman@gmail.com> wrote:
>> Also, I'd prefer that "page is not marked all-visible but visibility map bit is set in relation" emit XX001 for monitoring reasons, but again, this is a small note; first I need the broader picture.
> Could you clarify what you mean by this? Are you talking about the
> string representation of the visibility map bits in the WAL record
> representations in heapdesc.c?
This might be a bit off-topic for this thread, but as long as the patch touches that code, we can look into this too.
If the all-visible VM bit is set while the page is not all-visible, an IndexOnlyScan will return incorrect results. I have observed this inconsistency a few times in production.
Two persistent subsystems (the VM and the heap) contradict each other, which is why I consider this data corruption. Yes, we can repair the VM by treating the heap as the source of truth in this case. But we must also emit the ERRCODE_DATA_CORRUPTED XX001 code into the logs; in many cases this will alert an on-call SRE.
To do so, I propose replacing the elog(WARNING, ...) with ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED), ...)).
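Something like the following, keeping the existing message and (I assume) the same arguments as today's elog():

    ereport(WARNING,
            (errcode(ERRCODE_DATA_CORRUPTED),
             errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
                    vacrel->relname, blkno)));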
Best regards, Andrey Borodin.
Thanks for continuing to take a look, Andrey.
On Mon, Jul 14, 2025 at 2:37 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> This might be a bit off-topic for this thread, but as long as the patch touches that code, we can look into this too.
> If the all-visible VM bit is set while the page is not all-visible, an IndexOnlyScan will return incorrect results. I have observed this inconsistency a few times in production.
That's very unfortunate. I wonder what could be causing this. Do you
suspect a bug in Postgres? Or something wrong with the disk, etc?
> Two persistent subsystems (the VM and the heap) contradict each other, which is why I consider this data corruption. Yes, we can repair the VM by treating the heap as the source of truth in this case. But we must also emit the ERRCODE_DATA_CORRUPTED XX001 code into the logs; in many cases this will alert an on-call SRE.
> To do so, I propose replacing the elog(WARNING, ...) with ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED), ...)).
Ah, you mean the warnings currently in lazy_scan_prune(). To me this
suggestion makes sense. I see at least one other example that uses
ERRCODE_DATA_CORRUPTED at a level below ERROR.
I have attached a cleaned up and updated version of the patch set (it
doesn't yet include your suggested error message change).
What's new in this version
-----
In addition to general code, comment, and commit message improvements,
notable changes are as follows:
- I have used the GlobalVisState for determining if the whole page is
visible in a more natural way.
- I micro-benchmarked and identified some sources of regression from
the additional work SELECT queries would do to set the VM. So, there
are several new commits addressing these (for example, inlining several
functions and unsetting all-visible when we see a dead tuple if we
won't attempt freezing).
- Because heap_page_prune_and_freeze() was getting long, I added some
helper functions.
Performance impact of setting the VM on-access
-------
I found that with the patch set applied, we set many pages all-visible
in the VM on access, resulting in a higher overall number of pages set
all-visible, reducing load for vacuum, and dramatically decreasing
heap fetches by index-only scans.
I devised a simple benchmark: 8 workers insert 20 rows at a time into
a table with a few columns and update a single row that they just
inserted, while another worker queries the table once per second using
an index.
After running the benchmark for a few minutes, and although the table
was autovacuumed several times in both cases, 15% more blocks were
all-visible at the end of the benchmark with the patch set applied.
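In case it's useful for reproducing that number: one way to count the
all-visible blocks is with the pg_visibility extension, roughly like
this (simple_table is the benchmark table defined at the end of the
email):

create extension if not exists pg_visibility;
-- number of all-visible and all-frozen blocks according to the VM
select all_visible, all_frozen from pg_visibility_map_summary('simple_table');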
And with my patch applied, index-only scans did far fewer heap
fetches. A SELECT count(*) of the table at the same point in the
benchmark did 10,000 heap fetches on master and 500 with the patch
applied (I used auto_explain to determine this).
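I haven't included my exact logging configuration, but the stock
auto_explain settings below are enough to get the Heap Fetches counts
in the logged plans:

load 'auto_explain';
set auto_explain.log_min_duration = 0;
set auto_explain.log_analyze = on;
-- the Index Only Scan node in each logged plan then includes a
-- "Heap Fetches: N" line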
With my patch applied, autovacuum workers write half as much WAL as on
master. Some of this is courtesy of other patches in the set which
eliminate separate WAL records for setting the page all-visible. But,
vacuum is also scanning fewer pages and dirtying fewer buffers because
they are being set all-visible on-access.
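To compare autovacuum WAL volume, the easiest thing to look at is the
per-vacuum log output. Settings along these lines make every autovacuum
log its "WAL usage: ... records, ... full page images, ... bytes" line:

alter system set log_autovacuum_min_duration = 0;
select pg_reload_conf();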
There are more details about the benchmark at the end of the email.
Setting pd_prune_xid on insert
------
The patch "Set-pd_prune_xid-on-insert.txt" can be applied as the last
patch in the set. It sets pd_prune_xid on insert (so pages filled by
COPY or insert can also be set all-visible in the VM before they are
vacuumed). I gave it a .txt extension because it currently fails
035_standby_logical_decoding due to a recovery conflict. I need to
investigate more to see if this is a bug in my patch set or elsewhere
in Postgres.
Besides the failing test, I have a feeling that my current heuristic
for whether or not to set the VM on-access is not quite right for
pages that have only been inserted to -- and if we get it wrong, we've
wasted those CPU cycles because we didn't otherwise need to prune the
page.
- Melanie
Benchmark
-------
psql -c "
DROP TABLE IF EXISTS simple_table;
CREATE TABLE simple_table (
id SERIAL PRIMARY KEY,
group_id INT NOT NULL,
data TEXT,
created_at TIMESTAMPTZ DEFAULT now()
);
create index on simple_table(group_id);
"
pgbench \
--no-vacuum \
--random-seed=0 \
-c 8 \
-j 8 \
-M prepared \
-T 200 \
"pgbench_run_summary_update_${version}" \
-f- <<EOF &
\set gid random(1,1000)
INSERT INTO simple_table (group_id, data)
SELECT :gid, 'inserted'
RETURNING id \gset
update simple_table set data = 'updated' where id = :id;
insert into simple_table (group_id, data)
select :gid, 'inserted'
from generate_series(1,20);
EOF
insert_pid=$!
pgbench \
--no-vacuum \
--random-seed=0 \
-c 1 \
-j 1 \
--rate=1 \
-M prepared \
-T 200 \
"pgbench_run_summary_select_${version}" \
-f- <<EOF
\set gid random(1, 1000)
select max(created_at) from simple_table where group_id = :gid;
select count(*) from simple_table where group_id = :gid;
EOF
wait $insert_pid
Attachments:
Set-pd_prune_xid-on-insert.txt (text/plain)
From 058df21a6da05956bbf3a0a45db575d83a515002 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v5 20/20] Set pd_prune_xid on insert
Now that we can set the VM during read-only queries, it makes sense to
start setting the page prunable hint on insert. This will allow
heap_page_prune_and_freeze() to be called when the page is full or
mostly full.
For years there has been a note in heap_insert() and heap_multi_insert()
pointing out that setting pd_prune_xid would help clean up aborted
inserted tuples that would otherwise not be cleaned up until vacuum.
So, that's another benefit of setting it.
ci-os-only:
---
src/backend/access/heap/heapam.c | 25 +++++++++++++++++--------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++++++-
2 files changed, 31 insertions(+), 9 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f90b014a9b0..e0f2245052c 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2094,6 +2094,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2153,15 +2154,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2171,7 +2176,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2534,8 +2538,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 64f06d46bf1..234e9a401b9 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -473,6 +473,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -622,9 +628,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
--
2.43.0
v5-0003-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch (text/x-patch)
From 053a650299b860242664accc703f46b711807901 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:42:13 -0400
Subject: [PATCH v5 03/20] Eliminate xl_heap_visible from vacuum phase III
Instead of emitting a separate xl_heap_visible record for each page that
is rendered all-visible by vacuum's third phase, include the updates to
the VM in the already emitted xl_heap_prune record.
---
src/backend/access/heap/heapam_xlog.c | 142 ++++++++++++++++++++---
src/backend/access/heap/pruneheap.c | 48 +++++++-
src/backend/access/heap/vacuumlazy.c | 149 +++++++++++++++++--------
src/backend/access/rmgrdesc/heapdesc.c | 13 ++-
src/include/access/heapam.h | 9 ++
src/include/access/heapam_xlog.h | 3 +
6 files changed, 296 insertions(+), 68 deletions(-)
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 2485c344191..14541e2e94f 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Buffer buffer;
RelFileLocator rlocator;
BlockNumber blkno;
- XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
+ uint8 vmflags = 0;
XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -51,10 +52,15 @@ heap_xlog_prune_freeze(XLogReaderState *record)
(xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
/*
- * We are about to remove and/or freeze tuples. In Hot Standby mode,
- * ensure that there are no queries running for which the removed tuples
- * are still visible or which still consider the frozen xids as running.
- * The conflict horizon XID comes after xl_heap_prune.
+ * After xl_heap_prune is the optional snapshot conflict horizon.
+ *
+ * In Hot Standby mode, we must ensure that there are no running queries
+ * which would conflict with the changes in this record. If pruning, that
+ * means we cannot remove tuples still visible to transactions on the
+ * standby. If freezing, that means we cannot freeze tuples with xids that
+ * are still considered running on the standby. And for setting the VM, we
+ * cannot do so if the page isn't all-visible to all transactions on the
+ * standby.
*/
if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
{
@@ -70,13 +76,28 @@ heap_xlog_prune_freeze(XLogReaderState *record)
rlocator);
}
+ /* Next are the optionally included vmflags. Copy them out for later use. */
+ if ((xlrec.flags & XLHP_HAS_VMFLAGS) != 0)
+ {
+ memcpy(&vmflags, maindataptr, sizeof(uint8));
+ maindataptr += sizeof(uint8);
+
+ /*
+ * We don't set VISIBILITYMAP_XLOG_CATALOG_REL in the combined record
+ * because we already have XLHP_IS_CATALOG_REL.
+ */
+ Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+ /* Must never set all_frozen bit without also setting all_visible bit */
+ Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
+ }
+
/*
- * If we have a full-page image, restore it and we're done.
+ * If we have a full-page image of the heap block, restore it and we're
+ * done with the heap block.
*/
- action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
- (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
- &buffer);
- if (action == BLK_NEEDS_REDO)
+ if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+ (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+ &buffer) == BLK_NEEDS_REDO)
{
Page page = (Page) BufferGetPage(buffer);
OffsetNumber *redirected;
@@ -89,6 +110,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Size datalen;
xlhp_freeze_plan *plans;
OffsetNumber *frz_offsets;
+ bool do_prune;
+ bool mark_buffer_dirty;
+ bool set_heap_lsn;
char *dataptr = XLogRecGetBlockData(record, 0, &datalen);
heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
@@ -97,11 +121,18 @@ heap_xlog_prune_freeze(XLogReaderState *record)
&ndead, &nowdead,
&nunused, &nowunused);
+ do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+ set_heap_lsn = mark_buffer_dirty = do_prune || nplans > 0;
+
+ /* Ensure the record does something */
+ Assert(do_prune || nplans > 0 ||
+ vmflags & VISIBILITYMAP_VALID_BITS);
+
/*
* Update all line pointers per the record, and repair fragmentation
* if needed.
*/
- if (nredirected > 0 || ndead > 0 || nunused > 0)
+ if (do_prune)
heap_page_prune_execute(buffer,
(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
redirected, nredirected,
@@ -138,26 +169,72 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* There should be no more data */
Assert((char *) frz_offsets == dataptr + datalen);
+ Assert(BufferIsValid(buffer) &&
+ BufferGetBlockNumber(buffer) == blkno);
+
+ /*
+ * Now set PD_ALL_VISIBLE, if required. We'll only do this if we are
+ * also going to set bits in the VM later.
+ *
+ * We must never end up with the VM bit set and the page-level
+ * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+ * modification would fail to clear the VM bit.
+ */
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+ {
+ PageSetAllVisible(page);
+
+ /*
+ * If the only change to the heap page is setting PD_ALL_VISIBLE,
+ * we can avoid setting the page LSN unless checksums or
+ * wal_log_hints are enabled.
+ */
+ set_heap_lsn = XLogHintBitIsNeeded() ? true : set_heap_lsn;
+ mark_buffer_dirty = true;
+ }
+
/*
* Note: we don't worry about updating the page's prunability hints.
* At worst this will cause an extra prune cycle to occur soon.
*/
- PageSetLSN(page, lsn);
- MarkBufferDirty(buffer);
+ if (mark_buffer_dirty)
+ MarkBufferDirty(buffer);
+ if (set_heap_lsn)
+ PageSetLSN(page, lsn);
}
/*
- * If we released any space or line pointers, update the free space map.
+ * If we released any space or line pointers or will be setting a page in
+ * the visibility map, update the free space map.
+ *
+ * Even if we are just updating the VM (and thus not freeing up any
+ * space), we'll still update the FSM for this page. Since FSM is not
+ * WAL-logged and only updated heuristically, it easily becomes stale in
+ * standbys. If the standby is later promoted and runs VACUUM, it will
+ * skip updating individual free space figures for pages that became
+ * all-visible (or all-frozen, depending on the vacuum mode,) which is
+ * troublesome when FreeSpaceMapVacuum propagates too optimistic free
+ * space values to upper FSM layers; later inserters try to use such pages
+ * only to find out that they are unusable. This can cause long stalls
+ * when there are many such pages.
+ *
+ * Forestall those problems by updating FSM's idea about a page that is
+ * becoming all-visible or all-frozen.
*
* Do this regardless of a full-page image being applied, since the FSM
* data is not in the page anyway.
+ *
+ * We want to avoid holding an exclusive lock on the heap buffer while
+ * doing IO (either of the FSM or the VM), so we'll release the lock on
+ * the heap buffer before doing either.
*/
if (BufferIsValid(buffer))
{
- if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
- XLHP_HAS_DEAD_ITEMS |
- XLHP_HAS_NOW_UNUSED_ITEMS))
+ if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
+ XLHP_HAS_DEAD_ITEMS |
+ XLHP_HAS_NOW_UNUSED_ITEMS)) ||
+ vmflags & VISIBILITYMAP_VALID_BITS)
{
Size freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
@@ -168,6 +245,37 @@ heap_xlog_prune_freeze(XLogReaderState *record)
else
UnlockReleaseBuffer(buffer);
}
+
+ /*
+ * Read and update the VM block. Even if we skipped updating the heap page
+ * due to the file being dropped or truncated later in recovery, it's
+ * still safe to update the visibility map. Any WAL record that clears
+ * the visibility map bit does so before checking the page LSN, so any
+ * bits that need to be cleared will still be cleared.
+ *
+ * Note that it is *only* okay that we do not hold a lock on the heap page
+ * because we are in recovery and can expect no other writers to clear
+ * PD_ALL_VISIBLE before we are able to update the VM.
+ */
+ if (vmflags & VISIBILITYMAP_VALID_BITS &&
+ XLogReadBufferForRedoExtended(record, 1,
+ RBM_ZERO_ON_ERROR,
+ false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ uint8 old_vmbits = 0;
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
+
+ visibilitymap_pin(reln, blkno, &vmbuffer);
+ old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+ /* Only set VM page LSN if we modified the page */
+ if (old_vmbits != vmflags)
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
+ FreeFakeRelcacheEntry(reln);
+ }
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a8025889be0..d9ba0f96e34 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,6 +21,7 @@
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
+#include "access/visibilitymapdefs.h"
#include "commands/vacuum.h"
#include "executor/instrument.h"
#include "miscadmin.h"
@@ -835,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
+ InvalidBuffer, 0, false,
conflict_xid,
true, reason,
prstate.frozen, prstate.nfrozen,
@@ -2045,12 +2047,23 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* replaying 'unused' items depends on whether they were all previously marked
* as dead.
*
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
* Note: This function scribbles on the 'frozen' array.
*
* Note: This is called in a critical section, so careful what you do here.
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer,
+ uint8 vmflags,
+ bool set_pd_all_vis,
TransactionId conflict_xid,
bool cleanup_lock,
PruneReason reason,
@@ -2062,6 +2075,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xl_heap_prune xlrec;
XLogRecPtr recptr;
uint8 info;
+ uint8 regbuf_flags;
/* The following local variables hold data registered in the WAL record: */
xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2070,8 +2084,19 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xlhp_prune_items dead_items;
xlhp_prune_items unused_items;
OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+ bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
xlrec.flags = 0;
+ regbuf_flags = REGBUF_STANDARD;
+
+ /*
+ * We can avoid an FPI if the only modification we are making to the heap
+ * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+ */
+ if (!do_prune &&
+ nfrozen == 0 &&
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
+ regbuf_flags |= REGBUF_NO_IMAGE;
/*
* Prepare data for the buffer. The arrays are not actually in the
@@ -2079,7 +2104,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* page image, the arrays can be omitted.
*/
XLogBeginInsert();
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+ XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ XLogRegisterBuffer(1, vmbuffer, 0);
+
if (nfrozen > 0)
{
int nplans;
@@ -2136,6 +2165,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* Prepare the main xl_heap_prune record. We already set the XLHP_HAS_*
* flag above.
*/
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ xlrec.flags |= XLHP_HAS_VMFLAGS;
if (RelationIsAccessibleInLogicalDecoding(relation))
xlrec.flags |= XLHP_IS_CATALOG_REL;
if (TransactionIdIsValid(conflict_xid))
@@ -2150,6 +2181,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
XLogRegisterData(&xlrec, SizeOfHeapPrune);
if (TransactionIdIsValid(conflict_xid))
XLogRegisterData(&conflict_xid, sizeof(TransactionId));
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ XLogRegisterData(&vmflags, sizeof(uint8));
switch (reason)
{
@@ -2168,5 +2201,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
}
recptr = XLogInsert(RM_HEAP2_ID, info);
- PageSetLSN(BufferGetPage(buffer), recptr);
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
+
+ /*
+ * If we pruned or froze tuples, or set the page all-visible while
+ * checksums or wal_log_hints are enabled, we must bump the page LSN. Torn
+ * pages are possible if we update PD_ALL_VISIBLE without bumping the LSN,
+ * but that is deemed okay for page hint updates.
+ */
+ if (do_prune || nfrozen > 0 ||
+ (set_pd_all_vis && XLogHintBitIsNeeded()))
+ PageSetLSN(BufferGetPage(buffer), recptr);
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8a62a93eee5..460cdbd8417 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,11 +464,13 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
+static bool heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int allowed_num_offsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2847,8 +2849,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
OffsetNumber unused[MaxHeapTuplesPerPage];
int nunused = 0;
TransactionId visibility_cutoff_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
bool all_frozen;
LVSavedErrInfo saved_err_info;
+ uint8 vmflags = 0;
+ bool set_pd_all_vis = false;
Assert(vacrel->do_index_vacuuming);
@@ -2859,6 +2864,20 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
InvalidOffsetNumber);
+ if (heap_page_is_all_visible_except_lpdead(vacrel->rel, buffer,
+ vacrel->cutoffs.OldestXmin,
+ deadoffsets, num_offsets,
+ &all_frozen, &visibility_cutoff_xid,
+ &vacrel->offnum))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (all_frozen)
+ {
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ }
+ }
+
START_CRIT_SECTION();
for (int i = 0; i < num_offsets; i++)
@@ -2878,6 +2897,18 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
/* Attempt to truncate line pointer array now */
PageTruncateLinePointerArray(page);
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+ {
+ Assert(!PageIsAllVisible(page));
+ set_pd_all_vis = true;
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ PageSetAllVisible(page);
+ visibilitymap_set_vmbyte(vacrel->rel,
+ blkno,
+ vmbuffer, vmflags);
+ conflict_xid = visibility_cutoff_xid;
+ }
+
/*
* Mark buffer dirty before we write WAL.
*/
@@ -2887,7 +2918,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
- InvalidTransactionId,
+ vmbuffer,
+ vmflags,
+ set_pd_all_vis,
+ conflict_xid,
false, /* no cleanup lock required */
PRUNE_VACUUM_CLEANUP,
NULL, 0, /* frozen */
@@ -2896,39 +2930,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
unused, nunused);
}
- /*
- * End critical section, so we safely can do visibility tests (which
- * possibly need to perform IO and allocate memory!). If we crash now the
- * page (including the corresponding vm bit) might not be marked all
- * visible, but that's fine. A later vacuum will fix that.
- */
END_CRIT_SECTION();
- /*
- * Now that we have removed the LP_DEAD items from the page, once again
- * check if the page has become all-visible. The page is already marked
- * dirty, exclusively locked, and, if needed, a full page image has been
- * emitted.
- */
- Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
- &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
+ if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (all_frozen)
- {
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buffer,
- InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
- flags);
-
/* Count the newly set VM page for logging */
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
vacrel->vm_new_visible_pages++;
if (all_frozen)
vacrel->vm_new_visible_frozen_pages++;
@@ -3594,6 +3601,25 @@ dead_items_cleanup(LVRelState *vacrel)
vacrel->pvs = NULL;
}
+/*
+ * Wrapper around heap_page_is_all_visible_except_lpdead() for callers that
+ * expect no LP_DEAD items on the page.
+ */
+bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
+{
+
+ return heap_page_is_all_visible_except_lpdead(rel, buf, OldestXmin,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid,
+ logging_offnum);
+}
+
/*
* Check if every tuple in the given page is visible to all current and future
* transactions.
@@ -3607,23 +3633,35 @@ dead_items_cleanup(LVRelState *vacrel)
* visible tuples. Sets *all_frozen to true if every tuple on this page is
* frozen.
*
- * This is a stripped down version of lazy_scan_prune(). If you change
- * anything here, make sure that everything stays in sync. Note that an
- * assertion calls us to verify that everybody still agrees. Be sure to avoid
- * introducing new side-effects here.
+ * deadoffsets are the offsets we know about and are about to set LP_UNUSED.
+ * allowed_num_offsets is the number of those. As long as the LP_DEAD items we
+ * encounter on the page match those exactly, we can set the page all-visible
+ * in the VM.
+ *
+ * Callers looking to verify that the page is all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * The logic here is similar to heap_prune_record_unchanged_lp_normal(). If
+ * you change anything here, make sure that everything stays in sync. Note
+ * that an assertion calls us to verify that everybody still agrees. Be sure
+ * to avoid introducing new side-effects here.
*/
static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum)
+heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int allowed_num_offsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
OffsetNumber offnum,
maxoff;
bool all_visible = true;
+ OffsetNumber current_dead_offsets[MaxHeapTuplesPerPage];
+ size_t current_num_offsets = 0;
*visibility_cutoff_xid = InvalidTransactionId;
*all_frozen = true;
@@ -3655,9 +3693,8 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
if (ItemIdIsDead(itemid))
{
- all_visible = false;
- *all_frozen = false;
- break;
+ current_dead_offsets[current_num_offsets++] = offnum;
+ continue;
}
Assert(ItemIdIsNormal(itemid));
@@ -3724,7 +3761,23 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
/* Clear the offset information once we have processed the given page. */
*logging_offnum = InvalidOffsetNumber;
- return all_visible;
+ /* If we already know it's not all-visible, return false */
+ if (!all_visible)
+ return false;
+
+ /* If we weren't allowed any dead offsets, we're done */
+ if (allowed_num_offsets == 0)
+ return current_num_offsets == 0;
+
+ /* The number of LP_DEAD items found must match the expected count */
+ if (current_num_offsets != allowed_num_offsets)
+ return false;
+
+ Assert(deadoffsets);
+
+ /* The LP_DEAD items found must be exactly the expected offsets */
+ return memcmp(current_dead_offsets, deadoffsets,
+ allowed_num_offsets * sizeof(OffsetNumber)) == 0;
}
/*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..d6c86ccac20 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -266,6 +266,7 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
{
char *rec = XLogRecGetData(record);
uint8 info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+ char *maindataptr = rec + SizeOfHeapPrune;
info &= XLOG_HEAP_OPMASK;
if (info == XLOG_HEAP2_PRUNE_ON_ACCESS ||
@@ -278,7 +279,8 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
{
TransactionId conflict_xid;
- memcpy(&conflict_xid, rec + SizeOfHeapPrune, sizeof(TransactionId));
+ memcpy(&conflict_xid, maindataptr, sizeof(TransactionId));
+ maindataptr += sizeof(TransactionId);
appendStringInfo(buf, "snapshotConflictHorizon: %u",
conflict_xid);
@@ -287,6 +289,15 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, ", isCatalogRel: %c",
xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
+ if (xlrec->flags & XLHP_HAS_VMFLAGS)
+ {
+ uint8 vmflags;
+
+ memcpy(&vmflags, maindataptr, sizeof(uint8));
+ maindataptr += sizeof(uint8);
+ appendStringInfo(buf, ", vm_flags: 0x%02X", vmflags);
+ }
+
if (XLogRecHasBlockData(record, 0))
{
Size datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..8b47295efa2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -344,6 +344,12 @@ extern void heap_inplace_update_and_unlock(Relation relation,
Buffer buffer);
extern void heap_inplace_unlock(Relation relation,
HeapTuple oldtup, Buffer buffer);
+
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
@@ -388,6 +394,9 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer,
+ uint8 vmflags,
+ bool vm_modified_heap_page,
TransactionId conflict_xid,
bool cleanup_lock,
PruneReason reason,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 277df6b3cf0..ceae9c083ff 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -295,6 +295,9 @@ typedef struct xl_heap_prune
#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+/* Set if the record contains the new flags to set in the VM (vmflags) */
+#define XLHP_HAS_VMFLAGS (1 << 0)
+
/* to handle recovery conflict during logical decoding on standby */
#define XLHP_IS_CATALOG_REL (1 << 1)
--
2.43.0
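
For reference, the main data of an xl_heap_prune record now carries the
optional vmflags byte after the optional conflict horizon. The following is a
minimal reader-side sketch of that layout, mirroring the heap2_desc() hunk
above; it is illustrative only and not part of the diffs
(XLHP_HAS_CONFLICT_HORIZON is the existing flag from heapam_xlog.h):

xl_heap_prune *xlrec = (xl_heap_prune *) XLogRecGetData(record);
char	   *p = XLogRecGetData(record) + SizeOfHeapPrune;

if (xlrec->flags & XLHP_HAS_CONFLICT_HORIZON)
{
	TransactionId conflict_xid;

	memcpy(&conflict_xid, p, sizeof(TransactionId));
	p += sizeof(TransactionId);
}

if (xlrec->flags & XLHP_HAS_VMFLAGS)
{
	uint8		vmflags;

	/* the bits to set in the VM for the registered heap block */
	memcpy(&vmflags, p, sizeof(uint8));
	p += sizeof(uint8);
}
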
v5-0004-Use-xl_heap_prune-record-for-setting-empty-pages-.patch
From 291de3c976a1312b86156d3e4e984eb66808b9b8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:42:19 -0400
Subject: [PATCH v5 04/20] Use xl_heap_prune record for setting empty pages
all-visible
As part of a project to eliminate xl_heap_visible records, eliminate
their usage in phase I vacuum of empty pages.
---
src/backend/access/heap/pruneheap.c | 14 +++++--
src/backend/access/heap/vacuumlazy.c | 55 ++++++++++++++++++----------
src/include/access/heapam.h | 1 +
3 files changed, 47 insertions(+), 23 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d9ba0f96e34..97e51f78854 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -836,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
+ false,
InvalidBuffer, 0, false,
conflict_xid,
true, reason,
@@ -2051,6 +2052,9 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* case, vmbuffer should already have been updated and marked dirty and should
* still be pinned and locked.
*
+ * force_heap_fpi indicates that a full page image of the heap block should be
+ * forced.
+ *
* set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
* the page LSN when checksums/wal_log_hints are enabled even if we did not
* prune or freeze tuples on the page.
@@ -2061,6 +2065,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ bool force_heap_fpi,
Buffer vmbuffer,
uint8 vmflags,
bool set_pd_all_vis,
@@ -2089,13 +2094,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xlrec.flags = 0;
regbuf_flags = REGBUF_STANDARD;
+ if (force_heap_fpi)
+ regbuf_flags |= REGBUF_FORCE_IMAGE;
+
/*
* We can avoid an FPI if the only modification we are making to the heap
* page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
*/
- if (!do_prune &&
- nfrozen == 0 &&
- (!set_pd_all_vis || !XLogHintBitIsNeeded()))
+ else if (!do_prune &&
+ nfrozen == 0 &&
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
regbuf_flags |= REGBUF_NO_IMAGE;
/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 460cdbd8417..d9e195269d2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1878,33 +1878,47 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ uint8 new_vmbits = VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN;
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ PageSetAllVisible(page);
MarkBufferDirty(buf);
- /*
- * It's possible that another backend has extended the heap,
- * initialized the page, and then failed to WAL-log the page due
- * to an ERROR. Since heap extension is not WAL-logged, recovery
- * might try to replay our record setting the page all-visible and
- * find that the page isn't initialized, which will cause a PANIC.
- * To prevent that, check whether the page has been previously
- * WAL-logged, and if not, do that now.
- */
- if (RelationNeedsWAL(vacrel->rel) &&
- PageGetLSN(page) == InvalidXLogRecPtr)
- log_newpage_buffer(buf, true);
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ visibilitymap_set_vmbyte(vacrel->rel, blkno,
+ vmbuffer, new_vmbits);
+
+ if (RelationNeedsWAL(vacrel->rel))
+ {
+ /*
+ * It's possible that another backend has extended the heap,
+ * initialized the page, and then failed to WAL-log the page
+ * due to an ERROR. Since heap extension is not WAL-logged,
+ * recovery might try to replay our record setting the page
+ * all-visible and find that the page isn't initialized, which
+ * will cause a PANIC. To prevent that, if the page hasn't
+ * been previously WAL-logged, force a heap FPI.
+ */
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ PageGetLSN(page) == InvalidXLogRecPtr,
+ vmbuffer,
+ new_vmbits,
+ true,
+ InvalidTransactionId,
+ false, PRUNE_VACUUM_SCAN,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+ }
- PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
END_CRIT_SECTION();
- /* Count the newly all-frozen pages for logging */
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+ /* Count the newly all-frozen pages for logging. */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
}
@@ -2918,6 +2932,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
+ false,
vmbuffer,
vmflags,
set_pd_all_vis,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8b47295efa2..e7129a644a1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,6 +394,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ bool force_heap_fpi,
Buffer vmbuffer,
uint8 vmflags,
bool vm_modified_heap_page,
--
2.43.0
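
Taken together with the earlier prune/freeze patch, the heap buffer
registration in log_heap_prune_and_freeze() now boils down to the following
condensed sketch (illustrative only, not part of the diffs):

regbuf_flags = REGBUF_STANDARD;

if (force_heap_fpi)
{
	/* e.g. an empty page that was never WAL-logged */
	regbuf_flags |= REGBUF_FORCE_IMAGE;
}
else if (!do_prune && nfrozen == 0 &&
		 (!set_pd_all_vis || !XLogHintBitIsNeeded()))
{
	/* only the VM bits change, so no image of the heap page is needed */
	regbuf_flags |= REGBUF_NO_IMAGE;
}

The heap page LSN is then bumped only if we pruned or froze tuples, or set
PD_ALL_VISIBLE while checksums or wal_log_hints are enabled.
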
v5-0001-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch
From f98373090c6281d0278bfc7ffd407bad274c302d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v5 01/20] Eliminate xl_heap_visible in COPY FREEZE
Instead of emitting a separate WAL record for setting the VM bits in
xl_heap_visible, specify the changes to make to the VM block in the
xl_heap_multi_insert record instead.
---
src/backend/access/heap/heapam.c | 47 ++++++++++---------
src/backend/access/heap/heapam_xlog.c | 39 +++++++++++++++-
src/backend/access/heap/visibilitymap.c | 62 ++++++++++++++++++++++++-
src/backend/access/rmgrdesc/heapdesc.c | 5 ++
src/include/access/visibilitymap.h | 2 +
5 files changed, 132 insertions(+), 23 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0dcd6ee817e..68db4325285 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2493,9 +2493,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/*
* If the page is all visible, need to clear that, unless we're only
* going to add further frozen rows to it.
- *
- * If we're only adding already frozen rows to a previously empty
- * page, mark it as all-visible.
*/
if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
@@ -2505,8 +2502,22 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
BufferGetBlockNumber(buffer),
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
+
+ /*
+ * If we're only adding already frozen rows to a previously empty
+ * page, mark it all-visible and all-frozen and update the visibility
+ * map. We're already holding a pin on the vmbuffer.
+ */
else if (all_frozen_set)
+ {
PageSetAllVisible(page);
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ visibilitymap_set_vmbyte(relation,
+ BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+ }
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2554,6 +2565,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
xlrec->flags = 0;
if (all_visible_cleared)
xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+ /*
+ * We don't have to worry about including a conflict xid in the
+ * WAL record as HEAP_INSERT_FROZEN intentionally violates
+ * visibility rules.
+ */
if (all_frozen_set)
xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
@@ -2616,7 +2633,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
XLogBeginInsert();
XLogRegisterData(xlrec, tupledata - scratch.data);
+
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+ if (all_frozen_set)
+ XLogRegisterBuffer(1, vmbuffer, 0);
XLogRegisterBufData(0, tupledata, totaldatalen);
@@ -2626,29 +2646,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
recptr = XLogInsert(RM_HEAP2_ID, info);
PageSetLSN(page, recptr);
+ if (all_frozen_set)
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
}
END_CRIT_SECTION();
- /*
- * If we've frozen everything on the page, update the visibilitymap.
- * We're already holding pin on the vmbuffer.
- */
if (all_frozen_set)
- {
- Assert(PageIsAllVisible(page));
- Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
-
- /*
- * It's fine to use InvalidTransactionId here - this is only used
- * when HEAP_INSERT_FROZEN is specified, which intentionally
- * violates visibility rules.
- */
- visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
- InvalidXLogRecPtr, vmbuffer,
- InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
- }
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
UnlockReleaseBuffer(buffer);
ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index eb4bd3d6ae3..2485c344191 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -552,6 +552,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
int i;
bool isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
/*
* Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -572,11 +573,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(rlocator);
- Buffer vmbuffer = InvalidBuffer;
visibilitymap_pin(reln, blkno, &vmbuffer);
visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
ReleaseBuffer(vmbuffer);
+ vmbuffer = InvalidBuffer;
FreeFakeRelcacheEntry(reln);
}
@@ -663,6 +664,42 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (BufferIsValid(buffer))
UnlockReleaseBuffer(buffer);
+ buffer = InvalidBuffer;
+
+ /*
+ * Now read and update the VM block. Even if we skipped updating the heap
+ * page due to the file being dropped or truncated later in recovery, it's
+ * still safe to update the visibility map. Any WAL record that clears
+ * the visibility map bit does so before checking the page LSN, so any
+ * bits that need to be cleared will still be cleared.
+ *
+ * It is only okay to set the VM bits without holding the heap page lock
+ * because we are in recovery and can expect no other writers of this page.
+ */
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+ XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
+
+ visibilitymap_pin(reln, blkno, &vmbuffer);
+ visibilitymap_set_vmbyte(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+
+ /*
+ * It is not possible that the VM was already set for this heap page,
+ * so the vmbuffer must have been modified and marked dirty.
+ */
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
+ FreeFakeRelcacheEntry(reln);
+ }
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
+
/*
* If the page is running low on free space, update the FSM as well.
* Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 8f918e00af7..0bc64203959 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,7 +14,8 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set a bit in a previously pinned page
+ * visibilitymap_set - set a bit in a previously pinned page and log
+ * visibilitymap_set_vmbyte - set a bit in a pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -318,6 +319,65 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
return status;
}
+/*
+ * Set flags in the VM block contained in the passed-in vmBuf.
+ *
+ * This function is for callers which include the VM changes in the same WAL
+ * record as the modifications of the heap page which rendered it all-visible.
+ * Callers separately logging the VM changes should invoke visibilitymap_set()
+ * instead.
+ *
+ * Caller must have pinned and exclusively locked the correct block of the VM in
+ * vmBuf.
+ *
+ * During normal operation (i.e. not recovery), this should be called in a
+ * critical section which also makes any necessary changes to the heap page
+ * and, if relevant, emits WAL.
+ *
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page.
+ */
+uint8
+visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags)
+{
+ BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+ uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+ uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+ Page page;
+ uint8 *map;
+ uint8 status;
+
+#ifdef TRACE_VISIBILITYMAP
+ elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
+#endif
+
+ /* Call in same critical section where WAL is emitted. */
+ Assert(InRecovery || CritSectionCount > 0);
+
+ /* Flags should be valid. Also never clear bits with this function */
+ Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+ /* Must never set all_frozen bit without also setting all_visible bit */
+ Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+ /* Check that we have the right VM page pinned */
+ if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+ elog(ERROR, "wrong VM buffer passed to visibilitymap_set_vmbyte");
+
+ page = BufferGetPage(vmBuf);
+ map = (uint8 *) PageGetContents(page);
+
+ status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+ if (flags != status)
+ {
+ map[mapByte] |= (flags << mapOffset);
+ MarkBufferDirty(vmBuf);
+ }
+
+ return status;
+}
+
/*
* visibilitymap_get_status - get status of bits
*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
#include "access/heapam_xlog.h"
#include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
#include "storage/standbydefs.h"
/*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
xlrec->flags);
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+ appendStringInfo(buf, ", vm_flags: 0x%02X",
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
if (XLogRecHasBlockData(record, 0) && !isinit)
{
appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..977566f6b98 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,8 @@ extern uint8 visibilitymap_set(Relation rel,
Buffer vmBuf,
TransactionId cutoff_xid,
uint8 flags);
+extern uint8 visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
--
2.43.0
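
Because visibilitymap_set_vmbyte() leaves WAL logging to the caller, the
expected calling sequence (per its header comment) looks roughly like the
sketch below. It is illustrative only; error handling and the actual buffer
registration/XLogInsert() calls are elided:

/* may do I/O, so pin the VM page before entering the critical section */
visibilitymap_pin(rel, blkno, &vmbuffer);

START_CRIT_SECTION();

PageSetAllVisible(BufferGetPage(buf));
MarkBufferDirty(buf);

LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
visibilitymap_set_vmbyte(rel, blkno, vmbuffer,
						 VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);

/*
 * Register buffer 0 (heap) and buffer 1 (VM), XLogInsert(), then set the
 * LSN on both pages from the returned record pointer.
 */

END_CRIT_SECTION();

LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
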
v5-0002-Make-heap_page_is_all_visible-independent-of-LVRe.patch
From a18fe6f8169af3c4e286a3dc3332ab31108998ff Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 15:48:51 -0400
Subject: [PATCH v5 02/20] Make heap_page_is_all_visible independent of
LVRelState
Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need two parameters from the
LVRelState, so just pass those in explicitly.
---
src/backend/access/heap/vacuumlazy.c | 45 ++++++++++++++++++----------
1 file changed, 29 insertions(+), 16 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 14036c27e87..8a62a93eee5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,8 +464,11 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
- TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2010,8 +2013,9 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(presult.lpdead_items == 0);
- if (!heap_page_is_all_visible(vacrel, buf,
- &debug_cutoff, &debug_all_frozen))
+ if (!heap_page_is_all_visible(vacrel->rel, buf,
+ vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+ &debug_cutoff, &vacrel->offnum))
Assert(false);
Assert(presult.all_frozen == debug_all_frozen);
@@ -2907,8 +2911,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* emitted.
*/
Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
- &all_frozen))
+ if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
+ &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -3592,9 +3596,16 @@ dead_items_cleanup(LVRelState *vacrel)
/*
* Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples. Set *all_frozen to true if every tuple
- * on this page is frozen.
+ * transactions.
+ *
+ * OldestXmin is used to determine visibility.
+ *
+ * *logging_offnum will have the OffsetNumber of the current tuple being
+ * processed for vacuum's error callback system.
+ *
+ * Return the visibility_cutoff_xid which is the highest xmin amongst the
+ * visible tuples. Sets *all_frozen to true if every tuple on this page is
+ * frozen.
*
* This is a stripped down version of lazy_scan_prune(). If you change
* anything here, make sure that everything stays in sync. Note that an
@@ -3602,9 +3613,11 @@ dead_items_cleanup(LVRelState *vacrel)
* introducing new side-effects here.
*/
static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
TransactionId *visibility_cutoff_xid,
- bool *all_frozen)
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3627,7 +3640,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* Set the offset number so that we can display it along with any
* error that occurred while processing this tuple.
*/
- vacrel->offnum = offnum;
+ *logging_offnum = offnum;
itemid = PageGetItemId(page, offnum);
/* Unused or redirect line pointers are of no interest */
@@ -3651,9 +3664,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
tuple.t_len = ItemIdGetLength(itemid);
- tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+ tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
+ switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
buf))
{
case HEAPTUPLE_LIVE:
@@ -3674,7 +3687,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
+ OldestXmin))
{
all_visible = false;
*all_frozen = false;
@@ -3709,7 +3722,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
} /* scan along page */
/* Clear the offset information once we have processed the given page. */
- vacrel->offnum = InvalidOffsetNumber;
+ *logging_offnum = InvalidOffsetNumber;
return all_visible;
}
--
2.43.0
v5-0006-Combine-vacuum-phase-I-VM-update-cases.patch
From 82343294b239425abb298358a5881f9308f7ec08 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:35:36 -0400
Subject: [PATCH v5 06/20] Combine vacuum phase I VM update cases
We update the VM after phase I of vacuum -- either setting both the VM
bits when all bits are currently unset or setting just the frozen bit
when the all-visible bit is already set.
Those two cases shared much of the same code -- leading to unnecessary
duplication. This commit combines them, which is simpler and easier to
understand.
The combined case also happens to fix a longstanding bug: when we are
only setting an already all-visible page all-frozen and checksums or
wal_log_hints are enabled, we would fail to mark the heap buffer dirty
before setting the page LSN in visibilitymap_set().
---
src/backend/access/heap/vacuumlazy.c | 101 +++++++++------------------
1 file changed, 32 insertions(+), 69 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 04a7b6c4181..f6cdd9e6828 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2152,11 +2152,26 @@ lazy_scan_prune(LVRelState *vacrel,
{
/* Don't update the VM if we just cleared corruption in it */
}
- else if (!all_visible_according_to_vm && presult.all_visible)
+
+ /*
+ * If the page isn't yet marked all-visible in the VM, or it is and needs
+ * to be marked all-frozen, update the VM. Note that all_frozen is only
+ * valid if all_visible is true, so we must check both all_visible and
+ * all_frozen.
+ */
+ else if (presult.all_visible &&
+ (!all_visible_according_to_vm ||
+ (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
+ /*
+ * If the page is all-frozen, we can pass InvalidTransactionId as our
+ * cutoff_xid, since a snapshotConflictHorizon sufficient to make
+ * everything safe for REDO was logged when the page's tuples were
+ * frozen.
+ */
if (presult.all_frozen)
{
Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
@@ -2169,21 +2184,29 @@ lazy_scan_prune(LVRelState *vacrel,
* checksums are not enabled). Regardless, set both bits so that we
* get back in sync.
*
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
+ * If the heap page is all-visible but the VM bit is not set, we don't
+ * need to dirty the heap page. However, if checksums are enabled, we
+ * do need to make sure that the heap page is dirtied before passing
+ * it to visibilitymap_set(), because it may be logged.
*/
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
+ if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+ {
+ PageSetAllVisible(page);
+ MarkBufferDirty(buf);
+ }
+
old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
InvalidXLogRecPtr,
vmbuffer, presult.vm_conflict_horizon,
flags);
/*
+ * Even if we are only setting the all-frozen bit, there is a small
+ * chance that the VM was modified sometime between setting
+ * all_visible_according_to_vm and checking the visibility during
+ * pruning. Check the return value of old_vmbits to ensure the
+ * visibility map counters used for logging are accurate.
+ *
* If the page wasn't already set all-visible and/or all-frozen in the
* VM, count it as newly set for logging.
*/
@@ -2204,66 +2227,6 @@ lazy_scan_prune(LVRelState *vacrel,
}
}
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen. Note that all_frozen is only valid if all_visible is
- * true, so we must check both all_visible and all_frozen.
- */
- else if (all_visible_according_to_vm && presult.all_visible &&
- presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
-
return presult.ndeleted;
}
--
2.43.0
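
Condensed, the single VM-update path in lazy_scan_prune() after this patch
reads roughly as follows (a sketch of the hunks above, not new code):

if (presult.all_visible &&
	(!all_visible_according_to_vm ||
	 (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
{
	uint8		flags = VISIBILITYMAP_ALL_VISIBLE;

	if (presult.all_frozen)
		flags |= VISIBILITYMAP_ALL_FROZEN;

	/* dirty the heap page only if PD_ALL_VISIBLE was clear or hints must be logged */
	if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
	{
		PageSetAllVisible(page);
		MarkBufferDirty(buf);
	}

	visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
					  vmbuffer, presult.vm_conflict_horizon, flags);
}
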
v5-0007-Find-and-fix-VM-corruption-in-heap_page_prune_and.patch
From 033dd160216fb21473adb94c61286b34dc0abd36 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:45:59 -0400
Subject: [PATCH v5 07/20] Find and fix VM corruption in
heap_page_prune_and_freeze
Future commits will update the VM in the same critical section and WAL
record as pruning and freezing. For ease of review, this commit takes
one step toward doing this. It moves the VM corruption handling case to
heap_page_prune_and_freeze().
---
src/backend/access/heap/pruneheap.c | 87 +++++++++++++++++++++++++++-
src/backend/access/heap/vacuumlazy.c | 77 +++---------------------
src/include/access/heapam.h | 4 ++
3 files changed, 96 insertions(+), 72 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 97e51f78854..496b70e318f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,7 +21,7 @@
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "commands/vacuum.h"
#include "executor/instrument.h"
#include "miscadmin.h"
@@ -177,6 +177,13 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
static void page_verify_redirects(Page page);
+static bool identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer);
+
/*
* Optionally prune and repair fragmentation in the specified page.
@@ -261,7 +268,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, vistest, 0,
+ heap_page_prune_and_freeze(relation, buffer, false,
+ InvalidBuffer,
+ vistest, 0,
NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
/*
@@ -294,6 +303,64 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
}
}
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer)
+{
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
/*
* Prune and repair fragmentation and potentially freeze tuples on the
@@ -314,6 +381,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
* that also freeze need that information.
*
+ * blk_known_av is the visibility status of the heap block as of the last call
+ * to find_next_unskippable_block(). vmbuffer is the buffer that may already
+ * contain the required block of the visibility map.
+ *
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
* (see heap_prune_satisfies_vacuum).
*
@@ -349,6 +420,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ bool blk_known_av,
+ Buffer vmbuffer,
GlobalVisState *vistest,
int options,
struct VacuumCutoffs *cutoffs,
@@ -897,6 +970,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
+ /*
+ * Clear any VM corruption. This does not need to be done in a critical
+ * section.
+ */
+ presult->vm_corruption = false;
+ if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+ presult->vm_corruption = identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av,
+ prstate.lpdead_items, vmbuffer);
if (prstate.freeze)
{
if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f6cdd9e6828..0c121fdf4e6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -431,12 +431,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
-static bool identify_and_fix_vm_corruption(Relation relation,
- BlockNumber heap_blk,
- Buffer heap_buffer, Page heap_page,
- bool heap_blk_known_av,
- int64 nlpdead_items,
- Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1939,65 +1933,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
-/*
- * When updating the visibility map after phase I heap vacuuming, we take the
- * opportunity to identify and fix any VM corruption.
- *
- * heap_blk_known_av is the visibility status of the heap page collected
- * while finding the next unskippable block in heap_vac_scan_next_block().
- */
-static bool
-identify_and_fix_vm_corruption(Relation relation,
- BlockNumber heap_blk,
- Buffer heap_buffer, Page heap_page,
- bool heap_blk_known_av,
- int64 nlpdead_items,
- Buffer vmbuffer)
-{
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
- visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
- {
- elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk);
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
- {
- elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk);
- PageClearAllVisible(heap_page);
- MarkBufferDirty(heap_buffer);
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- return false;
-}
-
/* qsort comparator for sorting OffsetNumbers */
static int
@@ -2056,11 +1991,14 @@ lazy_scan_prune(LVRelState *vacrel,
* tuples. Pruning will have determined whether or not the page is
* all-visible.
*/
- prune_options = HEAP_PAGE_PRUNE_FREEZE;
+ prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
+ heap_page_prune_and_freeze(rel, buf,
+ all_visible_according_to_vm,
+ vmbuffer,
+ vacrel->vistest, prune_options,
&vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
@@ -2145,10 +2083,9 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables. Start by looking for any VM corruption.
+ * all_frozen variables.
*/
- if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
- all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+ if (presult.vm_corruption)
{
/* Don't update the VM if we just cleared corruption in it */
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e7129a644a1..0c7eb5e46f4 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
struct TupleTableSlot;
@@ -247,6 +248,7 @@ typedef struct PruneFreezeResult
bool all_visible;
bool all_frozen;
TransactionId vm_conflict_horizon;
+ bool vm_corruption;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
@@ -380,6 +382,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ bool blk_known_av,
+ Buffer vmbuffer,
struct GlobalVisState *vistest,
int options,
struct VacuumCutoffs *cutoffs,
--
2.43.0
v5-0005-Combine-lazy_scan_prune-VM-corruption-cases.patchtext/x-patch; charset=US-ASCII; name=v5-0005-Combine-lazy_scan_prune-VM-corruption-cases.patchDownload
From c2d9153bfcf2a4c4d703bbfdd262dd21a6172c9d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:04:03 -0400
Subject: [PATCH v5 05/20] Combine lazy_scan_prune VM corruption cases
lazy_scan_prune() updates the visibility map after phase I of heap
vacuuming. It also checks and fixes corruption in the VM. The corruption
cases were mixed in with the normal visibility map update cases.
Careful study of the ordering of the current logic reveals that the
corruption cases can be reordered and extracted into a separate
function. This should result in no additional overhead compared to the
previous code.
This reordering makes it clear which cases are about corruption and
which cases are normal VM updates. Separating them also makes it
possible to combine the normal cases in a future commit. This will make
the logic easier to understand and pave the way for updating the VM in
the same WAL record as pruning and freezing in phase I.
---
src/backend/access/heap/vacuumlazy.c | 114 +++++++++++++++++----------
1 file changed, 73 insertions(+), 41 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d9e195269d2..04a7b6c4181 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -431,6 +431,12 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
+static bool identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1933,6 +1939,66 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer)
+{
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
+
+
/* qsort comparator for sorting OffsetNumbers */
static int
cmpOffsetNumbers(const void *a, const void *b)
@@ -2079,9 +2145,14 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
+ * all_frozen variables. Start by looking for any VM corruption.
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
+ all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+ {
+ /* Don't update the VM if we just cleared corruption in it */
+ }
+ else if (!all_visible_according_to_vm && presult.all_visible)
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -2133,45 +2204,6 @@ lazy_scan_prune(LVRelState *vacrel,
}
}
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
- {
- elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
- {
- elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno);
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
/*
* If the all-visible page is all-frozen but not marked as such yet, mark
* it as all-frozen. Note that all_frozen is only valid if all_visible is
--
2.43.0
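For reviewers skimming 0011: the new helper boils down to two repairs, restated here in simplified form (a sketch of the logic shown above, not additional patch code):

    /* Case 1: VM bit set while PD_ALL_VISIBLE is clear -- clear the VM bit */
    if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
        visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
        visibilitymap_clear(relation, heap_blk, vmbuffer,
                            VISIBILITYMAP_VALID_BITS);

    /* Case 2: LP_DEAD items on a PD_ALL_VISIBLE page -- clear both bits */
    else if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
    {
        PageClearAllVisible(heap_page);
        MarkBufferDirty(heap_buffer);
        visibilitymap_clear(relation, heap_blk, vmbuffer,
                            VISIBILITYMAP_VALID_BITS);
    }

Either branch emits a WARNING and returns true, so the caller skips any further VM update for that block.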
v5-0009-Update-VM-in-pruneheap.c.patch
From 473633011ff4448cf7332de529ca235f5802c749 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 2 Jun 2025 11:04:14 -0400
Subject: [PATCH v5 09/20] Update VM in pruneheap.c
As a step toward updating the VM in the same critical section and WAL
record as pruning and freezing (during phase I of vacuuming), first move
the VM update (still in its own critical section and WAL record) into
heap_page_prune_and_freeze(). This makes review easier.
---
src/backend/access/heap/pruneheap.c | 99 +++++++++++++++++++++++-----
src/backend/access/heap/vacuumlazy.c | 99 +++++-----------------------
src/include/access/heapam.h | 15 +++--
3 files changed, 106 insertions(+), 107 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6c3653e776c..05227ce0339 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -360,7 +360,8 @@ identify_and_fix_vm_corruption(Relation relation,
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -436,6 +437,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_freeze;
bool do_prune;
bool do_hint;
+ uint8 vmflags = 0;
+ uint8 old_vmbits = 0;
bool hint_bit_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
@@ -936,7 +939,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*
* Now that freezing has been finalized, unset all_visible if there are
* any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected by our caller.
+ * of the page, as expected for updating the visibility map.
*/
if (prstate.all_visible && prstate.lpdead_items == 0)
{
@@ -952,31 +955,91 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->hastup = prstate.hastup;
/*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
+ * If updating the visibility map, the conflict horizon for that record
+ * must be the newest xmin on the page. However, if the page is
+ * completely frozen, there can be no conflict and the vm_conflict_horizon
+ * should remain InvalidTransactionId. This includes the case that we
+ * just froze all the tuples; the prune-freeze record included the
+ * conflict XID already so the VM update record doesn't need it.
*/
if (presult->all_frozen)
presult->vm_conflict_horizon = InvalidTransactionId;
else
presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
- presult->lpdead_items = prstate.lpdead_items;
- /* the presult->deadoffsets array was already filled in */
-
/*
- * Clear any VM corruption. This does not need to be done in a critical
- * section.
+ * Handle setting visibility map bit based on information from the VM (as
+ * of last heap_vac_scan_next_block() call), and from all_visible and
+ * all_frozen variables.
*/
- presult->vm_corruption = false;
if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
- presult->vm_corruption = identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av,
- prstate.lpdead_items, vmbuffer);
+ {
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av,
+ prstate.lpdead_items, vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /*
+ * If the page isn't yet marked all-visible in the VM or it is and
+ * needs to be marked all-frozen, update the VM. Note that all_frozen
+ * is only valid if all_visible is true, so we must check both
+ * all_visible and all_frozen.
+ */
+ else if (presult->all_visible &&
+ (!blk_known_av ||
+ (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+ {
+ Assert(prstate.lpdead_items == 0);
+ vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+ /*
+ * If the page is all-frozen, we can pass InvalidTransactionId as
+ * our cutoff_xid, since a snapshotConflictHorizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ if (presult->all_frozen)
+ {
+ Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
+ /*
+ * It's possible for the VM bit to be clear and the page-level bit
+ * to be set if checksums are not enabled.
+ *
+ * And even if we are just planning to update the frozen bit in
+ * the VM, we shouldn't rely on all_visible_according_to_vm as a
+ * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
+ * might have become stale.
+ *
+ * If the heap page is all-visible but the VM bit is not set, we
+ * don't need to dirty the heap page. However, if checksums are
+ * enabled, we do need to make sure that the heap page is dirtied
+ * before passing it to visibilitymap_set(), because it may be
+ * logged.
+ */
+ if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+ {
+ PageSetAllVisible(page);
+ MarkBufferDirty(buffer);
+ }
+
+ old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
+ vmbuffer, presult->vm_conflict_horizon,
+ vmflags);
+ }
+ }
+
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
+
+ presult->old_vmbits = old_vmbits;
+ presult->new_vmbits = vmflags;
+
if (prstate.freeze)
{
if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 0c121fdf4e6..c49e81bc5dd 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1933,7 +1933,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
-
/* qsort comparator for sorting OffsetNumbers */
static int
cmpOffsetNumbers(const void *a, const void *b)
@@ -1949,7 +1948,8 @@ cmpOffsetNumbers(const void *a, const void *b)
* vmbuffer is the buffer containing the VM block with visibility information
* for the heap block, blkno. all_visible_according_to_vm is the saved
* visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * won't rely entirely on this status, as it may be out of date. These will be
+ * passed on to heap_page_prune_and_freeze() to use while setting the VM.
*
* *has_lpdead_items is set to true or false depending on whether, upon return
* from this function, any LP_DEAD items are still present on the page.
@@ -1978,6 +1978,7 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
+ * Then, if the page's visibility status has changed, update the VM.
*
* If the relation has no indexes, we can immediately mark would-be dead
* items LP_UNUSED.
@@ -1986,10 +1987,6 @@ lazy_scan_prune(LVRelState *vacrel,
* presult.ndeleted. It should not be confused with presult.lpdead_items;
* presult.lpdead_items's final value can be thought of as the number of
* tuples that were deleted from indexes.
- *
- * We will update the VM after collecting LP_DEAD items and freezing
- * tuples. Pruning will have determined whether or not the page is
- * all-visible.
*/
prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
if (vacrel->nindexes == 0)
@@ -2081,88 +2078,26 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(!presult.all_visible || !(*has_lpdead_items));
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * For the purposes of logging, count whether or not the page was newly
+ * set all-visible and, potentially, all-frozen.
*/
- if (presult.vm_corruption)
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- /* Don't update the VM if we just cleared corruption in it */
- }
-
- /*
- * If the page isn't yet marked all-visible in the VM or it is and needs
- * to me marked all-frozen, update the VM Note that all_frozen is only
- * valid if all_visible is true, so we must check both all_visible and
- * all_frozen.
- */
- else if (presult.all_visible &&
- (!all_visible_according_to_vm ||
- (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
- {
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- /*
- * If the page is all-frozen, we can pass InvalidTransactionId as our
- * cutoff_xid, since a snapshotConflictHorizon sufficient to make
- * everything safe for REDO was logged when the page's tuples were
- * frozen.
- */
- if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * If the heap page is all-visible but the VM bit is not set, we don't
- * need to dirty the heap page. However, if checksums are enabled, we
- * do need to make sure that the heap page is dirtied before passing
- * it to visibilitymap_set(), because it may be logged.
- */
- if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * Even if we are only setting the all-frozen bit, there is a small
- * chance that the VM was modified sometime between setting
- * all_visible_according_to_vm and checking the visibility during
- * pruning. Check the return value of old_vmbits to ensure the
- * visibility map counters used for logging are accurate.
- *
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ vacrel->vm_new_visible_pages++;
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- vacrel->vm_new_frozen_pages++;
+ vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ {
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
+ }
return presult.ndeleted;
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0c7eb5e46f4..b85648456e9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,20 +235,21 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
+ * all_visible and all_frozen indicate the status of the page as reflected
+ * in the visibility map after pruning, freezing, and setting any pages
+ * all-visible in the visibility map.
*
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
+ * vm_conflict_horizon is the newest xmin of live tuples on the page
+ * (older than OldestXmin). It will only be valid if we did not set the
+ * page all-frozen in the VM.
*
* These are only set if the HEAP_PRUNE_FREEZE option is set.
*/
bool all_visible;
bool all_frozen;
TransactionId vm_conflict_horizon;
- bool vm_corruption;
+ uint8 old_vmbits;
+ uint8 new_vmbits;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
--
2.43.0
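Since 0009 moves the VM update into heap_page_prune_and_freeze(), callers learn what changed only through the old_vmbits/new_vmbits pair in PruneFreezeResult. A minimal sketch of how a caller can classify the transition (the helper names below are hypothetical, for illustration only; they are not part of the patch):

    /* Hypothetical helpers illustrating the old_vmbits/new_vmbits contract */
    static inline bool
    vm_newly_all_visible(uint8 old_vmbits, uint8 new_vmbits)
    {
        return (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
               (new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
    }

    static inline bool
    vm_newly_all_frozen(uint8 old_vmbits, uint8 new_vmbits)
    {
        return (old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
               (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0;
    }

This mirrors the counting lazy_scan_prune() now does for vm_new_visible_pages, vm_new_visible_frozen_pages, and vm_new_frozen_pages.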
v5-0008-Keep-all_frozen-updated-too-in-heap_page_prune_an.patch
From dfe004443fabc70f586e0073b4b6f07d687e185b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 23 Jul 2025 16:01:24 -0400
Subject: [PATCH v5 08/20] Keep all_frozen updated too in
heap_page_prune_and_freeze
Previously, all_frozen was only considered valid when all_visible was also
set, so we didn't bother clearing all_frozen whenever we cleared
all_visible. Keep both flags updated so that all_frozen is valid on its own.
---
src/backend/access/heap/pruneheap.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 496b70e318f..6c3653e776c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -143,10 +143,6 @@ typedef struct
* whether to freeze the page or not. The all_visible and all_frozen
* values returned to the caller are adjusted to include LP_DEAD items at
* the end.
- *
- * all_frozen should only be considered valid if all_visible is also set;
- * we don't bother to clear the all_frozen flag every time we clear the
- * all_visible flag.
*/
bool all_visible;
bool all_frozen;
@@ -824,6 +820,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
}
+ Assert(!prstate.all_frozen || prstate.all_visible);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -1468,7 +1465,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
if (!HeapTupleHeaderXminCommitted(htup))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1490,7 +1487,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
Assert(prstate->cutoffs);
if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1503,7 +1500,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
case HEAPTUPLE_RECENTLY_DEAD:
prstate->recently_dead_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple will soon become DEAD. Update the hint field so
@@ -1522,7 +1519,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* assumption is a bit shaky, but it is what acquire_sample_rows()
* does, so be consistent.
*/
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* If we wanted to optimize for aborts, we might consider marking
@@ -1540,7 +1537,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* will commit and update the counters after we report.
*/
prstate->live_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple may soon become DEAD. Update the hint field so that
--
2.43.0
v5-0010-Eliminate-xl_heap_visible-from-vacuum-phase-I-pru.patch
From 1960da345a3fba00d668a23204684a75f08b0d05 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:41:00 -0400
Subject: [PATCH v5 10/20] Eliminate xl_heap_visible from vacuum phase I
prune/freeze
Instead of emitting a separate WAL record for every block rendered
all-visible/frozen by vacuum's phase I, include the changes to the VM in
the xl_heap_prune record already emitted.
This is only enabled for vacuum's prune/freeze work, not for on-access
pruning.
---
src/backend/access/heap/pruneheap.c | 454 ++++++++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 30 --
src/include/access/heapam.h | 15 +-
3 files changed, 278 insertions(+), 221 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 05227ce0339..cf9e5215d6b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,13 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool freeze;
+
+ /*
+ * Whether or not to consider updating the VM. There is some bookkeeping
+ * that must be maintained if we would like to update the VM.
+ */
+ bool consider_update_vm;
+
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -108,8 +115,9 @@ typedef struct
*
* These fields are not used by pruning itself for the most part, but are
* used to collect information about what was pruned and what state the
- * page is in after pruning, for the benefit of the caller. They are
- * copied to the caller's PruneFreezeResult at the end.
+ * page is in after pruning to use when updating the visibility map and
+ * for the benefit of the caller. They are copied to the caller's
+ * PruneFreezeResult at the end.
* -------------------------------------------------------
*/
@@ -138,11 +146,10 @@ typedef struct
* bits. It is only valid if we froze some tuples, and all_frozen is
* true.
*
- * NOTE: all_visible and all_frozen don't include LP_DEAD items. That's
- * convenient for heap_page_prune_and_freeze(), to use them to decide
- * whether to freeze the page or not. The all_visible and all_frozen
- * values returned to the caller are adjusted to include LP_DEAD items at
- * the end.
+ * NOTE: all_visible and all_frozen don't include LP_DEAD items until
+ * directly before updating the VM. We ignore LP_DEAD items when deciding
+ * whether or not to opportunistically freeze and when determining the
+ * snapshot conflict horizon required when freezing tuples.
*/
bool all_visible;
bool all_frozen;
@@ -371,12 +378,15 @@ identify_and_fix_vm_corruption(Relation relation,
* If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
* required in order to advance relfrozenxid / relminmxid, or if it's
* considered advantageous for overall system performance to do so now. The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing. When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set. They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
+ * 'cutoffs', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments are required
+ * when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set and the visibility status of the page
+ * has changed, we will update the VM at the same time as pruning and freezing
+ * the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping. Note that new and old_vmbits will be
+ * 0 if HEAP_PAGE_PRUNE_UPDATE_VM is not set.
*
* blk_known_av is the visibility status of the heap block as of the last call
* to find_next_unskippable_block(). vmbuffer is the buffer that may already
@@ -392,6 +402,8 @@ identify_and_fix_vm_corruption(Relation relation,
* FREEZE indicates that we will also freeze tuples, and will return
* 'all_visible', 'all_frozen' flags to the caller.
*
+ * UPDATE_VM indicates that we will set the page's status in the VM.
+ *
* cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
* of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
* cutoffs->OldestXmin is also used to determine if dead tuples are
@@ -436,18 +448,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
HeapTupleData tup;
bool do_freeze;
bool do_prune;
- bool do_hint;
+ bool do_hint_full_or_prunable;
+ bool do_set_vm;
uint8 vmflags = 0;
uint8 old_vmbits = 0;
bool hint_bit_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ bool all_frozen_except_lp_dead = false;
+ bool set_pd_all_visible = false;
/* Copy parameters to prstate */
prstate.vistest = vistest;
prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.consider_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate.cutoffs = cutoffs;
+ Assert(!prstate.consider_update_vm || vmbuffer);
+
/*
* Our strategy is to scan the page and make lists of items to change,
* then apply the changes within a critical section. This keeps as much
@@ -492,50 +510,57 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.deadoffsets = presult->deadoffsets;
/*
- * Caller may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Keep track of whether or not the page will be all-visible and
+ * all-frozen for use in opportunistic freezing and to update the VM if
+ * the caller requests it.
+ *
+ * Currently, only VACUUM attempts freezing and setting the VM bits. But
+ * other callers could do either one. The visibility bookkeeping is
+ * required for opportunistic freezing (in addition to setting the VM
+ * bits) because we only consider opportunistically freezing tuples if the
+ * whole page would become all-frozen or if the whole page will be frozen
+ * except for dead tuples that will be removed by vacuum.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * If only updating the VM, we must initialize all_frozen to false, as
+ * heap_prepare_freeze_tuple() will not be called for each tuple on the
+ * page and we will not end up correctly setting it to false later.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not clear
- * all_visible when we see LP_DEAD items. We fix that at the end of the
- * function, when we return the value to the caller, so that the caller
- * doesn't set the VM bit incorrectly.
+ * Dead tuples which will be removed by the end of vacuuming should not
+ * preclude us from opportunistically freezing, so we do not clear
+ * all_visible when we see LP_DEAD items. We fix that after determining
+ * whether or not to freeze but before deciding whether or not to update
+ * the VM so that we don't set the VM bit incorrectly.
+ *
+ * If not freezing or updating the VM, we otherwise avoid the extra
+ * bookkeeping. Initializing all_visible to false allows skipping the work
+ * to update them in heap_prune_record_unchanged_lp_normal().
*/
if (prstate.freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
}
+ else if (prstate.consider_update_vm)
+ {
+ prstate.all_visible = true;
+ prstate.all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate.all_visible = false;
prstate.all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is unmaintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon.
+ * This is most likely to happen when updating the VM and/or freezing all
+ * live tuples on the page. It is updated before returning to the caller
+ * because vacuum does assert-build only validation on the page using this
+ * field.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -733,10 +758,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* Even if we don't prune anything, if we found a new value for the
- * pd_prune_xid field or the page was marked full, we will update the hint
- * bit.
+ * pd_prune_xid field or the page was marked full, we will update those
+ * hint bits.
*/
- do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ do_hint_full_or_prunable =
+ ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
PageIsFull(page);
/*
@@ -784,7 +810,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (XLogCheckBufferNeedsBackup(buffer))
do_freeze = true;
}
- else if (do_hint)
+ else if (do_hint_full_or_prunable)
{
if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
do_freeze = true;
@@ -823,11 +849,84 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
}
+ /*
+ * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+ * make the choice of whether or not to freeze the page unaffected by the
+ * short-term presence of LP_DEAD items. These LP_DEAD items were
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that freezing has been finalized, unset all_visible if there are
+ * any LP_DEAD items on the page. It needs to reflect the present state of
+ * the page when using it to determine whether or not to update the VM.
+ *
+ * Keep track of whether or not the page was all-frozen except LP_DEAD
+ * items for the purposes of calculating the snapshot conflict horizon,
+ * though.
+ */
+ all_frozen_except_lp_dead = prstate.all_frozen;
+ if (prstate.lpdead_items > 0)
+ {
+ prstate.all_visible = false;
+ prstate.all_frozen = false;
+ }
+
Assert(!prstate.all_frozen || prstate.all_visible);
+
+ /*
+ * Handle setting visibility map bit based on information from the VM (as
+ * of last heap_vac_scan_next_block() call), and from all_visible and
+ * all_frozen variables.
+ */
+ if (prstate.consider_update_vm)
+ {
+ /*
+ * Clear any VM corruption. This does not need to be in a critical
+ * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
+ * we may mark the heap page buffer dirty here and could end up doing
+ * so again later. This is not a correctness issue and is in the path
+ * of VM corruption, so we don't have to worry about the extra
+ * performance overhead.
+ */
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av, prstate.lpdead_items, vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /* Determine if we actually need to set the VM and which bits to set. */
+ else if (prstate.all_visible &&
+ (!blk_known_av ||
+ (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate.all_frozen)
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+ }
+
+ do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
+
+ /*
+ * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+ * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+ * set, we strongly prefer to keep them in sync.
+ *
+ * Prior to Postgres 19, it was possible for the page-level bit to be set
+ * and the VM bit to be clear. This could happen if we crashed after
+ * setting PD_ALL_VISIBLE but before setting bits in the VM.
+ */
+ set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+
+ /* Save this for the caller in case we later zero out vmflags */
+ presult->new_vmbits = vmflags;
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
- if (do_hint)
+ if (do_hint_full_or_prunable)
{
/*
* Update the page's pd_prune_xid field to either zero, or the lowest
@@ -843,15 +942,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
PageClearFull(page);
/*
- * If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * If we are _only_ setting the prune_xid or PD_PAGE_FULL hint, then
+ * this is a non-WAL-logged hint. If we are going to freeze or prune
+ * tuples on the page or set PD_ALL_VISIBLE, we will mark the buffer
+ * dirty and emit WAL below.
*/
- if (!do_freeze && !do_prune)
+ if (!do_prune && !do_freeze && !set_pd_all_visible)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -865,12 +965,47 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
- MarkBufferDirty(buffer);
+ if (set_pd_all_visible)
+ PageSetAllVisible(page);
/*
- * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
+ * We only set PD_ALL_VISIBLE if we also set the VM, and since setting
+ * the VM requires emitting WAL, MarkBufferDirtyHint() isn't
+ * appropriate here.
*/
- if (RelationNeedsWAL(relation))
+ if (do_prune || do_freeze || set_pd_all_visible)
+ MarkBufferDirty(buffer);
+
+ if (do_set_vm)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ old_vmbits = visibilitymap_set_vmbyte(relation, blockno,
+ vmbuffer, vmflags);
+
+ if (old_vmbits == vmflags)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+ do_set_vm = false;
+ /* 0 out vmflags so we don't emit WAL to update the VM */
+ vmflags = 0;
+ }
+ }
+
+ /*
+ * It should never be the case that PD_ALL_VISIBLE is not set and the
+ * VM is set. Or, if it were, we should have caught it earlier when
+ * finding and fixing VM corruption. So, if we found out the VM was
+ * already set above, we should have found PD_ALL_VISIBLE set earlier.
+ */
+ Assert(!set_pd_all_visible || do_set_vm);
+
+ /*
+ * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did. If
+ * we were only updating the VM and it turns out it was already set,
+ * we will have unset do_set_vm earlier. As such, check it again
+ * before emitting the record.
+ */
+ if (RelationNeedsWAL(relation) && (do_set_vm || do_prune || do_freeze))
{
/*
* The snapshotConflictHorizon for the whole record should be the
@@ -882,35 +1017,56 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* on the standby with xids older than the youngest tuple this
* record will freeze will conflict.
*/
- TransactionId frz_conflict_horizon = InvalidTransactionId;
- TransactionId conflict_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
/*
- * We can use the visibility_cutoff_xid as our cutoff for
- * conflicts when the whole page is eligible to become all-frozen
- * in the VM once we're done with it. Otherwise we generate a
- * conservative cutoff by stepping back from OldestXmin.
+ * If we are updating the VM, the conflict horizon is almost
+ * always the visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization,
+ * we can use the visibility_cutoff_xid as the conflict horizon if
+ * the page will be all-frozen. This is true even if there are
+ * LP_DEAD line pointers because we ignored those when maintaining
+ * the visibility_cutoff_xid.
*/
- if (do_freeze)
+ if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
+ conflict_xid = prstate.visibility_cutoff_xid;
+
+ /*
+ * Otherwise, if we are freezing but the page would not be
+ * all-frozen, we have to use the more pessimistic horizon of
+ * OldestXmin, which may be newer than the newest tuple we froze.
+ * We currently don't track the newest tuple we froze.
+ */
+ else if (do_freeze)
{
- if (prstate.all_visible && prstate.all_frozen)
- frz_conflict_horizon = prstate.visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = prstate.cutoffs->OldestXmin;
- TransactionIdRetreat(frz_conflict_horizon);
- }
+ conflict_xid = prstate.cutoffs->OldestXmin;
+ TransactionIdRetreat(conflict_xid);
}
- if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
- conflict_xid = frz_conflict_horizon;
- else
+ /*
+ * If we are removing tuples whose xmax is newer than the conflict_xid
+ * calculated so far, we must use that as our horizon.
+ */
+ if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
conflict_xid = prstate.latest_xid_removed;
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning
+ * or freezing any tuples and are setting an already all-visible
+ * page all-frozen in the VM. In this case, all of the tuples on
+ * the page must already be visible to all MVCC snapshots on the
+ * standby.
+ */
+ if (!do_prune && !do_freeze && do_set_vm &&
+ blk_known_av && (vmflags & VISIBILITYMAP_ALL_FROZEN))
+ conflict_xid = InvalidTransactionId;
+
log_heap_prune_and_freeze(relation, buffer,
false,
- InvalidBuffer, 0, false,
+ vmbuffer,
+ vmflags,
+ set_pd_all_visible,
conflict_xid,
true, reason,
prstate.frozen, prstate.nfrozen,
@@ -922,124 +1078,55 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
END_CRIT_SECTION();
- /* Copy information back for caller */
- presult->ndeleted = prstate.ndeleted;
- presult->nnewlpdead = prstate.ndead;
- presult->nfrozen = prstate.nfrozen;
- presult->live_tuples = prstate.live_tuples;
- presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected for updating the visibility map.
- */
- if (prstate.all_visible && prstate.lpdead_items == 0)
- {
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
- }
- else
- {
- presult->all_visible = false;
- presult->all_frozen = false;
- }
-
- presult->hastup = prstate.hastup;
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
- /*
- * If updating the visibility map, the conflict horizon for that record
- * must be the newest xmin on the page. However, if the page is
- * completely frozen, there can be no conflict and the vm_conflict_horizon
- * should remain InvalidTransactionId. This includes the case that we
- * just froze all the tuples; the prune-freeze record included the
- * conflict XID already so the VM update record doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * VACUUM will call heap_page_is_all_visible() during the second pass over
+ * the heap to determine all_visible and all_frozen for the page -- this
+ * is a specialized version of the logic from this function. Now that
+ * we've finished pruning and freezing, make sure that we're in total
+ * agreement with heap_page_is_all_visible() using an assertion. We will
+ * have already set the page in the VM, so this assertion will only let
+ * you know that you've already done something wrong.
*/
- if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
{
- if (identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av,
- prstate.lpdead_items, vmbuffer))
- {
- /* If we fix corruption, don't update the VM further */
- }
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
- /*
- * If the page isn't yet marked all-visible in the VM or it is and
- * needs to be marked all-frozen, update the VM. Note that all_frozen
- * is only valid if all_visible is true, so we must check both
- * all_visible and all_frozen.
- */
- else if (presult->all_visible &&
- (!blk_known_av ||
- (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
- {
- Assert(prstate.lpdead_items == 0);
- vmflags = VISIBILITYMAP_ALL_VISIBLE;
+ Assert(cutoffs);
- /*
- * If the page is all-frozen, we can pass InvalidTransactionId as
- * our cutoff_xid, since a snapshotConflictHorizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- if (presult->all_frozen)
- {
- Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
- vmflags |= VISIBILITYMAP_ALL_FROZEN;
- }
+ Assert(prstate.lpdead_items == 0);
- /*
- * It's possible for the VM bit to be clear and the page-level bit
- * to be set if checksums are not enabled.
- *
- * And even if we are just planning to update the frozen bit in
- * the VM, we shouldn't rely on all_visible_according_to_vm as a
- * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
- * might have become stale.
- *
- * If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be
- * logged.
- */
- if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buffer);
- }
+ if (!heap_page_is_all_visible(relation, buffer,
+ cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc))
+ Assert(false);
- old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
- vmbuffer, presult->vm_conflict_horizon,
- vmflags);
- }
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == prstate.visibility_cutoff_xid);
}
+#endif
+ /* Copy information back for caller */
+ presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+ presult->old_vmbits = old_vmbits;
+ /* new_vmbits was set above */
+ presult->hastup = prstate.hastup;
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
- presult->old_vmbits = old_vmbits;
- presult->new_vmbits = vmflags;
-
if (prstate.freeze)
{
if (presult->nfrozen > 0)
@@ -1621,7 +1708,12 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
break;
}
- /* Consider freezing any normal tuples which will not be removed */
+ /*
+ * Consider freezing any normal tuples which will not be removed.
+ * Regardless of whether or not we want to freeze the tuples, if we want
+ * to update the VM, we have to call heap_prepare_freeze_tuple() on every
+ * tuple to know whether or not the page will be totally frozen.
+ */
if (prstate->freeze)
{
bool totally_frozen;
@@ -2234,6 +2326,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+ Assert(do_prune || nfrozen > 0 || vmflags & VISIBILITYMAP_VALID_BITS);
+
xlrec.flags = 0;
regbuf_flags = REGBUF_STANDARD;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c49e81bc5dd..91e209901b8 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2014,34 +2014,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- /* Note that all_frozen value does not matter when !all_visible */
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- if (!heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum))
- Assert(false);
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -2075,8 +2047,6 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
-
/*
* For the purposes of logging, count whether or not the page was newly
* set all-visible and, potentially, all-frozen.
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b85648456e9..0b9bb1c9b13 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,19 +235,12 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate the status of the page as reflected
- * in the visibility map after pruning, freezing, and setting any pages
- * all-visible in the visibility map.
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
*
- * vm_conflict_horizon is the newest xmin of live tuples on the page
- * (older than OldestXmin). It will only be valid if we did not set the
- * page all-frozen in the VM.
- *
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
*/
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
uint8 old_vmbits;
uint8 new_vmbits;
--
2.43.0
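Because the conflict-horizon choice in 0010 is spread across several branches, here it is restated in one place (a simplified paraphrase of the code above, not additional patch content):

    TransactionId conflict_xid = InvalidTransactionId;

    /* VM update, or freezing a page that is all-frozen apart from LP_DEAD
     * items: the tracked newest live xmin is a sufficient horizon. */
    if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
        conflict_xid = prstate.visibility_cutoff_xid;
    /* Freezing a page that will not be all-frozen: fall back to a
     * conservative horizon just before OldestXmin. */
    else if (do_freeze)
    {
        conflict_xid = prstate.cutoffs->OldestXmin;
        TransactionIdRetreat(conflict_xid);
    }

    /* Removing tuples can only move the horizon forward. */
    if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
        conflict_xid = prstate.latest_xid_removed;

    /* Setting an already all-visible page all-frozen needs no conflict. */
    if (!do_prune && !do_freeze && do_set_vm &&
        blk_known_av && (vmflags & VISIBILITYMAP_ALL_FROZEN))
        conflict_xid = InvalidTransactionId;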
v5-0011-Rename-PruneState.freeze-to-attempt_freeze.patch
From 92232d63451af2ffb3eeaa2dfe9c6e83ce7ba938 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 31 Jul 2025 14:07:51 -0400
Subject: [PATCH v5 11/20] Rename PruneState.freeze to attempt_freeze
This makes it clearer that the flag means the caller would like
heap_page_prune_and_freeze() to consider freezing tuples -- not that we
will ultimately end up freezing them.
Also rename the local variable hint_bit_fpi to did_tuple_hint_fpi. This
makes it clear that it refers to tuple hints rather than page hints, and
that it records something that happened rather than something that could
happen.
---
src/backend/access/heap/pruneheap.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index cf9e5215d6b..82127e8728b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -43,7 +43,7 @@ typedef struct
/* whether or not dead items can be set LP_UNUSED during pruning */
bool mark_unused_now;
/* whether to attempt freezing tuples */
- bool freeze;
+ bool attempt_freeze;
/*
* Whether or not to consider updating the VM. There is some bookkeeping
@@ -452,7 +452,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_set_vm;
uint8 vmflags = 0;
uint8 old_vmbits = 0;
- bool hint_bit_fpi;
+ bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
bool all_frozen_except_lp_dead = false;
bool set_pd_all_visible = false;
@@ -460,7 +460,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Copy parameters to prstate */
prstate.vistest = vistest;
prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
- prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
prstate.consider_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate.cutoffs = cutoffs;
@@ -485,7 +485,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* initialize page freezing working state */
prstate.pagefrz.freeze_required = false;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
Assert(new_relfrozen_xid && new_relmin_mxid);
prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
@@ -535,7 +535,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* bookkeeping. Initializing all_visible to false allows skipping the work
* to update them in heap_prune_record_unchanged_lp_normal().
*/
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
@@ -653,7 +653,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
* an FPI to be emitted.
*/
- hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+ did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
/*
* Process HOT chains.
@@ -770,7 +770,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* plans we prepared, or not.
*/
do_freeze = false;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
if (prstate.pagefrz.freeze_required)
{
@@ -803,7 +803,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (RelationNeedsWAL(relation))
{
- if (hint_bit_fpi)
+ if (did_tuple_hint_fpi)
do_freeze = true;
else if (do_prune)
{
@@ -1127,7 +1127,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
if (presult->nfrozen > 0)
{
@@ -1714,7 +1714,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* to update the VM, we have to call heap_prepare_freeze_tuple() on every
* tuple to know whether or not the page will be totally frozen.
*/
- if (prstate->freeze)
+ if (prstate->attempt_freeze)
{
bool totally_frozen;
--
2.43.0
v5-0014-Use-GlobalVisState-to-determine-page-level-visibi.patch
From 67597b88b4127d767db8ca32d1e29cd4ec79a070 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v5 14/20] Use GlobalVisState to determine page level
visibility
During pruning and during vacuum's third phase, we try to determine if
the whole page can be set all-visible in the visibility map. Instead of
using OldestXmin to determine if all the tuples on a page are visible to
everyone, use the GlobalVisState. This allows us to start setting the VM
during on-access pruning in a future commit.
It is possible for the GlobalVisState to change during the course of a
vacuum. In all but extraordinary cases, it moves forward, meaning more
pages could potentially be set in the VM.
Because comparing a transaction ID against the GlobalVisState requires
more operations than comparing it to a single transaction ID, we now wait
until after examining all the tuples on the page and, if we have
maintained the visibility_cutoff_xid, compare it to the GlobalVisState
just once per page. This works because, if the page is all-visible and
has live, committed tuples on it, the visibility_cutoff_xid contains the
newest xmin on the page; if everyone can see that xmin, the page is truly
all-visible.
Doing this may mean we examine more tuples' xmins than before, since
previously we would have set all_visible to false as soon as we
encountered a live tuple newer than OldestXmin. However, these extra
comparisons were not found to be significant in a profile.
---
src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++
src/backend/access/heap/pruneheap.c | 48 +++++++++------------
src/backend/access/heap/vacuumlazy.c | 17 ++++----
src/include/access/heapam.h | 4 +-
4 files changed, 59 insertions(+), 38 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+ Buffer buffer)
+{
+ TransactionId dead_after = InvalidTransactionId;
+ HTSV_Result res;
+
+ res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+ if (res == HEAPTUPLE_RECENTLY_DEAD)
+ {
+ Assert(TransactionIdIsValid(dead_after));
+
+ if (GlobalVisXidVisibleToAll(vistest, dead_after))
+ res = HEAPTUPLE_DEAD;
+ }
+ else
+ Assert(!TransactionIdIsValid(dead_after));
+
+ return res;
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 715dfc16ba7..ab79d8a3ed9 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -141,10 +141,9 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon, when setting the VM or when
+ * freezing all the live tuples on the page.
*
* NOTE: all_visible and all_frozen don't include LP_DEAD items until
* directly before updating the VM. We ignore LP_DEAD items when deciding
@@ -553,14 +552,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon.
- * This is most likely to happen when updating the VM and/or freezing all
- * live tuples on the page. It is updated before returning to the caller
- * because vacuum does assert-build only validation on the page using this
- * field.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState.
+ *
+ * If we encounter an uncommitted tuple, this field is unmaintained. If
+ * the page is being set all-visible or when freezing all live tuples on
+ * the page, it is used to calculate the snapshot conflict horizon.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -756,6 +753,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them is not visible to everyone, the page cannot be
+ * all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update those
@@ -1098,12 +1105,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(cutoffs);
-
Assert(prstate.lpdead_items == 0);
if (!heap_page_is_all_visible(relation, buffer,
- cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc))
Assert(false);
@@ -1628,19 +1633,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' when freezing is requested. We
- * could use GlobalVisTestIsRemovableXid instead, if a
- * non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6a0fa371a06..777ec30eb82 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -465,7 +465,7 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int allowed_num_offsets,
bool *all_frozen,
@@ -2716,7 +2716,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
InvalidOffsetNumber);
if (heap_page_is_all_visible_except_lpdead(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3459,13 +3459,13 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
- return heap_page_is_all_visible_except_lpdead(rel, buf, OldestXmin,
+ return heap_page_is_all_visible_except_lpdead(rel, buf, vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -3500,7 +3500,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
static bool
heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int allowed_num_offsets,
bool *all_frozen,
@@ -3555,8 +3555,8 @@ heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
tuple.t_len = ItemIdGetLength(itemid);
tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
- buf))
+ switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest,
+ buf))
{
case HEAPTUPLE_LIVE:
{
@@ -3575,8 +3575,7 @@ heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin,
- OldestXmin))
+ if (!GlobalVisXidVisibleToAll(vistest, xmin))
{
all_visible = false;
*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0b9bb1c9b13..4278f351bdf 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -342,7 +342,7 @@ extern void heap_inplace_unlock(Relation relation,
HeapTuple oldtup, Buffer buffer);
extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum);
@@ -415,6 +415,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+ GlobalVisState *vistest, Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
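(Aside, not part of the patch: to make the new control flow above easier to review, here is a tiny standalone C toy of the pattern the hunks switch to -- remember only the newest normal xmin while walking the live tuples, then do a single visible-to-all check after the loop instead of testing each xmin against OldestXmin. All names and the horizon test are simplified stand-ins, not the real PruneState/GlobalVisState code.)

/* toy_cutoff.c -- illustrative sketch only, not PostgreSQL source */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t ToyXid;
#define TOY_INVALID_XID 0

/* stand-in for GlobalVisXidVisibleToAll(): here, visible iff xid < horizon */
static bool
toy_visible_to_all(ToyXid horizon, ToyXid xid)
{
    return xid < horizon;
}

int
main(void)
{
    ToyXid  live_xmins[] = {100, 250, 180}; /* xmins of live tuples on a page */
    ToyXid  horizon = 200;                  /* everything below 200 is visible to all */
    ToyXid  cutoff = TOY_INVALID_XID;       /* newest xmin seen so far */
    bool    all_visible = true;

    /* per-tuple work: only track the newest xmin, as the patched loop does */
    for (int i = 0; i < 3; i++)
    {
        if (live_xmins[i] > cutoff)
            cutoff = live_xmins[i];
    }

    /* one check after the loop, mirroring the new post-scan hunk */
    if (all_visible && cutoff != TOY_INVALID_XID &&
        !toy_visible_to_all(horizon, cutoff))
        all_visible = false;

    printf("cutoff = %u, all_visible = %d\n", cutoff, all_visible);
    return 0;
}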
Attachment: v5-0013-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisXi.patch (text/x-patch)
From 2b99c5954eaa99bad8efebc1eb0289b42469eee2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v5 13/20] Rename GlobalVisTestIsRemovableXid() to
GlobalVisXidVisibleToAll()
Currently, we only use GlobalVisTestIsRemovableXid() to check whether a
tuple's xmax is visible to all, meaning we can remove the tuple. But
future commits will also use it to test whether a tuple's xmin is
visible to all, in order to decide whether the page can be set
all-visible in the VM. In that case, the name GlobalVisXidVisibleToAll()
makes more sense.
---
src/backend/access/heap/heapam_visibility.c | 6 +++---
src/backend/access/heap/pruneheap.c | 14 +++++++-------
src/backend/access/spgist/spgvacuum.c | 2 +-
src/backend/storage/ipc/procarray.c | 13 ++++++-------
src/include/utils/snapmgr.h | 4 ++--
5 files changed, 19 insertions(+), 20 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
{
Assert(TransactionIdIsValid(dead_after));
- if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
res = HEAPTUPLE_DEAD;
}
else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
return false;
/* Deleter committed, so tuple is dead if the XID is old enough. */
- return GlobalVisTestIsRemovableXid(vistest,
- HeapTupleHeaderGetRawXmax(tuple));
+ return GlobalVisXidVisibleToAll(vistest,
+ HeapTupleHeaderGetRawXmax(tuple));
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ffc12314b41..715dfc16ba7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -231,7 +231,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
vistest = GlobalVisTestFor(relation);
- if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+ if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
return;
/*
@@ -574,9 +574,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* Determining HTSV only once for each tuple is required for correctness,
* to deal with cases where running HTSV twice could result in different
* results. For example, RECENTLY_DEAD can turn to DEAD if another
- * checked item causes GlobalVisTestIsRemovableFullXid() to update the
- * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
- * transaction aborts.
+ * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+ * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+ * aborts.
*
* It's also good for performance. Most commonly tuples within a page are
* stored at decreasing offsets (while the items are stored at increasing
@@ -1172,11 +1172,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* Determine whether or not the tuple is considered dead when compared
* with the provided GlobalVisState. On-access pruning does not provide
* VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
- * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
- * if the GlobalVisState has been updated since the beginning of vacuuming
+ * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+ * the GlobalVisState has been updated since the beginning of vacuuming
* the relation.
*/
- if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
return HEAPTUPLE_DEAD;
return res;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 2678f7ab782..4b8e5747239 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
*/
if (dt->tupstate == SPGIST_REDIRECT &&
(!TransactionIdIsValid(dt->xid) ||
- GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+ GlobalVisXidVisibleToAll(vistest, dt->xid)))
{
dt->tupstate = SPGIST_PLACEHOLDER;
Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index bf987aed8d3..508bb379f87 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
* See comment for GlobalVisState for details.
*/
bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
- FullTransactionId fxid)
+GlobalVisFullXidVisible(GlobalVisState *state, FullTransactionId fxid)
{
/*
* If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4223,7 +4222,7 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
* relfrozenxid).
*/
bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
{
FullTransactionId fxid;
@@ -4237,7 +4236,7 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
*/
fxid = FullXidRelativeTo(state->definitely_needed, xid);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisible(state, fxid);
}
/*
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisible(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableXid(state, xid);
+ return GlobalVisXidVisibleToAll(state, xid);
}
/*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index d346be71642..aec0692b5db 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -97,8 +97,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
*/
typedef struct GlobalVisState GlobalVisState;
extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisible(GlobalVisState *state, FullTransactionId fxid);
extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
--
2.43.0
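(Aside, not part of the patch: since the rename is all about reading "is this xid visible to all?" rather than "can we remove it?", here is a standalone toy of that semantics. It only models the two horizon bounds mentioned in the procarray.c comments -- an xid older than the conservative maybe_needed bound is surely visible to everyone, one at or beyond definitely_needed is surely not -- and it skips FullTransactionId conversion, wraparound, and the GlobalVisUpdate() recheck the real code does for the in-between case.)

/* toy_globalvis.c -- conceptual sketch only; not the procarray.c code */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t ToyFullXid;

typedef struct
{
    ToyFullXid maybe_needed;      /* conservative bound: below this, surely visible */
    ToyFullXid definitely_needed; /* accurate bound as of the last refresh */
} ToyVisState;

static bool
toy_full_xid_visible_to_all(ToyVisState *state, ToyFullXid fxid)
{
    if (fxid < state->maybe_needed)
        return true;              /* older than the conservative horizon */
    if (fxid >= state->definitely_needed)
        return false;             /* newer than any horizon we could compute */

    /*
     * In between: the real code refreshes the horizons and retests. The toy
     * simply treats the stale answer as "not visible to all".
     */
    return false;
}

int
main(void)
{
    ToyVisState vis = {.maybe_needed = 1000, .definitely_needed = 1500};

    printf("xid 900:  %d\n", toy_full_xid_visible_to_all(&vis, 900));   /* 1 */
    printf("xid 1200: %d\n", toy_full_xid_visible_to_all(&vis, 1200));  /* 0, would recheck */
    printf("xid 2000: %d\n", toy_full_xid_visible_to_all(&vis, 2000));  /* 0 */
    return 0;
}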
Attachment: v5-0012-Remove-xl_heap_visible-entirely.patch (text/x-patch)
From 632ace2402679e28a3af367d16434523135402a0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:30:42 -0400
Subject: [PATCH v5 12/20] Remove xl_heap_visible entirely
There are now no users of this, so eliminate it entirely.
---
src/backend/access/common/bufmask.c | 3 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 154 +----------------------
src/backend/access/heap/pruneheap.c | 4 +-
src/backend/access/heap/vacuumlazy.c | 10 +-
src/backend/access/heap/visibilitymap.c | 106 +---------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +-
src/include/access/heapam_xlog.h | 20 ---
src/include/access/visibilitymap.h | 11 +-
src/include/access/visibilitymapdefs.h | 9 --
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 30 insertions(+), 365 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..1fff01383b3 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,7 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 68db4325285..48f7b84156a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -40,6 +40,7 @@
#include "access/valid.h"
#include "access/visibilitymap.h"
#include "access/xloginsert.h"
+#include "access/xlogutils.h"
#include "catalog/pg_database.h"
#include "catalog/pg_database_d.h"
#include "commands/vacuum.h"
@@ -2512,11 +2513,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
{
PageSetAllVisible(page);
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
- visibilitymap_set_vmbyte(relation,
- BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set(relation,
+ BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
}
/*
@@ -8784,49 +8785,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
/*
* Perform XLogInsert for a heap-update operation. Caller must already
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 14541e2e94f..64f06d46bf1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -82,10 +82,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
memcpy(&vmflags, maindataptr, sizeof(uint8));
maindataptr += sizeof(uint8);
- /*
- * We don't set VISIBILITYMAP_XLOG_CATALOG_REL in the combined record
- * because we already have XLHP_IS_CATALOG_REL.
- */
Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
/* Must never set all_frozen bit without also setting all_visible bit */
Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
@@ -267,7 +263,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Relation reln = CreateFakeRelcacheEntry(rlocator);
visibilitymap_pin(reln, blkno, &vmbuffer);
- old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+ old_vmbits = visibilitymap_set(reln, blkno, vmbuffer, vmflags);
/* Only set VM page LSN if we modified the page */
if (old_vmbits != vmflags)
PageSetLSN(BufferGetPage(vmbuffer), lsn);
@@ -278,143 +274,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
UnlockReleaseBuffer(vmbuffer);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
- visibilitymap_pin(reln, blkno, &vmbuffer);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -791,16 +650,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
Relation reln = CreateFakeRelcacheEntry(rlocator);
visibilitymap_pin(reln, blkno, &vmbuffer);
- visibilitymap_set_vmbyte(reln, blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
/*
* It is not possible that the VM was already set for this heap page,
* so the vmbuffer must have been modified and marked dirty.
*/
Assert(BufferIsDirty(vmbuffer));
+ visibilitymap_set(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
PageSetLSN(BufferGetPage(vmbuffer), lsn);
FreeFakeRelcacheEntry(reln);
}
@@ -1380,9 +1239,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 82127e8728b..ffc12314b41 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -979,8 +979,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_set_vm)
{
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
- old_vmbits = visibilitymap_set_vmbyte(relation, blockno,
- vmbuffer, vmflags);
+ old_vmbits = visibilitymap_set(relation, blockno,
+ vmbuffer, vmflags);
if (old_vmbits == vmflags)
{
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 91e209901b8..6a0fa371a06 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1887,8 +1887,8 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
MarkBufferDirty(buf);
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
- visibilitymap_set_vmbyte(vacrel->rel, blkno,
- vmbuffer, new_vmbits);
+ visibilitymap_set(vacrel->rel, blkno,
+ vmbuffer, new_vmbits);
if (RelationNeedsWAL(vacrel->rel))
{
@@ -2754,9 +2754,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
set_pd_all_vis = true;
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
PageSetAllVisible(page);
- visibilitymap_set_vmbyte(vacrel->rel,
- blkno,
- vmbuffer, vmflags);
+ visibilitymap_set(vacrel->rel,
+ blkno,
+ vmbuffer, vmflags);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 0bc64203959..5ed54e06dd4 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set a bit in a previously pinned page and log
- * visibilitymap_set_vmbyte - set a bit in a pinned page
+ * visibilitymap_set - set a bit in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,105 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
- Assert(InRecovery || PageIsAllVisible((Page) BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (XLogRecPtrIsInvalid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
-}
-
/*
* Set flags in the VM block contained in the passed in vmBuf.
*
@@ -338,8 +238,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* making any changes needed to the associated heap page.
*/
uint8
-visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags)
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index d6c86ccac20..f7880a4ed81 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -351,13 +351,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -462,9 +455,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ceae9c083ff..a64677b7bca 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -438,20 +437,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -495,11 +480,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 977566f6b98..c5b1e1f7adb 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "utils/relcache.h"
@@ -31,14 +30,8 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags);
+extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e6f2e93b2d6..98b1adc4e9e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4274,7 +4274,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
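(Aside, not part of the patch: with the WAL logging stripped out, what remains of visibilitymap_set() is essentially byte/bit arithmetic on the pinned map page. The standalone toy below shows just that arithmetic -- two bits per heap block, flag values matching visibilitymapdefs.h -- with no buffers, locking, map-page lookup, or WAL, so it illustrates the layout rather than the function itself.)

/* toy_vm_bits.c -- standalone illustration of the 2-bits-per-block VM layout */
#include <stdint.h>
#include <stdio.h>

#define TOY_ALL_VISIBLE  0x01   /* matches VISIBILITYMAP_ALL_VISIBLE */
#define TOY_ALL_FROZEN   0x02   /* matches VISIBILITYMAP_ALL_FROZEN */
#define TOY_VALID_BITS   0x03

static uint8_t toy_map[16];     /* 4 heap blocks per byte, 2 bits each */

/* returns the bits that were set before, like visibilitymap_set() in the patch */
static uint8_t
toy_vm_set(uint32_t heap_blk, uint8_t flags)
{
    uint32_t byte = heap_blk / 4;           /* which map byte (ignoring map pages) */
    uint8_t  shift = (heap_blk % 4) * 2;    /* bit offset within that byte */
    uint8_t  old = (toy_map[byte] >> shift) & TOY_VALID_BITS;

    toy_map[byte] |= (uint8_t) (flags << shift);
    return old;
}

int
main(void)
{
    uint8_t before = toy_vm_set(5, TOY_ALL_VISIBLE | TOY_ALL_FROZEN);
    uint8_t after = (toy_map[5 / 4] >> ((5 % 4) * 2)) & TOY_VALID_BITS;

    printf("block 5: before=0x%02x after=0x%02x\n", before, after);
    return 0;
}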
Attachment: v5-0015-Inline-TransactionIdFollows-Precedes.patch (text/x-patch)
From 5ca49d81544be2dd5502d5509fe09325df9d0857 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 09:57:13 -0400
Subject: [PATCH v5 15/20] Inline TransactionIdFollows/Precedes()
Calling these from on-access pruning code had noticeable overhead in a
profile. There does not seem to be a reason not to inline them.
---
src/backend/access/transam/transam.c | 64 -------------------------
src/include/access/transam.h | 70 ++++++++++++++++++++++++++--
2 files changed, 66 insertions(+), 68 deletions(-)
diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index 9a39451a29a..553d6756cb3 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -273,70 +273,6 @@ TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
}
-/*
- * TransactionIdPrecedes --- is id1 logically < id2?
- */
-bool
-TransactionIdPrecedes(TransactionId id1, TransactionId id2)
-{
- /*
- * If either ID is a permanent XID then we can just do unsigned
- * comparison. If both are normal, do a modulo-2^32 comparison.
- */
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 < id2);
-
- diff = (int32) (id1 - id2);
- return (diff < 0);
-}
-
-/*
- * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
- */
-bool
-TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 <= id2);
-
- diff = (int32) (id1 - id2);
- return (diff <= 0);
-}
-
-/*
- * TransactionIdFollows --- is id1 logically > id2?
- */
-bool
-TransactionIdFollows(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 > id2);
-
- diff = (int32) (id1 - id2);
- return (diff > 0);
-}
-
-/*
- * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
- */
-bool
-TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 >= id2);
-
- diff = (int32) (id1 - id2);
- return (diff >= 0);
-}
-
/*
* TransactionIdLatest --- get latest XID among a main xact and its children
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 7d82cd2eb56..c9e20418275 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -255,6 +255,72 @@ typedef struct TransamVariablesData
} TransamVariablesData;
+
+/*
+ * TransactionIdPrecedes --- is id1 logically < id2?
+ */
+static inline bool
+TransactionIdPrecedes(TransactionId id1, TransactionId id2)
+{
+ /*
+ * If either ID is a permanent XID then we can just do unsigned
+ * comparison. If both are normal, do a modulo-2^32 comparison.
+ */
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 < id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff < 0);
+}
+
+/*
+ * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
+ */
+static inline bool
+TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 <= id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff <= 0);
+}
+
+/*
+ * TransactionIdFollows --- is id1 logically > id2?
+ */
+static inline bool
+TransactionIdFollows(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 > id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff > 0);
+}
+
+/*
+ * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
+ */
+static inline bool
+TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 >= id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff >= 0);
+}
+
+
/* ----------------
* extern declarations
* ----------------
@@ -274,10 +340,6 @@ extern bool TransactionIdDidAbort(TransactionId transactionId);
extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
-extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
-extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2);
extern TransactionId TransactionIdLatest(TransactionId mainxid,
int nxids, const TransactionId *xids);
extern XLogRecPtr TransactionIdGetCommitLSN(TransactionId xid);
--
2.43.0
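(Aside, not part of the patch: for anyone who hasn't stared at these comparators before, the trick being moved into the header is that subtracting two normal XIDs and casting to a signed 32-bit value gives the right "logically precedes" answer even across the 2^32 wraparound, provided the two XIDs are less than 2^31 apart. Below is a standalone harness around that logic -- toy names throughout, with only the reserved-XID cutoff and the comparison itself borrowed from the functions above.)

/* toy_xid_compare.c -- harness around the modulo-2^32 comparison shown above */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t ToyXid;
#define TOY_FIRST_NORMAL_XID 3  /* XIDs 0..2 are reserved/permanent in PostgreSQL */

static bool
toy_xid_is_normal(ToyXid xid)
{
    return xid >= TOY_FIRST_NORMAL_XID;
}

static bool
toy_xid_precedes(ToyXid id1, ToyXid id2)
{
    int32_t diff;

    /* permanent XIDs compare unsigned; normal XIDs compare modulo 2^32 */
    if (!toy_xid_is_normal(id1) || !toy_xid_is_normal(id2))
        return id1 < id2;

    diff = (int32_t) (id1 - id2);
    return diff < 0;
}

int
main(void)
{
    /* 4294967290 logically precedes 10: the counter has wrapped around */
    printf("%d\n", toy_xid_precedes(4294967290u, 10));  /* prints 1 */
    printf("%d\n", toy_xid_precedes(10, 4294967290u));  /* prints 0 */
    return 0;
}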
Attachment: v5-0016-Unset-all-visible-sooner-if-not-freezing.patch (text/x-patch)
From 474ed6ba17773f557cd9fbf196388ffb6a7b7c4e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:35:13 -0400
Subject: [PATCH v5 16/20] Unset all-visible sooner if not freezing
In prune/freeze code, we delay unsetting all-visible/all-frozen in the
presence of dead items to allow opportunistically freezing tuples if the
whole page would be frozen except for those dead items -- which are
removed later in vacuum's third phase.
Future commits will allow on-access pruning to set the VM, which means
all-visible will be initialized to true instead of false and we will do
extra bookkeeping in heap_prune_record_unchanged_lp_normal() to keep
track of whether or not the page is all-visible.
Because on-access pruning will not freeze tuples, it makes sense to
unset all-visible as soon as we encounter an LP_DEAD item and
avoid continued bookkeeping since we know the page is not all-visible
and we won't be able to remove those dead items.
---
src/backend/access/heap/pruneheap.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ab79d8a3ed9..80d055e5376 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1493,8 +1493,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible until later during pruning.
- * Removable dead tuples shouldn't preclude freezing the page.
+ * Removable dead tuples shouldn't preclude freezing the page. If we won't
+ * attempt freezing, just unset all-visible now, though.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1752,8 +1755,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible until later, at the end of
* heap_page_prune_and_freeze(). This will allow us to attempt to freeze
* the page after pruning. As long as we unset it before updating the
- * visibility map, this will be correct.
+ * visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all-visible now.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
--
2.43.0
Attachment: v5-0017-Allow-on-access-pruning-to-set-pages-all-visible.patch (text/x-patch)
From 262dd663f1cb7fbbf84865e5bccf890c15762412 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v5 17/20] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum marked pages as all-visible or all-frozen.
Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
---
src/backend/access/heap/heapam.c | 15 +++++-
src/backend/access/heap/heapam_handler.c | 15 +++++-
src/backend/access/heap/pruneheap.c | 63 ++++++++++++++++++-----
src/backend/access/index/indexam.c | 46 +++++++++++++++++
src/backend/access/table/tableam.c | 39 ++++++++++++--
src/backend/executor/execMain.c | 4 ++
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeBitmapHeapscan.c | 7 ++-
src/backend/executor/nodeIndexscan.c | 18 ++++---
src/backend/executor/nodeSeqscan.c | 24 +++++++--
src/include/access/genam.h | 11 ++++
src/include/access/heapam.h | 24 +++++++--
src/include/access/relscan.h | 6 +++
src/include/access/tableam.h | 30 ++++++++++-
src/include/nodes/execnodes.h | 6 +++
15 files changed, 273 insertions(+), 37 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 48f7b84156a..f90b014a9b0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -556,6 +556,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -570,7 +571,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1236,6 +1239,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1274,6 +1278,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1306,6 +1316,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index cb4bc35c93e..c68283de6f2 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
return &hscan->xs_base;
}
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 80d055e5376..dad341cb265 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -198,9 +198,13 @@ static bool identify_and_fix_vm_corruption(Relation relation,
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, pruning may set the visibility map if the page is
+ * all-visible. We will take care of pinning and, if needed, reading in the
+ * relevant visibility map page.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -264,6 +268,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
{
OffsetNumber dummy_off_loc;
PruneFreezeResult presult;
+ int options = 0;
+
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ options = HEAP_PAGE_PRUNE_UPDATE_VM;
+ }
/*
* For now, pass mark_unused_now as false regardless of whether or
@@ -271,9 +282,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* that during on-access pruning with the current implementation.
*/
heap_page_prune_and_freeze(relation, buffer, false,
- InvalidBuffer,
- vistest, 0,
- NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+ vmbuffer ? *vmbuffer : InvalidBuffer,
+ vistest, options,
+ NULL, &presult, PRUNE_ON_ACCESS,
+ &dummy_off_loc, NULL, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -513,12 +525,17 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* all-frozen for use in opportunistic freezing and to update the VM if
* the caller requests it.
*
- * Currently, only VACUUM attempts freezing and setting the VM bits. But
- * other callers could do either one. The visibility bookkeeping is
- * required for opportunistic freezing (in addition to setting the VM
- * bits) because we only consider opportunistically freezing tuples if the
- * whole page would become all-frozen or if the whole page will be frozen
- * except for dead tuples that will be removed by vacuum.
+ * Currently, only VACUUM attempts freezing. But other callers could. The
+ * visibility bookkeeping is required for opportunistic freezing (in
+ * addition to setting the VM bits) because we only consider
+ * opportunistically freezing tuples if the whole page would become
+ * all-frozen or if the whole page will be frozen except for dead tuples
+ * that will be removed by vacuum. But if consider_update_vm is false,
+ * we'll not set the VM even if the page is discovered to be all-visible.
+ *
+ * If only HEAP_PAGE_PRUNE_UPDATE_VM is passed and not
+ * HEAP_PAGE_PRUNE_FREEZE, prstate.all_frozen must be initialized to false
+ * because we will not call heap_prepare_freeze_tuple() on each tuple.
*
* If only updating the VM, we must initialize all_frozen to false, as
* heap_prepare_freeze_tuple() will not be called for each tuple on the
@@ -530,7 +547,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* whether or not to freeze but before deciding whether or not to update
* the VM so that we don't set the VM bit incorrectly.
*
- * If not freezing or updating the VM, we otherwise avoid the extra
+ * If not freezing and not updating the VM, we avoid the extra
* bookkeeping. Initializing all_visible to false allows skipping the work
* to update them in heap_prune_record_unchanged_lp_normal().
*/
@@ -879,12 +896,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.all_frozen = false;
}
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate.consider_update_vm &&
+ prstate.all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+ {
+ prstate.consider_update_vm = false;
+ prstate.all_visible = prstate.all_frozen = false;
+ }
+
Assert(!prstate.all_frozen || prstate.all_visible);
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * Handle setting visibility map bit based on information from the VM (if
+ * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
+ * call), and from all_visible and all_frozen variables.
*/
if (prstate.consider_update_vm)
{
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 219df1971da..d803c307517 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -279,6 +279,32 @@ index_beginscan(Relation heapRelation,
return scan;
}
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan(heapRelation,
+ indexRelation,
+ snapshot,
+ instrument,
+ nkeys, norderbys);
+
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+ return scan;
+}
+
/*
* index_beginscan_bitmap - start a scan of an index with amgetbitmap
*
@@ -610,6 +636,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
return scan;
}
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan_parallel(heaprel, indexrel,
+ instrument,
+ nkeys, norderbys,
+ pscan);
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+ return scan;
+}
+
/* ----------------
* index_getnext_tid - get the next TID from a scan
*
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index a56c5eceb14..67dbf99f5b5 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
char *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
bool synchronize_seqscans = true;
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags);
+
/* ----------------------------------------------------------------------------
* Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
}
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
- SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
pscan, flags);
}
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+ bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
/* ----------------------------------------------------------------------------
* Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0391798dd2c..065676eb7cf 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -917,6 +917,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation is modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ modifies_rel);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..c2ffbd3b08e 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+
+ bool modifies_base_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
*/
- scandesc = index_beginscan(node->ss.ss_currentRelation,
- node->iss_RelationDesc,
- estate->es_snapshot,
- &node->iss_Instrument,
- node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+ node->iss_RelationDesc,
+ estate->es_snapshot,
+ &node->iss_Instrument,
+ node->iss_NumScanKeys,
+ node->iss_NumOrderByKeys,
+ modifies_base_rel);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index ed35c58c2c3..15e1853027b 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
- scandesc = table_beginscan(node->ss.ss_currentRelation,
- estate->es_snapshot,
- 0, NULL);
+ scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+ estate->es_snapshot,
+ 0, NULL, modifies_rel);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -362,6 +367,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
ParallelContext *pcxt)
{
EState *estate = node->ss.ps.state;
+ bool modifies_rel;
ParallelTableScanDesc pscan;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -369,8 +375,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+ modifies_rel);
}
/* ----------------------------------------------------------------
@@ -400,8 +409,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+ pscan,
+ modifies_rel);
}
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 5b2ab181b5f..bf272c2c37f 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_heap_rel);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
@@ -206,6 +211,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys,
ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_rel);
extern ItemPointer index_getnext_tid(IndexScanDesc scan,
ScanDirection direction);
struct TupleTableSlot;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4278f351bdf..16f7904a21e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read in the current heap page's
+ * corresponding VM block to this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -116,8 +123,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read in the current heap page's corresponding VM block to
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
@@ -374,7 +391,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool blk_known_av,
Buffer vmbuffer,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
typedef struct IndexFetchTableData
{
Relation rel;
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchTableData;
struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 1c9e802a6b1..0e986d8ef72 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -62,6 +62,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* whether or not scan should attempt to set the VM */
+ SO_ALLOW_VM_SET = 1 << 10,
} ScanOptions;
/*
@@ -876,6 +878,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
}
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
/*
* Like table_beginscan(), but for scanning catalog. It'll automatically use a
* snapshot appropriate for scanning catalog relations.
@@ -913,10 +934,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, struct ScanKeyData *key)
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
{
uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
}
@@ -1125,6 +1149,10 @@ extern void table_parallelscan_initialize(Relation rel,
extern TableScanDesc table_beginscan_parallel(Relation relation,
ParallelTableScanDesc pscan);
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+ ParallelTableScanDesc pscan,
+ bool modifies_rel);
+
/*
* Restart a parallel scan. Call this in the leader process. Caller is
* responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e107d6e5f81..326d7d78860 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -680,6 +680,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query either through
+ * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
--
2.43.0
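A note on the executor changes above: the rule they implement is simply that a
scan may attempt to set the VM only if its range-table index is not in
es_modified_relids. The toy, self-contained C sketch below models just that
rule; the uint64 mask and the names rt_is_modified()/scan_flags_for() are
stand-ins for the real Bitmapset and ScanOptions machinery, not actual
PostgreSQL APIs (only the SO_ALLOW_VM_SET idea carries over, and the flag
values here are illustrative).

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-ins for the real ScanOptions flags (values are illustrative) */
#define SO_TYPE_SEQSCAN     (1u << 0)
#define SO_ALLOW_PAGEMODE   (1u << 1)
#define SO_ALLOW_VM_SET     (1u << 10)

/* Toy replacement for es_modified_relids: bit i set => RT index i is modified */
static bool
rt_is_modified(uint64_t modified_relids, int scanrelid)
{
    return (modified_relids >> scanrelid) & 1;
}

static unsigned
scan_flags_for(uint64_t modified_relids, int scanrelid)
{
    unsigned    flags = SO_TYPE_SEQSCAN | SO_ALLOW_PAGEMODE;

    /* Only scans of a relation the query does not modify may set the VM */
    if (!rt_is_modified(modified_relids, scanrelid))
        flags |= SO_ALLOW_VM_SET;
    return flags;
}

int
main(void)
{
    uint64_t    modified = (1u << 2);   /* say RT index 2 is an UPDATE target */

    printf("scan of rt 1: vm-set %s\n",
           (scan_flags_for(modified, 1) & SO_ALLOW_VM_SET) ? "allowed" : "not allowed");
    printf("scan of rt 2: vm-set %s\n",
           (scan_flags_for(modified, 2) & SO_ALLOW_VM_SET) ? "allowed" : "not allowed");
    return 0;
}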
v5-0018-Add-helper-functions-to-heap_page_prune_and_freez.patch
From d239dd8a66eee4e0b0dac4dc1e068b71ba219ac7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 30 Jul 2025 18:51:43 -0400
Subject: [PATCH v5 18/20] Add helper functions to heap_page_prune_and_freeze
heap_page_prune_and_freeze() has gotten rather long. It has several
stages:
1) setup - where the PruneState is set up
2) tuple examination -- where tuples and line pointers are examined to
determine what needs to be pruned and what could be frozen
3) evaluation -- where we determine based on caller provided options,
heuristics, and state gathered during stage 2 whether or not to
freeze tuples and set the page in the VM
4) execution - where the page changes are actually made and logged
This commit refactors the evaluation stage into helpers which return
whether or not to freeze and set the VM.
For the purposes of committing, this likely shouldn't be a separate
commit. But I'm not sure yet whether it makes more sense to do this
refactoring earlier in the set for clarity for the reviewer.
---
src/backend/access/heap/pruneheap.c | 471 +++++++++++++++++-----------
1 file changed, 295 insertions(+), 176 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index dad341cb265..5d943b0c64f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -179,6 +179,22 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
static void page_verify_redirects(Page page);
+static bool heap_page_will_freeze(Relation relation, Buffer buffer,
+ bool do_prune,
+ bool do_hint_full_or_prunable,
+ bool did_tuple_hint_fpi,
+ PruneState *prstate,
+ bool *all_frozen_except_lp_dead);
+
+static bool heap_page_will_update_vm(Relation relation,
+ Buffer buffer, BlockNumber blockno, Page page,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ bool blk_known_av,
+ PruneState *prstate,
+ Buffer *vmbuffer, uint8 *vmflags,
+ bool *set_pd_all_visible);
+
static bool identify_and_fix_vm_corruption(Relation relation,
BlockNumber heap_blk,
Buffer heap_buffer, Page heap_page,
@@ -376,6 +392,249 @@ identify_and_fix_vm_corruption(Relation relation,
return false;
}
+
+/*
+ * Determine whether to set the visibility map bits based on information from
+ * the PruneState and blk_known_av, which some callers will provide after
+ * previously examining this heap page's VM bits (e.g. vacuum from the last
+ * heap_vac_scan_next_block() call).
+ *
+ * We pass in blockno and page even though those can be derived from buffer to avoid
+ * extra BufferGetBlock() and BufferGetBlockNumber() calls.
+ *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
+ * prstate and vmbuffer are input/output fields. vmflags and
+ * set_pd_all_visible are output fields.
+ *
+ * Returns true if the caller should set one or both of the VM bits and false
+ * otherwise.
+ */
+static bool
+heap_page_will_update_vm(Relation relation,
+ Buffer buffer, BlockNumber blockno, Page page,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ bool blk_known_av,
+ PruneState *prstate,
+ Buffer *vmbuffer, uint8 *vmflags,
+ bool *set_pd_all_visible)
+{
+ bool do_set_vm = false;
+
+ /*
+ * If the caller specified not to update the VM, validate everything is in
+ * the right state and exit.
+ */
+ if (!prstate->consider_update_vm)
+ {
+ Assert(!prstate->all_visible && !prstate->all_frozen);
+ /* We don't set only the page level visibility hint */
+ Assert(!(*set_pd_all_visible));
+ Assert(*vmflags == 0);
+ return false;
+ }
+
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->consider_update_vm &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+ {
+ prstate->consider_update_vm = false;
+ prstate->all_visible = prstate->all_frozen = false;
+ }
+
+ Assert(!prstate->all_frozen || prstate->all_visible);
+
+ /*
+ * Clear any VM corruption. This does not need to be in a critical
+ * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set, we
+ * may mark the heap page buffer dirty here and could end up doing so
+ * again later. This is not a correctness issue and is in the path of VM
+ * corruption, so we don't have to worry about the extra performance
+ * overhead.
+ */
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av, prstate->lpdead_items,
+ *vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /* Determine if we actually need to set the VM and which bits to set. */
+ else if (prstate->all_visible &&
+ (!blk_known_av ||
+ (prstate->all_frozen && !VM_ALL_FROZEN(relation, blockno, vmbuffer))))
+ {
+ *vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate->all_frozen)
+ *vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
+ do_set_vm = *vmflags & VISIBILITYMAP_VALID_BITS;
+
+ /*
+ * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+ * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+ * set, we strongly prefer to keep them in sync.
+ *
+ * Prior to Postgres 19, it was possible for the page-level bit to be set
+ * and the VM bit to be clear. This could happen if we crashed after
+ * setting PD_ALL_VISIBLE but before setting bits in the VM.
+ */
+ *set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+ return do_set_vm;
+}
+
+/*
+ * Decide if we want to go ahead with freezing according to the freeze plans we
+ * prepared for the given buffer or not. If the caller specified we should not
+ * freeze tuples, it exits early.
+ *
+ * do_prune, do_hint_full_or_prunable, and did_tuple_hint_fpi must all have
+ * been decided before calling this function.
+ *
+ * prstate is an input/output parameter. all_frozen_except_lp_dead is set and
+ * used later to determine the snapshot conflict horizon for the record.
+ *
+ * Returns true if we should use our freeze plans and freeze tuples on the page
+ * and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+ bool do_prune,
+ bool do_hint_full_or_prunable,
+ bool did_tuple_hint_fpi,
+ PruneState *prstate,
+ bool *all_frozen_except_lp_dead)
+{
+ bool do_freeze = false;
+
+ /*
+ * If the caller specified we should not attempt to freeze any tuples,
+ * validate that everything is in the right state and exit.
+ */
+ if (!prstate->attempt_freeze)
+ {
+ Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+ Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+ Assert(!(*all_frozen_except_lp_dead));
+ return false;
+ }
+
+ if (prstate->pagefrz.freeze_required)
+ {
+ /*
+ * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+ * before FreezeLimit/MultiXactCutoff is present. Must freeze to
+ * advance relfrozenxid/relminmxid.
+ */
+ do_freeze = true;
+ }
+ else
+ {
+ /*
+ * Opportunistically freeze the page if we are generating an FPI
+ * anyway and if doing so means that we can set the page all-frozen
+ * afterwards (might not happen until VACUUM's final heap pass).
+ *
+ * XXX: Previously, we knew if pruning emitted an FPI by checking
+ * pgWalUsage.wal_fpi before and after pruning. Once the freeze and
+ * prune records were combined, this heuristic couldn't be used
+ * anymore. The opportunistic freeze heuristic must be improved;
+ * however, for now, try to approximate the old logic.
+ */
+ if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+ {
+ /*
+ * Freezing would make the page all-frozen. Have already emitted
+ * an FPI or will do so anyway?
+ */
+ if (RelationNeedsWAL(relation))
+ {
+ if (did_tuple_hint_fpi)
+ do_freeze = true;
+ else if (do_prune)
+ {
+ if (XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ else if (do_hint_full_or_prunable)
+ {
+ if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ }
+ }
+ }
+
+ if (do_freeze)
+ {
+ /*
+ * Validate the tuples we will be freezing before entering the
+ * critical section.
+ */
+ heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+ }
+ else if (prstate->nfrozen > 0)
+ {
+ /*
+ * The page contained some tuples that were not already frozen, and we
+ * chose not to freeze them now. The page won't be all-frozen then.
+ */
+ Assert(!prstate->pagefrz.freeze_required);
+
+ prstate->all_frozen = false;
+ prstate->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+ else
+ {
+ /*
+ * We have no freeze plans to execute. The page might already be
+ * all-frozen (perhaps only following pruning), though. Such pages
+ * can be marked all-frozen in the VM by our caller, even though none
+ * of its tuples were newly frozen here.
+ */
+ }
+
+ /*
+ * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+ * make the choice of whether or not to freeze the page unaffected by the
+ * short-term presence of LP_DEAD items. These LP_DEAD items were
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that freezing has been finalized, unset all_visible if there are
+ * any LP_DEAD items on the page. It needs to reflect the present state of
+ * the page when using it to determine whether or not to update the VM.
+ *
+ * Keep track of whether or not the page was all-frozen except LP_DEAD
+ * items for the purposes of calculating the snapshot conflict horizon,
+ * though.
+ */
+ *all_frozen_except_lp_dead = prstate->all_frozen;
+ if (prstate->lpdead_items > 0)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
+
+ return do_freeze;
+}
+
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
* specified page. If the page's visibility status has changed, update it in
@@ -766,20 +1025,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Clear the offset information once we have processed the given page. */
*off_loc = InvalidOffsetNumber;
- do_prune = prstate.nredirected > 0 ||
- prstate.ndead > 0 ||
- prstate.nunused > 0;
-
/*
* After processing all the live tuples on the page, if the newest xmin
* amongst them is not visible to everyone, the page cannot be
- * all-visible.
+ * all-visible. This must be done before we decide whether or not to
+ * opportunistically freeze below because we do not want to
+ * opportunistically freeze the page if there are live tuples not visible
+ * to everyone, which would prevent setting the page frozen in the VM.
*/
if (prstate.all_visible &&
TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
prstate.all_visible = prstate.all_frozen = false;
+ /*
+ * Now decide based on information collected while examining every tuple
+ * which actions to take. If there are any prunable tuples, we'll prune
+ * them. However, we will decide based on options specified by the caller
+ * and various heuristics whether or not to freeze any tuples and whether
+ * or not the page should be set all-visible/all-frozen in the VM.
+ */
+ do_prune = prstate.nredirected > 0 ||
+ prstate.ndead > 0 ||
+ prstate.nunused > 0;
+
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update those
@@ -790,182 +1059,32 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
PageIsFull(page);
/*
- * Decide if we want to go ahead with freezing according to the freeze
- * plans we prepared, or not.
- */
- do_freeze = false;
- if (prstate.attempt_freeze)
- {
- if (prstate.pagefrz.freeze_required)
- {
- /*
- * heap_prepare_freeze_tuple indicated that at least one XID/MXID
- * from before FreezeLimit/MultiXactCutoff is present. Must
- * freeze to advance relfrozenxid/relminmxid.
- */
- do_freeze = true;
- }
- else
- {
- /*
- * Opportunistically freeze the page if we are generating an FPI
- * anyway and if doing so means that we can set the page
- * all-frozen afterwards (might not happen until VACUUM's final
- * heap pass).
- *
- * XXX: Previously, we knew if pruning emitted an FPI by checking
- * pgWalUsage.wal_fpi before and after pruning. Once the freeze
- * and prune records were combined, this heuristic couldn't be
- * used anymore. The opportunistic freeze heuristic must be
- * improved; however, for now, try to approximate the old logic.
- */
- if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
- {
- /*
- * Freezing would make the page all-frozen. Have already
- * emitted an FPI or will do so anyway?
- */
- if (RelationNeedsWAL(relation))
- {
- if (did_tuple_hint_fpi)
- do_freeze = true;
- else if (do_prune)
- {
- if (XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- else if (do_hint_full_or_prunable)
- {
- if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- }
- }
- }
- }
-
- if (do_freeze)
- {
- /*
- * Validate the tuples we will be freezing before entering the
- * critical section.
- */
- heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
- }
- else if (prstate.nfrozen > 0)
- {
- /*
- * The page contained some tuples that were not already frozen, and we
- * chose not to freeze them now. The page won't be all-frozen then.
- */
- Assert(!prstate.pagefrz.freeze_required);
-
- prstate.all_frozen = false;
- prstate.nfrozen = 0; /* avoid miscounts in instrumentation */
- }
- else
- {
- /*
- * We have no freeze plans to execute. The page might already be
- * all-frozen (perhaps only following pruning), though. Such pages
- * can be marked all-frozen in the VM by our caller, even though none
- * of its tuples were newly frozen here.
- */
- }
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state of
- * the page when using it to determine whether or not to update the VM.
- *
- * Keep track of whether or not the page was all-frozen except LP_DEAD
- * items for the purposes of calculating the snapshot conflict horizon,
- * though.
+ * We must decide whether or not to freeze before deciding if and what to
+ * set in the VM.
*/
- all_frozen_except_lp_dead = prstate.all_frozen;
- if (prstate.lpdead_items > 0)
- {
- prstate.all_visible = false;
- prstate.all_frozen = false;
- }
-
- /*
- * If this is an on-access call and we're not actually pruning, avoid
- * setting the visibility map if it would newly dirty the heap page or, if
- * the page is already dirty, if doing so would require including a
- * full-page image (FPI) of the heap page in the WAL. This situation
- * should be rare, as on-access pruning is only attempted when
- * pd_prune_xid is valid.
- */
- if (reason == PRUNE_ON_ACCESS &&
- prstate.consider_update_vm &&
- prstate.all_visible &&
- !do_prune && !do_freeze &&
- (!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
- {
- prstate.consider_update_vm = false;
- prstate.all_visible = prstate.all_frozen = false;
- }
-
- Assert(!prstate.all_frozen || prstate.all_visible);
-
- /*
- * Handle setting visibility map bit based on information from the VM (if
- * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
- * call), and from all_visible and all_frozen variables.
- */
- if (prstate.consider_update_vm)
- {
- /*
- * Clear any VM corruption. This does not need to be in a critical
- * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
- * we may mark the heap page buffer dirty here and could end up doing
- * so again later. This is not a correctness issue and is in the path
- * of VM corruption, so we don't have to worry about the extra
- * performance overhead.
- */
- if (identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av, prstate.lpdead_items, vmbuffer))
- {
- /* If we fix corruption, don't update the VM further */
- }
-
- /* Determine if we actually need to set the VM and which bits to set. */
- else if (prstate.all_visible &&
- (!blk_known_av ||
- (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
- {
- vmflags |= VISIBILITYMAP_ALL_VISIBLE;
- if (prstate.all_frozen)
- vmflags |= VISIBILITYMAP_ALL_FROZEN;
- }
- }
-
- do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
- * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
- * set, we strongly prefer to keep them in sync.
- *
- * Prior to Postgres 19, it was possible for the page-level bit to be set
- * and the VM bit to be clear. This could happen if we crashed after
- * setting PD_ALL_VISIBLE but before setting bits in the VM.
- */
- set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+ do_freeze = heap_page_will_freeze(relation, buffer,
+ do_prune,
+ do_hint_full_or_prunable,
+ did_tuple_hint_fpi,
+ &prstate,
+ &all_frozen_except_lp_dead);
+
+ do_set_vm = heap_page_will_update_vm(relation,
+ buffer, blockno, page,
+ reason,
+ do_prune, do_freeze,
+ blk_known_av,
+ &prstate,
+ &vmbuffer,
+ &vmflags, &set_pd_all_visible);
/* Save these for the caller in case we later zero out vmflags */
presult->new_vmbits = vmflags;
- /* Any error while applying the changes is critical */
+ /*
+ * Time to actually make the changes to the page and log them. Any error
+ * while applying the changes is critical.
+ */
START_CRIT_SECTION();
if (do_hint_full_or_prunable)
--
2.43.0
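For reviewers who want the freeze heuristic in one place: heap_page_will_freeze()
forces freezing when freeze_required is set and otherwise freezes only
opportunistically, when the page would end up all-frozen and an FPI of the heap
page has been or will be emitted anyway. The following toy program models only
that decision; the booleans stand in for the PruneState fields and the WAL
checks (RelationNeedsWAL(), XLogCheckBufferNeedsBackup(), XLogHintBitIsNeeded()),
so treat it as a sketch of the logic, not the real function.

#include <stdbool.h>
#include <stdio.h>

static bool
will_freeze(bool freeze_required,     /* XID/MXID below the cutoffs present */
            bool would_be_all_frozen, /* all_visible && all_frozen && nfrozen > 0 */
            bool rel_needs_wal,
            bool did_tuple_hint_fpi,  /* setting tuple hint bits already emitted an FPI */
            bool do_prune,
            bool do_hint,             /* pd_prune_xid / page-full hint update */
            bool wal_log_hints,       /* stands in for XLogHintBitIsNeeded() */
            bool buffer_needs_fpi)    /* stands in for XLogCheckBufferNeedsBackup() */
{
    if (freeze_required)
        return true;                  /* must freeze to advance relfrozenxid */

    if (!would_be_all_frozen || !rel_needs_wal)
        return false;                 /* not worth freezing opportunistically */

    if (did_tuple_hint_fpi)
        return true;                  /* an FPI was already emitted anyway */
    if (do_prune)
        return buffer_needs_fpi;      /* pruning will emit one regardless */
    if (do_hint)
        return wal_log_hints && buffer_needs_fpi;
    return false;
}

int
main(void)
{
    /* Pruning will emit an FPI anyway and the page would become all-frozen */
    printf("%d\n", will_freeze(false, true, true, false, true, false, false, true));   /* 1 */
    /* Nothing else forces an FPI, so skip the opportunistic freeze */
    printf("%d\n", will_freeze(false, true, true, false, false, false, false, false)); /* 0 */
    return 0;
}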
v5-0019-Reorder-heap_page_prune_and_freeze-parameters.patch
From 86abf2be861bcab612737be55256edd6e67cd597 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 31 Jul 2025 12:08:18 -0400
Subject: [PATCH v5 19/20] Reorder heap_page_prune_and_freeze parameters
Reorder parameters so that all of the output parameters are together at
the end of the parameter list.
---
src/backend/access/heap/pruneheap.c | 38 ++++++++++++++--------------
src/backend/access/heap/vacuumlazy.c | 6 ++---
src/include/access/heapam.h | 4 +--
3 files changed, 24 insertions(+), 24 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5d943b0c64f..20f4a62fb16 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -297,10 +297,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, false,
+ heap_page_prune_and_freeze(relation, buffer, options, false,
vmbuffer ? *vmbuffer : InvalidBuffer,
- vistest, options,
- NULL, &presult, PRUNE_ON_ACCESS,
+ vistest,
+ NULL, PRUNE_ON_ACCESS, &presult,
&dummy_off_loc, NULL, NULL);
/*
@@ -645,6 +645,15 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* also need to account for a reduction in the length of the line pointer
* array following array truncation by us.
*
+ * options:
+ * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+ * pruning.
+ *
+ * FREEZE indicates that we will also freeze tuples, and will return
+ * 'all_visible', 'all_frozen' flags to the caller.
+ *
+ * UPDATE_VM indicates that we will set the page's status in the VM.
+ *
* If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
* required in order to advance relfrozenxid / relminmxid, or if it's
* considered advantageous for overall system performance to do so now. The
@@ -663,30 +672,21 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* contain the required block of the visibility map.
*
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
- *
- * options:
- * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- * pruning.
- *
- * FREEZE indicates that we will also freeze tuples, and will return
- * 'all_visible', 'all_frozen' flags to the caller.
- *
- * UPDATE_VM indicates that we will set the page's status in the VM.
+ * (see heap_prune_satisfies_vacuum). It is an input parameter.
*
* cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
* of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
* cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD. It is an input parameter.
+ *
+ * reason indicates why the pruning is performed. It is included in the WAL
+ * record for debugging and analysis purposes, but otherwise has no effect.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
* heap_page_prune_and_freeze() is responsible for initializing it. Required
* by all callers.
*
- * reason indicates why the pruning is performed. It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
* off_loc is the offset location required by the caller to use in error
* callback.
*
@@ -699,13 +699,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ int options,
bool blk_known_av,
Buffer vmbuffer,
GlobalVisState *vistest,
- int options,
struct VacuumCutoffs *cutoffs,
- PruneFreezeResult *presult,
PruneReason reason,
+ PruneFreezeResult *presult,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 777ec30eb82..120782fd8ec 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1992,11 +1992,11 @@ lazy_scan_prune(LVRelState *vacrel,
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf,
+ heap_page_prune_and_freeze(rel, buf, prune_options,
all_visible_according_to_vm,
vmbuffer,
- vacrel->vistest, prune_options,
- &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+ vacrel->vistest,
+ &vacrel->cutoffs, PRUNE_VACUUM_SCAN, &presult,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 16f7904a21e..0c4e5607627 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,13 +394,13 @@ struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer,
Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ int options,
bool blk_known_av,
Buffer vmbuffer,
struct GlobalVisState *vistest,
- int options,
struct VacuumCutoffs *cutoffs,
- PruneFreezeResult *presult,
PruneReason reason,
+ PruneFreezeResult *presult,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid);
--
2.43.0
On Thu, Jul 31, 2025 at 6:58 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
The patch "Set-pd_prune_xid-on-insert.txt" can be applied as the last
patch in the set. It sets pd_prune_xid on insert (so pages filled by
COPY or insert can also be set all-visible in the VM before they are
vacuumed). I gave it a .txt extension because it currently fails
035_standby_logical_decoding due to a recovery conflict. I need to
investigate more to see if this is a bug in my patch set or elsewhere
in Postgres.
I figured out that if we set the VM on-access, we need to enable
hot_standby_feedback in more places in 035_standby_logical_decoding.pl
to avoid recovery conflicts. I've done that in the attached updated
version 6. There are a few other issues in
035_standby_logical_decoding.pl that I reported here [1]. With these
changes, setting pd_prune_xid on insert passes tests. Whether or not
we want to do it (and what the heuristic should be for deciding when
to do it) is another question.
- Melanie
[1]: /messages/by-id/CAAKRu_YO2mEm=ZWZKPjTMU=gW5Y83_KMi_1cr51JwavH0ctd7w@mail.gmail.com
Attachments:
v6-0001-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch
From 62aaaf33ff9fcc256c42579c5dce9e9e6e6344cd Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v6 01/20] Eliminate xl_heap_visible in COPY FREEZE
Instead of emitting a separate WAL record for setting the VM bits in
xl_heap_visible, specify the changes to make to the VM block in the
xl_heap_multi_insert record instead.
---
src/backend/access/heap/heapam.c | 47 ++++++++++---------
src/backend/access/heap/heapam_xlog.c | 39 +++++++++++++++-
src/backend/access/heap/visibilitymap.c | 62 ++++++++++++++++++++++++-
src/backend/access/rmgrdesc/heapdesc.c | 5 ++
src/include/access/visibilitymap.h | 2 +
5 files changed, 132 insertions(+), 23 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0dcd6ee817e..68db4325285 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2493,9 +2493,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/*
* If the page is all visible, need to clear that, unless we're only
* going to add further frozen rows to it.
- *
- * If we're only adding already frozen rows to a previously empty
- * page, mark it as all-visible.
*/
if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
@@ -2505,8 +2502,22 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
BufferGetBlockNumber(buffer),
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
+
+ /*
+ * If we're only adding already frozen rows to a previously empty
+ * page, mark it as all-frozen and update the visibility map. We're
+ * already holding a pin on the vmbuffer.
+ */
else if (all_frozen_set)
+ {
PageSetAllVisible(page);
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ visibilitymap_set_vmbyte(relation,
+ BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+ }
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2554,6 +2565,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
xlrec->flags = 0;
if (all_visible_cleared)
xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+ /*
+ * We don't have to worry about including a conflict xid in the
+ * WAL record as HEAP_INSERT_FROZEN intentionally violates
+ * visibility rules.
+ */
if (all_frozen_set)
xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
@@ -2616,7 +2633,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
XLogBeginInsert();
XLogRegisterData(xlrec, tupledata - scratch.data);
+
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+ if (all_frozen_set)
+ XLogRegisterBuffer(1, vmbuffer, 0);
XLogRegisterBufData(0, tupledata, totaldatalen);
@@ -2626,29 +2646,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
recptr = XLogInsert(RM_HEAP2_ID, info);
PageSetLSN(page, recptr);
+ if (all_frozen_set)
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
}
END_CRIT_SECTION();
- /*
- * If we've frozen everything on the page, update the visibilitymap.
- * We're already holding pin on the vmbuffer.
- */
if (all_frozen_set)
- {
- Assert(PageIsAllVisible(page));
- Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
-
- /*
- * It's fine to use InvalidTransactionId here - this is only used
- * when HEAP_INSERT_FROZEN is specified, which intentionally
- * violates visibility rules.
- */
- visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
- InvalidXLogRecPtr, vmbuffer,
- InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
- }
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
UnlockReleaseBuffer(buffer);
ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index eb4bd3d6ae3..2485c344191 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -552,6 +552,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
int i;
bool isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
/*
* Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -572,11 +573,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(rlocator);
- Buffer vmbuffer = InvalidBuffer;
visibilitymap_pin(reln, blkno, &vmbuffer);
visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
ReleaseBuffer(vmbuffer);
+ vmbuffer = InvalidBuffer;
FreeFakeRelcacheEntry(reln);
}
@@ -663,6 +664,42 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (BufferIsValid(buffer))
UnlockReleaseBuffer(buffer);
+ buffer = InvalidBuffer;
+
+ /*
+ * Now read and update the VM block. Even if we skipped updating the heap
+ * page due to the file being dropped or truncated later in recovery, it's
+ * still safe to update the visibility map. Any WAL record that clears
+ * the visibility map bit does so before checking the page LSN, so any
+ * bits that need to be cleared will still be cleared.
+ *
+ * It is only okay to set the VM bits without holding the heap page lock
+ * because we can expect no other writers of this page.
+ */
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+ XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
+
+ visibilitymap_pin(reln, blkno, &vmbuffer);
+ visibilitymap_set_vmbyte(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+
+ /*
+ * It is not possible that the VM was already set for this heap page,
+ * so the vmbuffer must have been modified and marked dirty.
+ */
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
+ FreeFakeRelcacheEntry(reln);
+ }
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
+
/*
* If the page is running low on free space, update the FSM as well.
* Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 8f918e00af7..0bc64203959 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,7 +14,8 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set a bit in a previously pinned page
+ * visibilitymap_set - set a bit in a previously pinned page and log
+ * visibilitymap_set_vmbyte - set a bit in a pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -318,6 +319,65 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
return status;
}
+/*
+ * Set flags in the VM block contained in the passed in vmBuf.
+ *
+ * This function is for callers which include the VM changes in the same WAL
+ * record as the modifications of the heap page which rendered it all-visible.
+ * Callers separately logging the VM changes should invoke visibilitymap_set()
+ * instead.
+ *
+ * Caller must have pinned and exclusive locked the correct block of the VM in
+ * vmBuf.
+ *
+ * During normal operation (i.e. not recovery), this should be called in a
+ * critical section which also makes any necessary changes to the heap page
+ * and, if relevant, emits WAL.
+ *
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page.
+ */
+uint8
+visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags)
+{
+ BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+ uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+ uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+ Page page;
+ uint8 *map;
+ uint8 status;
+
+#ifdef TRACE_VISIBILITYMAP
+ elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
+#endif
+
+ /* Call in same critical section where WAL is emitted. */
+ Assert(InRecovery || CritSectionCount > 0);
+
+ /* Flags should be valid. Also never clear bits with this function */
+ Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+ /* Must never set all_frozen bit without also setting all_visible bit */
+ Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+ /* Check that we have the right VM page pinned */
+ if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+ elog(ERROR, "wrong VM buffer passed to visibilitymap_set_vmbyte");
+
+ page = BufferGetPage(vmBuf);
+ map = (uint8 *) PageGetContents(page);
+
+ status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+ if (flags != status)
+ {
+ map[mapByte] |= (flags << mapOffset);
+ MarkBufferDirty(vmBuf);
+ }
+
+ return status;
+}
+
/*
* visibilitymap_get_status - get status of bits
*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
#include "access/heapam_xlog.h"
#include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
#include "storage/standbydefs.h"
/*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
xlrec->flags);
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+ appendStringInfo(buf, ", vm_flags: 0x%02X",
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
if (XLogRecHasBlockData(record, 0) && !isinit)
{
appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..977566f6b98 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,8 @@ extern uint8 visibilitymap_set(Relation rel,
Buffer vmBuf,
TransactionId cutoff_xid,
uint8 flags);
+extern uint8 visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
--
2.43.0
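Since visibilitymap_set_vmbyte() operates directly on the already-pinned VM
page, it may help to recall how a heap block maps onto the VM: each heap block
gets two bits (all-visible and all-frozen), so one VM byte covers four heap
blocks. The standalone sketch below redoes that addressing arithmetic with
hard-coded assumptions (8192-byte blocks, a 24-byte page header, and flag
values 0x01/0x02 mirroring VISIBILITYMAP_ALL_VISIBLE/ALL_FROZEN); it is for
illustration only and does not use the server's headers.

#include <stdint.h>
#include <stdio.h>

/* Assumed defaults: 8 kB blocks, 24-byte (MAXALIGNed) page header */
#define BLCKSZ              8192
#define PAGE_HEADER_SIZE    24
#define MAPSIZE             (BLCKSZ - PAGE_HEADER_SIZE)

#define BITS_PER_HEAPBLOCK  2                           /* all-visible + all-frozen */
#define HEAPBLOCKS_PER_BYTE (8 / BITS_PER_HEAPBLOCK)
#define HEAPBLOCKS_PER_PAGE (MAPSIZE * HEAPBLOCKS_PER_BYTE)

#define ALL_VISIBLE 0x01
#define ALL_FROZEN  0x02

int
main(void)
{
    unsigned    heap_blk = 100000;      /* some heap block number */

    unsigned    map_block  = heap_blk / HEAPBLOCKS_PER_PAGE;
    unsigned    map_byte   = (heap_blk % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE;
    unsigned    map_offset = (heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;

    uint8_t     vm_page[MAPSIZE] = {0}; /* contents of one VM block, header omitted */

    /* What the byte update in visibilitymap_set_vmbyte() amounts to,
     * minus the pin/lock, dirty-marking, and WAL responsibilities. */
    vm_page[map_byte] |= (ALL_VISIBLE | ALL_FROZEN) << map_offset;

    printf("heap block %u -> VM block %u, byte %u, bit offset %u, byte now 0x%02X\n",
           heap_blk, map_block, map_byte, map_offset, vm_page[map_byte]);
    return 0;
}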
v6-0003-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch
From 17427256d348f7414bfb8ceb74e00e3d8cd390a5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:42:13 -0400
Subject: [PATCH v6 03/20] Eliminate xl_heap_visible from vacuum phase III
Instead of emitting a separate xl_heap_visible record for each page that
is rendered all-visible by vacuum's third phase, include the updates to
the VM in the already emitted xl_heap_prune record.
---
src/backend/access/heap/heapam_xlog.c | 142 ++++++++++++++++++++---
src/backend/access/heap/pruneheap.c | 48 +++++++-
src/backend/access/heap/vacuumlazy.c | 149 +++++++++++++++++--------
src/backend/access/rmgrdesc/heapdesc.c | 13 ++-
src/include/access/heapam.h | 9 ++
src/include/access/heapam_xlog.h | 3 +
6 files changed, 296 insertions(+), 68 deletions(-)
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 2485c344191..14541e2e94f 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Buffer buffer;
RelFileLocator rlocator;
BlockNumber blkno;
- XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
+ uint8 vmflags = 0;
XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -51,10 +52,15 @@ heap_xlog_prune_freeze(XLogReaderState *record)
(xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
/*
- * We are about to remove and/or freeze tuples. In Hot Standby mode,
- * ensure that there are no queries running for which the removed tuples
- * are still visible or which still consider the frozen xids as running.
- * The conflict horizon XID comes after xl_heap_prune.
+ * After xl_heap_prune is the optional snapshot conflict horizon.
+ *
+ * In Hot Standby mode, we must ensure that there are no running queries
+ * which would conflict with the changes in this record. If pruning, that
+ * means we cannot remove tuples still visible to transactions on the
+ * standby. If freezing, that means we cannot freeze tuples with xids that
+ * are still considered running on the standby. And for setting the VM, we
+ * cannot do so if the page isn't all-visible to all transactions on the
+ * standby.
*/
if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
{
@@ -70,13 +76,28 @@ heap_xlog_prune_freeze(XLogReaderState *record)
rlocator);
}
+ /* Next are the optionally included vmflags. Copy them out for later use. */
+ if ((xlrec.flags & XLHP_HAS_VMFLAGS) != 0)
+ {
+ memcpy(&vmflags, maindataptr, sizeof(uint8));
+ maindataptr += sizeof(uint8);
+
+ /*
+ * We don't set VISIBILITYMAP_XLOG_CATALOG_REL in the combined record
+ * because we already have XLHP_IS_CATALOG_REL.
+ */
+ Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+ /* Must never set all_frozen bit without also setting all_visible bit */
+ Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
+ }
+
/*
- * If we have a full-page image, restore it and we're done.
+ * If we have a full-page image of the heap block, restore it and we're
+ * done with the heap block.
*/
- action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
- (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
- &buffer);
- if (action == BLK_NEEDS_REDO)
+ if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+ (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+ &buffer) == BLK_NEEDS_REDO)
{
Page page = (Page) BufferGetPage(buffer);
OffsetNumber *redirected;
@@ -89,6 +110,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Size datalen;
xlhp_freeze_plan *plans;
OffsetNumber *frz_offsets;
+ bool do_prune;
+ bool mark_buffer_dirty;
+ bool set_heap_lsn;
char *dataptr = XLogRecGetBlockData(record, 0, &datalen);
heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
@@ -97,11 +121,18 @@ heap_xlog_prune_freeze(XLogReaderState *record)
&ndead, &nowdead,
&nunused, &nowunused);
+ do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+ set_heap_lsn = mark_buffer_dirty = do_prune || nplans > 0;
+
+ /* Ensure the record does something */
+ Assert(do_prune || nplans > 0 ||
+ vmflags & VISIBILITYMAP_VALID_BITS);
+
/*
* Update all line pointers per the record, and repair fragmentation
* if needed.
*/
- if (nredirected > 0 || ndead > 0 || nunused > 0)
+ if (do_prune)
heap_page_prune_execute(buffer,
(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
redirected, nredirected,
@@ -138,26 +169,72 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* There should be no more data */
Assert((char *) frz_offsets == dataptr + datalen);
+ Assert(BufferIsValid(buffer) &&
+ BufferGetBlockNumber(buffer) == blkno);
+
+ /*
+ * Now set PD_ALL_VISIBLE, if required. We'll only do this if we are
+ * also going to set bits in the VM later.
+ *
+ * We must never end up with the VM bit set and the page-level
+ * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+ * modification would fail to clear the VM bit.
+ */
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+ {
+ PageSetAllVisible(page);
+
+ /*
+ * If the only change to the heap page is setting PD_ALL_VISIBLE,
+ * we can avoid setting the page LSN unless checksums or
+ * wal_log_hints are enabled.
+ */
+ set_heap_lsn = XLogHintBitIsNeeded() ? true : set_heap_lsn;
+ mark_buffer_dirty = true;
+ }
+
/*
* Note: we don't worry about updating the page's prunability hints.
* At worst this will cause an extra prune cycle to occur soon.
*/
- PageSetLSN(page, lsn);
- MarkBufferDirty(buffer);
+ if (mark_buffer_dirty)
+ MarkBufferDirty(buffer);
+ if (set_heap_lsn)
+ PageSetLSN(page, lsn);
}
/*
- * If we released any space or line pointers, update the free space map.
+ * If we released any space or line pointers or will be setting a page in
+ * the visibility map, update the free space map.
+ *
+ * Even if we are just updating the VM (and thus not freeing up any
+ * space), we'll still update the FSM for this page. Since FSM is not
+ * WAL-logged and only updated heuristically, it easily becomes stale in
+ * standbys. If the standby is later promoted and runs VACUUM, it will
+ * skip updating individual free space figures for pages that became
+ * all-visible (or all-frozen, depending on the vacuum mode,) which is
+ * troublesome when FreeSpaceMapVacuum propagates too optimistic free
+ * space values to upper FSM layers; later inserters try to use such pages
+ * only to find out that they are unusable. This can cause long stalls
+ * when there are many such pages.
+ *
+ * Forestall those problems by updating FSM's idea about a page that is
+ * becoming all-visible or all-frozen.
*
* Do this regardless of a full-page image being applied, since the FSM
* data is not in the page anyway.
+ *
+ * We want to avoid holding an exclusive lock on the heap buffer while
+ * doing IO (either of the FSM or the VM), so we'll release the lock on
+ * the heap buffer before doing either.
*/
if (BufferIsValid(buffer))
{
- if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
- XLHP_HAS_DEAD_ITEMS |
- XLHP_HAS_NOW_UNUSED_ITEMS))
+ if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
+ XLHP_HAS_DEAD_ITEMS |
+ XLHP_HAS_NOW_UNUSED_ITEMS)) ||
+ vmflags & VISIBILITYMAP_VALID_BITS)
{
Size freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
@@ -168,6 +245,37 @@ heap_xlog_prune_freeze(XLogReaderState *record)
else
UnlockReleaseBuffer(buffer);
}
+
+ /*
+ * Read and update the VM block. Even if we skipped updating the heap page
+ * due to the file being dropped or truncated later in recovery, it's
+ * still safe to update the visibility map. Any WAL record that clears
+ * the visibility map bit does so before checking the page LSN, so any
+ * bits that need to be cleared will still be cleared.
+ *
+ * Note that it is *only* okay that we do not hold a lock on the heap page
+ * because we are in recovery and can expect no other writers to clear
+ * PD_ALL_VISIBLE before we are able to update the VM.
+ */
+ if (vmflags & VISIBILITYMAP_VALID_BITS &&
+ XLogReadBufferForRedoExtended(record, 1,
+ RBM_ZERO_ON_ERROR,
+ false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ uint8 old_vmbits = 0;
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
+
+ visibilitymap_pin(reln, blkno, &vmbuffer);
+ old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+ /* Only set VM page LSN if we modified the page */
+ if (old_vmbits != vmflags)
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
+ FreeFakeRelcacheEntry(reln);
+ }
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a8025889be0..d9ba0f96e34 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,6 +21,7 @@
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
+#include "access/visibilitymapdefs.h"
#include "commands/vacuum.h"
#include "executor/instrument.h"
#include "miscadmin.h"
@@ -835,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
+ InvalidBuffer, 0, false,
conflict_xid,
true, reason,
prstate.frozen, prstate.nfrozen,
@@ -2045,12 +2047,23 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* replaying 'unused' items depends on whether they were all previously marked
* as dead.
*
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
* Note: This function scribbles on the 'frozen' array.
*
* Note: This is called in a critical section, so careful what you do here.
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer,
+ uint8 vmflags,
+ bool set_pd_all_vis,
TransactionId conflict_xid,
bool cleanup_lock,
PruneReason reason,
@@ -2062,6 +2075,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xl_heap_prune xlrec;
XLogRecPtr recptr;
uint8 info;
+ uint8 regbuf_flags;
/* The following local variables hold data registered in the WAL record: */
xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2070,8 +2084,19 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xlhp_prune_items dead_items;
xlhp_prune_items unused_items;
OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+ bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
xlrec.flags = 0;
+ regbuf_flags = REGBUF_STANDARD;
+
+ /*
+ * We can avoid an FPI if the only modification we are making to the heap
+ * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+ */
+ if (!do_prune &&
+ nfrozen == 0 &&
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
+ regbuf_flags |= REGBUF_NO_IMAGE;
/*
* Prepare data for the buffer. The arrays are not actually in the
@@ -2079,7 +2104,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* page image, the arrays can be omitted.
*/
XLogBeginInsert();
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+ XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ XLogRegisterBuffer(1, vmbuffer, 0);
+
if (nfrozen > 0)
{
int nplans;
@@ -2136,6 +2165,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* Prepare the main xl_heap_prune record. We already set the XLHP_HAS_*
* flag above.
*/
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ xlrec.flags |= XLHP_HAS_VMFLAGS;
if (RelationIsAccessibleInLogicalDecoding(relation))
xlrec.flags |= XLHP_IS_CATALOG_REL;
if (TransactionIdIsValid(conflict_xid))
@@ -2150,6 +2181,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
XLogRegisterData(&xlrec, SizeOfHeapPrune);
if (TransactionIdIsValid(conflict_xid))
XLogRegisterData(&conflict_xid, sizeof(TransactionId));
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ XLogRegisterData(&vmflags, sizeof(uint8));
switch (reason)
{
@@ -2168,5 +2201,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
}
recptr = XLogInsert(RM_HEAP2_ID, info);
- PageSetLSN(BufferGetPage(buffer), recptr);
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
+
+ /*
+ * If pruning or freezing tuples or setting the page all-visible when
+ * checksums or wal_log_hints are enabled, we must bump the LSN. Torn
+ * pages are possible if we update PD_ALL_VISIBLE without bumping the LSN,
+ * but this is deemed okay for page hint updates.
+ */
+ if (do_prune || nfrozen > 0 ||
+ (set_pd_all_vis && XLogHintBitIsNeeded()))
+ PageSetLSN(BufferGetPage(buffer), recptr);
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8a62a93eee5..460cdbd8417 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,11 +464,13 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
+static bool heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int allowed_num_offsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2847,8 +2849,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
OffsetNumber unused[MaxHeapTuplesPerPage];
int nunused = 0;
TransactionId visibility_cutoff_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
bool all_frozen;
LVSavedErrInfo saved_err_info;
+ uint8 vmflags = 0;
+ bool set_pd_all_vis = false;
Assert(vacrel->do_index_vacuuming);
@@ -2859,6 +2864,20 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
InvalidOffsetNumber);
+ if (heap_page_is_all_visible_except_lpdead(vacrel->rel, buffer,
+ vacrel->cutoffs.OldestXmin,
+ deadoffsets, num_offsets,
+ &all_frozen, &visibility_cutoff_xid,
+ &vacrel->offnum))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (all_frozen)
+ {
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ }
+ }
+
START_CRIT_SECTION();
for (int i = 0; i < num_offsets; i++)
@@ -2878,6 +2897,18 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
/* Attempt to truncate line pointer array now */
PageTruncateLinePointerArray(page);
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+ {
+ Assert(!PageIsAllVisible(page));
+ set_pd_all_vis = true;
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ PageSetAllVisible(page);
+ visibilitymap_set_vmbyte(vacrel->rel,
+ blkno,
+ vmbuffer, vmflags);
+ conflict_xid = visibility_cutoff_xid;
+ }
+
/*
* Mark buffer dirty before we write WAL.
*/
@@ -2887,7 +2918,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
- InvalidTransactionId,
+ vmbuffer,
+ vmflags,
+ set_pd_all_vis,
+ conflict_xid,
false, /* no cleanup lock required */
PRUNE_VACUUM_CLEANUP,
NULL, 0, /* frozen */
@@ -2896,39 +2930,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
unused, nunused);
}
- /*
- * End critical section, so we safely can do visibility tests (which
- * possibly need to perform IO and allocate memory!). If we crash now the
- * page (including the corresponding vm bit) might not be marked all
- * visible, but that's fine. A later vacuum will fix that.
- */
END_CRIT_SECTION();
- /*
- * Now that we have removed the LP_DEAD items from the page, once again
- * check if the page has become all-visible. The page is already marked
- * dirty, exclusively locked, and, if needed, a full page image has been
- * emitted.
- */
- Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
- &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
+ if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (all_frozen)
- {
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buffer,
- InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
- flags);
-
/* Count the newly set VM page for logging */
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
vacrel->vm_new_visible_pages++;
if (all_frozen)
vacrel->vm_new_visible_frozen_pages++;
@@ -3594,6 +3601,25 @@ dead_items_cleanup(LVRelState *vacrel)
vacrel->pvs = NULL;
}
+/*
+ * Wrapper for heap_page_is_all_visible_except_lpdead() which can be used for
+ * callers that expect no LP_DEAD on the page.
+ */
+bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
+{
+
+ return heap_page_is_all_visible_except_lpdead(rel, buf, OldestXmin,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid,
+ logging_offnum);
+}
+
/*
* Check if every tuple in the given page is visible to all current and future
* transactions.
@@ -3607,23 +3633,35 @@ dead_items_cleanup(LVRelState *vacrel)
* visible tuples. Sets *all_frozen to true if every tuple on this page is
* frozen.
*
- * This is a stripped down version of lazy_scan_prune(). If you change
- * anything here, make sure that everything stays in sync. Note that an
- * assertion calls us to verify that everybody still agrees. Be sure to avoid
- * introducing new side-effects here.
+ * deadoffsets are the offsets we know about and are about to set LP_UNUSED.
+ * allowed_num_offsets is the number of those. As long as the LP_DEAD items we
+ * encounter on the page match those exactly, we can set the page all-visible
+ * in the VM.
+ *
+ * Callers looking to verify that the page is all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This logic is similar to that in heap_prune_record_unchanged_lp_normal(). If
+ * you change anything here, make sure that everything stays in sync. Note
+ * that an assertion calls us to verify that everybody still agrees. Be sure
+ * to avoid introducing new side-effects here.
*/
static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum)
+heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int allowed_num_offsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
OffsetNumber offnum,
maxoff;
bool all_visible = true;
+ OffsetNumber current_dead_offsets[MaxHeapTuplesPerPage];
+ size_t current_num_offsets = 0;
*visibility_cutoff_xid = InvalidTransactionId;
*all_frozen = true;
@@ -3655,9 +3693,8 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
if (ItemIdIsDead(itemid))
{
- all_visible = false;
- *all_frozen = false;
- break;
+ current_dead_offsets[current_num_offsets++] = offnum;
+ continue;
}
Assert(ItemIdIsNormal(itemid));
@@ -3724,7 +3761,23 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
/* Clear the offset information once we have processed the given page. */
*logging_offnum = InvalidOffsetNumber;
- return all_visible;
+ /* If we already know it's not all-visible, return false */
+ if (!all_visible)
+ return false;
+
+ /* If we weren't allowed any dead offsets, we're done */
+ if (allowed_num_offsets == 0)
+ return current_num_offsets == 0;
+
+ /* If the number of dead offsets doesn't match, the page isn't all-visible */
+ if (current_num_offsets != allowed_num_offsets)
+ return false;
+
+ Assert(deadoffsets);
+
+ /* The dead offsets must match the expected dead offsets exactly */
+ return memcmp(current_dead_offsets, deadoffsets,
+ allowed_num_offsets * sizeof(OffsetNumber)) == 0;
}
/*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..d6c86ccac20 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -266,6 +266,7 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
{
char *rec = XLogRecGetData(record);
uint8 info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+ char *maindataptr = rec + SizeOfHeapPrune;
info &= XLOG_HEAP_OPMASK;
if (info == XLOG_HEAP2_PRUNE_ON_ACCESS ||
@@ -278,7 +279,8 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
{
TransactionId conflict_xid;
- memcpy(&conflict_xid, rec + SizeOfHeapPrune, sizeof(TransactionId));
+ memcpy(&conflict_xid, maindataptr, sizeof(TransactionId));
+ maindataptr += sizeof(TransactionId);
appendStringInfo(buf, "snapshotConflictHorizon: %u",
conflict_xid);
@@ -287,6 +289,15 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, ", isCatalogRel: %c",
xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
+ if (xlrec->flags & XLHP_HAS_VMFLAGS)
+ {
+ uint8 vmflags;
+
+ memcpy(&vmflags, maindataptr, sizeof(uint8));
+ maindataptr += sizeof(uint8);
+ appendStringInfo(buf, ", vm_flags: 0x%02X", vmflags);
+ }
+
if (XLogRecHasBlockData(record, 0))
{
Size datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..8b47295efa2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -344,6 +344,12 @@ extern void heap_inplace_update_and_unlock(Relation relation,
Buffer buffer);
extern void heap_inplace_unlock(Relation relation,
HeapTuple oldtup, Buffer buffer);
+
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
@@ -388,6 +394,9 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer,
+ uint8 vmflags,
+ bool vm_modified_heap_page,
TransactionId conflict_xid,
bool cleanup_lock,
PruneReason reason,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 277df6b3cf0..ceae9c083ff 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -295,6 +295,9 @@ typedef struct xl_heap_prune
#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+/* If the record should update the VM, this is the new value */
+#define XLHP_HAS_VMFLAGS (1 << 0)
+
/* to handle recovery conflict during logical decoding on standby */
#define XLHP_IS_CATALOG_REL (1 << 1)
--
2.43.0
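To summarize the caller-side sequence these new log_heap_prune_and_freeze()
arguments assume, here is a rough sketch (not part of the patch) based on the
lazy_vacuum_heap_page() hunk above. visibilitymap_set_vmbyte() is the helper
introduced by this patch set, and vmflags/set_pd_all_vis/conflict_xid are the
local variables from that hunk:

    START_CRIT_SECTION();

    /* ... mark LP_DEAD items unused, truncate the line pointer array ... */

    if (vmflags & VISIBILITYMAP_VALID_BITS)
    {
        /* VM buffer is already pinned; lock it and update heap + VM together */
        LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
        PageSetAllVisible(page);
        visibilitymap_set_vmbyte(rel, blkno, vmbuffer, vmflags);
    }

    MarkBufferDirty(buffer);        /* dirty the heap buffer before WAL */

    if (RelationNeedsWAL(rel))
        log_heap_prune_and_freeze(rel, buffer,
                                  vmbuffer, vmflags, set_pd_all_vis,
                                  conflict_xid,
                                  false,    /* no cleanup lock required */
                                  PRUNE_VACUUM_CLEANUP,
                                  NULL, 0,  /* frozen */
                                  NULL, 0,  /* redirected */
                                  NULL, 0,  /* dead */
                                  unused, nunused);

    END_CRIT_SECTION();

    if (vmflags & VISIBILITYMAP_VALID_BITS)
        LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);

A single WAL record now covers both the heap changes and the VM byte for this
path.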
Attachment: v6-0005-Combine-lazy_scan_prune-VM-corruption-cases.patch (text/x-patch)
From b91503ceb9a923d922da88f13282a860916a9882 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:04:03 -0400
Subject: [PATCH v6 05/20] Combine lazy_scan_prune VM corruption cases
lazy_scan_prune() updates the visibility map after phase I of heap
vacuuming. It also checks and fixes corruption in the VM. The corruption
cases were mixed in with the normal visibility map update cases.
Careful study of the ordering of the current logic reveals that the
corruption cases can be reordered and extracted into a separate
function. This should add no overhead compared to the previous behavior.
This reordering makes it clear which cases are about corruption and
which cases are normal VM updates. Separating them also makes it
possible to combine the normal cases in a future commit. This will make
the logic easier to understand and allow for further separation of the
logic to allow updating the VM in the same record as pruning and
freezing in phase I.
---
src/backend/access/heap/vacuumlazy.c | 114 +++++++++++++++++----------
1 file changed, 73 insertions(+), 41 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d9e195269d2..04a7b6c4181 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -431,6 +431,12 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
+static bool identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1933,6 +1939,66 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer)
+{
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
+
+
/* qsort comparator for sorting OffsetNumbers */
static int
cmpOffsetNumbers(const void *a, const void *b)
@@ -2079,9 +2145,14 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
+ * all_frozen variables. Start by looking for any VM corruption.
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
+ all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+ {
+ /* Don't update the VM if we just cleared corruption in it */
+ }
+ else if (!all_visible_according_to_vm && presult.all_visible)
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -2133,45 +2204,6 @@ lazy_scan_prune(LVRelState *vacrel,
}
}
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
- {
- elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
- {
- elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno);
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
/*
* If the all-visible page is all-frozen but not marked as such yet, mark
* it as all-frozen. Note that all_frozen is only valid if all_visible is
--
2.43.0
Attachment: v6-0002-Make-heap_page_is_all_visible-independent-of-LVRe.patch (text/x-patch)
From fc11a1942761e1d9f84b805c57333dddede5aa83 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 15:48:51 -0400
Subject: [PATCH v6 02/20] Make heap_page_is_all_visible independent of
LVRelState
Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need two parameters from the
LVRelState, so just pass those in explicitly.
---
src/backend/access/heap/vacuumlazy.c | 45 ++++++++++++++++++----------
1 file changed, 29 insertions(+), 16 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 14036c27e87..8a62a93eee5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,8 +464,11 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
- TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2010,8 +2013,9 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(presult.lpdead_items == 0);
- if (!heap_page_is_all_visible(vacrel, buf,
- &debug_cutoff, &debug_all_frozen))
+ if (!heap_page_is_all_visible(vacrel->rel, buf,
+ vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+ &debug_cutoff, &vacrel->offnum))
Assert(false);
Assert(presult.all_frozen == debug_all_frozen);
@@ -2907,8 +2911,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* emitted.
*/
Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
- &all_frozen))
+ if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
+ &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -3592,9 +3596,16 @@ dead_items_cleanup(LVRelState *vacrel)
/*
* Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples. Set *all_frozen to true if every tuple
- * on this page is frozen.
+ * transactions.
+ *
+ * OldestXmin is used to determine visibility.
+ *
+ * *logging_offnum will have the OffsetNumber of the current tuple being
+ * processed for vacuum's error callback system.
+ *
+ * Return the visibility_cutoff_xid which is the highest xmin amongst the
+ * visible tuples. Sets *all_frozen to true if every tuple on this page is
+ * frozen.
*
* This is a stripped down version of lazy_scan_prune(). If you change
* anything here, make sure that everything stays in sync. Note that an
@@ -3602,9 +3613,11 @@ dead_items_cleanup(LVRelState *vacrel)
* introducing new side-effects here.
*/
static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
TransactionId *visibility_cutoff_xid,
- bool *all_frozen)
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3627,7 +3640,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* Set the offset number so that we can display it along with any
* error that occurred while processing this tuple.
*/
- vacrel->offnum = offnum;
+ *logging_offnum = offnum;
itemid = PageGetItemId(page, offnum);
/* Unused or redirect line pointers are of no interest */
@@ -3651,9 +3664,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
tuple.t_len = ItemIdGetLength(itemid);
- tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+ tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
+ switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
buf))
{
case HEAPTUPLE_LIVE:
@@ -3674,7 +3687,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
+ OldestXmin))
{
all_visible = false;
*all_frozen = false;
@@ -3709,7 +3722,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
} /* scan along page */
/* Clear the offset information once we have processed the given page. */
- vacrel->offnum = InvalidOffsetNumber;
+ *logging_offnum = InvalidOffsetNumber;
return all_visible;
}
--
2.43.0
Attachment: v6-0004-Use-xl_heap_prune-record-for-setting-empty-pages-.patch (text/x-patch)
From affe7d96f42d36b5f68bea81dbcf08b44648181b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:42:19 -0400
Subject: [PATCH v6 04/20] Use xl_heap_prune record for setting empty pages
all-visible
As part of a project to eliminate xl_heap_visible records, eliminate
their usage in phase I vacuum of empty pages.
---
src/backend/access/heap/pruneheap.c | 14 +++++--
src/backend/access/heap/vacuumlazy.c | 55 ++++++++++++++++++----------
src/include/access/heapam.h | 1 +
3 files changed, 47 insertions(+), 23 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d9ba0f96e34..97e51f78854 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -836,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
+ false,
InvalidBuffer, 0, false,
conflict_xid,
true, reason,
@@ -2051,6 +2052,9 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* case, vmbuffer should already have been updated and marked dirty and should
* still be pinned and locked.
*
+ * force_heap_fpi indicates that a full page image of the heap block should be
+ * forced.
+ *
* set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
* the page LSN when checksums/wal_log_hints are enabled even if we did not
* prune or freeze tuples on the page.
@@ -2061,6 +2065,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ bool force_heap_fpi,
Buffer vmbuffer,
uint8 vmflags,
bool set_pd_all_vis,
@@ -2089,13 +2094,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xlrec.flags = 0;
regbuf_flags = REGBUF_STANDARD;
+ if (force_heap_fpi)
+ regbuf_flags |= REGBUF_FORCE_IMAGE;
+
/*
* We can avoid an FPI if the only modification we are making to the heap
* page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
*/
- if (!do_prune &&
- nfrozen == 0 &&
- (!set_pd_all_vis || !XLogHintBitIsNeeded()))
+ else if (!do_prune &&
+ nfrozen == 0 &&
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
regbuf_flags |= REGBUF_NO_IMAGE;
/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 460cdbd8417..d9e195269d2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1878,33 +1878,47 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ uint8 new_vmbits = VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN;
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ PageSetAllVisible(page);
MarkBufferDirty(buf);
- /*
- * It's possible that another backend has extended the heap,
- * initialized the page, and then failed to WAL-log the page due
- * to an ERROR. Since heap extension is not WAL-logged, recovery
- * might try to replay our record setting the page all-visible and
- * find that the page isn't initialized, which will cause a PANIC.
- * To prevent that, check whether the page has been previously
- * WAL-logged, and if not, do that now.
- */
- if (RelationNeedsWAL(vacrel->rel) &&
- PageGetLSN(page) == InvalidXLogRecPtr)
- log_newpage_buffer(buf, true);
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ visibilitymap_set_vmbyte(vacrel->rel, blkno,
+ vmbuffer, new_vmbits);
+
+ if (RelationNeedsWAL(vacrel->rel))
+ {
+ /*
+ * It's possible that another backend has extended the heap,
+ * initialized the page, and then failed to WAL-log the page
+ * due to an ERROR. Since heap extension is not WAL-logged,
+ * recovery might try to replay our record setting the page
+ * all-visible and find that the page isn't initialized, which
+ * will cause a PANIC. To prevent that, if the page hasn't
+ * been previously WAL-logged, force a heap FPI.
+ */
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ PageGetLSN(page) == InvalidXLogRecPtr,
+ vmbuffer,
+ new_vmbits,
+ true,
+ InvalidTransactionId,
+ false, PRUNE_VACUUM_SCAN,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+ }
- PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
END_CRIT_SECTION();
- /* Count the newly all-frozen pages for logging */
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+ /* Count the newly all-frozen pages for logging. */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
}
@@ -2918,6 +2932,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
+ false,
vmbuffer,
vmflags,
set_pd_all_vis,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8b47295efa2..e7129a644a1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,6 +394,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ bool force_heap_fpi,
Buffer vmbuffer,
uint8 vmflags,
bool vm_modified_heap_page,
--
2.43.0
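Putting this together with the earlier REGBUF_NO_IMAGE change, the heap buffer
is now registered in one of three ways. This is only a restatement of the
logic already shown above, not additional code:

    regbuf_flags = REGBUF_STANDARD;

    if (force_heap_fpi)
        regbuf_flags |= REGBUF_FORCE_IMAGE;   /* page was never WAL-logged */
    else if (!do_prune && nfrozen == 0 &&
             (!set_pd_all_vis || !XLogHintBitIsNeeded()))
        regbuf_flags |= REGBUF_NO_IMAGE;      /* VM-only change, skip the image */
    /* otherwise REGBUF_STANDARD: an FPI is taken if one is needed */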
Attachment: v6-0007-Find-and-fix-VM-corruption-in-heap_page_prune_and.patch (text/x-patch)
From 423b44273997b9436123bac012fc6cdb78cea824 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:45:59 -0400
Subject: [PATCH v6 07/20] Find and fix VM corruption in
heap_page_prune_and_freeze
Future commits will update the VM in the same critical section and WAL
record as pruning and freezing. For ease of review, this commit takes
one step toward that goal: it moves the VM corruption handling case to
heap_page_prune_and_freeze().
---
src/backend/access/heap/pruneheap.c | 87 +++++++++++++++++++++++++++-
src/backend/access/heap/vacuumlazy.c | 77 +++---------------------
src/include/access/heapam.h | 4 ++
3 files changed, 96 insertions(+), 72 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 97e51f78854..496b70e318f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,7 +21,7 @@
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "commands/vacuum.h"
#include "executor/instrument.h"
#include "miscadmin.h"
@@ -177,6 +177,13 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
static void page_verify_redirects(Page page);
+static bool identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer);
+
/*
* Optionally prune and repair fragmentation in the specified page.
@@ -261,7 +268,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, vistest, 0,
+ heap_page_prune_and_freeze(relation, buffer, false,
+ InvalidBuffer,
+ vistest, 0,
NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
/*
@@ -294,6 +303,64 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
}
}
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer)
+{
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
/*
* Prune and repair fragmentation and potentially freeze tuples on the
@@ -314,6 +381,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
* that also freeze need that information.
*
+ * blk_known_av is the visibility status of the heap block as of the last call
+ * to find_next_unskippable_block(). vmbuffer is the buffer that may already
+ * contain the required block of the visibility map.
+ *
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
* (see heap_prune_satisfies_vacuum).
*
@@ -349,6 +420,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ bool blk_known_av,
+ Buffer vmbuffer,
GlobalVisState *vistest,
int options,
struct VacuumCutoffs *cutoffs,
@@ -897,6 +970,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
+ /*
+ * Clear any VM corruption. This does not need to be done in a critical
+ * section.
+ */
+ presult->vm_corruption = false;
+ if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+ presult->vm_corruption = identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av,
+ prstate.lpdead_items, vmbuffer);
if (prstate.freeze)
{
if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f6cdd9e6828..0c121fdf4e6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -431,12 +431,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
-static bool identify_and_fix_vm_corruption(Relation relation,
- BlockNumber heap_blk,
- Buffer heap_buffer, Page heap_page,
- bool heap_blk_known_av,
- int64 nlpdead_items,
- Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1939,65 +1933,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
-/*
- * When updating the visibility map after phase I heap vacuuming, we take the
- * opportunity to identify and fix any VM corruption.
- *
- * heap_blk_known_av is the visibility status of the heap page collected
- * while finding the next unskippable block in heap_vac_scan_next_block().
- */
-static bool
-identify_and_fix_vm_corruption(Relation relation,
- BlockNumber heap_blk,
- Buffer heap_buffer, Page heap_page,
- bool heap_blk_known_av,
- int64 nlpdead_items,
- Buffer vmbuffer)
-{
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
- visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
- {
- elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk);
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
- {
- elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk);
- PageClearAllVisible(heap_page);
- MarkBufferDirty(heap_buffer);
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- return false;
-}
-
/* qsort comparator for sorting OffsetNumbers */
static int
@@ -2056,11 +1991,14 @@ lazy_scan_prune(LVRelState *vacrel,
* tuples. Pruning will have determined whether or not the page is
* all-visible.
*/
- prune_options = HEAP_PAGE_PRUNE_FREEZE;
+ prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
+ heap_page_prune_and_freeze(rel, buf,
+ all_visible_according_to_vm,
+ vmbuffer,
+ vacrel->vistest, prune_options,
&vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
@@ -2145,10 +2083,9 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables. Start by looking for any VM corruption.
+ * all_frozen variables.
*/
- if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
- all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+ if (presult.vm_corruption)
{
/* Don't update the VM if we just cleared corruption in it */
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e7129a644a1..0c7eb5e46f4 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
struct TupleTableSlot;
@@ -247,6 +248,7 @@ typedef struct PruneFreezeResult
bool all_visible;
bool all_frozen;
TransactionId vm_conflict_horizon;
+ bool vm_corruption;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
@@ -380,6 +382,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ bool blk_known_av,
+ Buffer vmbuffer,
struct GlobalVisState *vistest,
int options,
struct VacuumCutoffs *cutoffs,
--
2.43.0
Attachment: v6-0008-Keep-all_frozen-updated-too-in-heap_page_prune_an.patch (text/x-patch)
From f1169a90d2dea593f4ce565d8311a6cd23157208 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 23 Jul 2025 16:01:24 -0400
Subject: [PATCH v6 08/20] Keep all_frozen updated too in
heap_page_prune_and_freeze
Previously we relied on all_frozen only ever being consulted together with
all_visible, but it is better to keep both flags accurate at all times.
---
src/backend/access/heap/pruneheap.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 496b70e318f..6c3653e776c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -143,10 +143,6 @@ typedef struct
* whether to freeze the page or not. The all_visible and all_frozen
* values returned to the caller are adjusted to include LP_DEAD items at
* the end.
- *
- * all_frozen should only be considered valid if all_visible is also set;
- * we don't bother to clear the all_frozen flag every time we clear the
- * all_visible flag.
*/
bool all_visible;
bool all_frozen;
@@ -824,6 +820,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
}
+ Assert(!prstate.all_frozen || prstate.all_visible);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -1468,7 +1465,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
if (!HeapTupleHeaderXminCommitted(htup))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1490,7 +1487,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
Assert(prstate->cutoffs);
if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1503,7 +1500,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
case HEAPTUPLE_RECENTLY_DEAD:
prstate->recently_dead_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple will soon become DEAD. Update the hint field so
@@ -1522,7 +1519,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* assumption is a bit shaky, but it is what acquire_sample_rows()
* does, so be consistent.
*/
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* If we wanted to optimize for aborts, we might consider marking
@@ -1540,7 +1537,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* will commit and update the counters after we report.
*/
prstate->live_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple may soon become DEAD. Update the hint field so that
--
2.43.0
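For illustration only, the repeated pattern above could be wrapped in a helper
like the following (hypothetical, not in the patch); it simply restates the
invariant the patch now asserts, namely that all_frozen is never left set once
all_visible is cleared:

    /* Hypothetical helper: clear both flags together so that
     * all_frozen can never remain set after all_visible is cleared. */
    static inline void
    prune_clear_all_visible(PruneState *prstate)
    {
        prstate->all_visible = false;
        prstate->all_frozen = false;
    }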
Attachment: v6-0006-Combine-vacuum-phase-I-VM-update-cases.patch (text/x-patch)
From 8416676eacfad2cfce34279f3edd1b280d1291b3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:35:36 -0400
Subject: [PATCH v6 06/20] Combine vacuum phase I VM update cases
We update the VM after phase I of vacuum -- either setting both the VM
bits when all bits are currently unset or setting just the frozen bit
when the all-visible bit is already set.
Those two cases shared much of the same code -- leading to unnecessary
duplication. This commit combines them, which is simpler and easier to
understand.
The combined case also happens to fix a longstanding bug: when only setting
an already all-visible page all-frozen with checksums/wal_log_hints enabled,
we failed to mark the heap buffer dirty before visibilitymap_set() set the
page LSN.
---
src/backend/access/heap/vacuumlazy.c | 101 +++++++++------------------
1 file changed, 32 insertions(+), 69 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 04a7b6c4181..f6cdd9e6828 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2152,11 +2152,26 @@ lazy_scan_prune(LVRelState *vacrel,
{
/* Don't update the VM if we just cleared corruption in it */
}
- else if (!all_visible_according_to_vm && presult.all_visible)
+
+ /*
+ * If the page isn't yet marked all-visible in the VM or it is and needs
+ * to be marked all-frozen, update the VM. Note that all_frozen is only
+ * valid if all_visible is true, so we must check both all_visible and
+ * all_frozen.
+ */
+ else if (presult.all_visible &&
+ (!all_visible_according_to_vm ||
+ (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
+ /*
+ * If the page is all-frozen, we can pass InvalidTransactionId as our
+ * cutoff_xid, since a snapshotConflictHorizon sufficient to make
+ * everything safe for REDO was logged when the page's tuples were
+ * frozen.
+ */
if (presult.all_frozen)
{
Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
@@ -2169,21 +2184,29 @@ lazy_scan_prune(LVRelState *vacrel,
* checksums are not enabled). Regardless, set both bits so that we
* get back in sync.
*
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
+ * If the heap page is all-visible but the VM bit is not set, we don't
+ * need to dirty the heap page. However, if checksums are enabled, we
+ * do need to make sure that the heap page is dirtied before passing
+ * it to visibilitymap_set(), because it may be logged.
*/
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
+ if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+ {
+ PageSetAllVisible(page);
+ MarkBufferDirty(buf);
+ }
+
old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
InvalidXLogRecPtr,
vmbuffer, presult.vm_conflict_horizon,
flags);
/*
+ * Even if we are only setting the all-frozen bit, there is a small
+ * chance that the VM was modified sometime between setting
+ * all_visible_according_to_vm and checking the visibility during
+ * pruning. Check the return value of old_vmbits to ensure the
+ * visibility map counters used for logging are accurate.
+ *
* If the page wasn't already set all-visible and/or all-frozen in the
* VM, count it as newly set for logging.
*/
@@ -2204,66 +2227,6 @@ lazy_scan_prune(LVRelState *vacrel,
}
}
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen. Note that all_frozen is only valid if all_visible is
- * true, so we must check both all_visible and all_frozen.
- */
- else if (all_visible_according_to_vm && presult.all_visible &&
- presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
-
return presult.ndeleted;
}
--
2.43.0
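Stated on its own, the condition under which the combined branch updates the
VM can be read as the following predicate (hypothetical, not in the patch;
vm_says_all_visible corresponds to all_visible_according_to_vm in the hunk
above, the other names match):

    /* True when the VM needs to be (re)set for this heap block */
    static inline bool
    vm_update_needed(Relation rel, BlockNumber blkno, Buffer *vmbuffer,
                     bool vm_says_all_visible,
                     bool all_visible, bool all_frozen)
    {
        /* nothing to do unless the page is all-visible */
        if (!all_visible)
            return false;

        /* VM lacks the all-visible bit: set it (and maybe all-frozen too) */
        if (!vm_says_all_visible)
            return true;

        /* all-visible is set; act only if the all-frozen bit is still missing */
        return all_frozen && !VM_ALL_FROZEN(rel, blkno, vmbuffer);
    }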
Attachment: v6-0009-Update-VM-in-pruneheap.c.patch (text/x-patch)
From 952f9aa12924868d98951ca621d09e7aefc23b81 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 2 Jun 2025 11:04:14 -0400
Subject: [PATCH v6 09/20] Update VM in pruneheap.c
As a step toward updating the VM in the same critical section and WAL
record as pruning and freezing (during phase I of vacuuming), first move
the VM update (still in its own critical section and WAL record) into
heap_page_prune_and_freeze(). This makes review easier.
---
src/backend/access/heap/pruneheap.c | 99 +++++++++++++++++++++++-----
src/backend/access/heap/vacuumlazy.c | 99 +++++-----------------------
src/include/access/heapam.h | 15 +++--
3 files changed, 106 insertions(+), 107 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6c3653e776c..05227ce0339 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -360,7 +360,8 @@ identify_and_fix_vm_corruption(Relation relation,
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -436,6 +437,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_freeze;
bool do_prune;
bool do_hint;
+ uint8 vmflags = 0;
+ uint8 old_vmbits = 0;
bool hint_bit_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
@@ -936,7 +939,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*
* Now that freezing has been finalized, unset all_visible if there are
* any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected by our caller.
+ * of the page, as expected for updating the visibility map.
*/
if (prstate.all_visible && prstate.lpdead_items == 0)
{
@@ -952,31 +955,91 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->hastup = prstate.hastup;
/*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
+ * If updating the visibility map, the conflict horizon for that record
+ * must be the newest xmin on the page. However, if the page is
+ * completely frozen, there can be no conflict and the vm_conflict_horizon
+ * should remain InvalidTransactionId. This includes the case that we
+ * just froze all the tuples; the prune-freeze record included the
+ * conflict XID already so the VM update record doesn't need it.
*/
if (presult->all_frozen)
presult->vm_conflict_horizon = InvalidTransactionId;
else
presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
- presult->lpdead_items = prstate.lpdead_items;
- /* the presult->deadoffsets array was already filled in */
-
/*
- * Clear any VM corruption. This does not need to be done in a critical
- * section.
+ * Handle setting visibility map bit based on information from the VM (as
+ * of last heap_vac_scan_next_block() call), and from all_visible and
+ * all_frozen variables.
*/
- presult->vm_corruption = false;
if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
- presult->vm_corruption = identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av,
- prstate.lpdead_items, vmbuffer);
+ {
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av,
+ prstate.lpdead_items, vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /*
+ * If the page isn't yet marked all-visible in the VM or it is and
+ * needs to be marked all-frozen, update the VM. Note that all_frozen
+ * is only valid if all_visible is true, so we must check both
+ * all_visible and all_frozen.
+ */
+ else if (presult->all_visible &&
+ (!blk_known_av ||
+ (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+ {
+ Assert(prstate.lpdead_items == 0);
+ vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+ /*
+ * If the page is all-frozen, we can pass InvalidTransactionId as
+ * our cutoff_xid, since a snapshotConflictHorizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ if (presult->all_frozen)
+ {
+ Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
+ /*
+ * It's possible for the VM bit to be clear and the page-level bit
+ * to be set if checksums are not enabled.
+ *
+ * And even if we are just planning to update the frozen bit in
+ * the VM, we shouldn't rely on all_visible_according_to_vm as a
+ * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
+ * might have become stale.
+ *
+ * If the heap page is all-visible but the VM bit is not set, we
+ * don't need to dirty the heap page. However, if checksums are
+ * enabled, we do need to make sure that the heap page is dirtied
+ * before passing it to visibilitymap_set(), because it may be
+ * logged.
+ */
+ if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+ {
+ PageSetAllVisible(page);
+ MarkBufferDirty(buffer);
+ }
+
+ old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
+ vmbuffer, presult->vm_conflict_horizon,
+ vmflags);
+ }
+ }
+
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
+
+ presult->old_vmbits = old_vmbits;
+ presult->new_vmbits = vmflags;
+
if (prstate.freeze)
{
if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 0c121fdf4e6..c49e81bc5dd 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1933,7 +1933,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
-
/* qsort comparator for sorting OffsetNumbers */
static int
cmpOffsetNumbers(const void *a, const void *b)
@@ -1949,7 +1948,8 @@ cmpOffsetNumbers(const void *a, const void *b)
* vmbuffer is the buffer containing the VM block with visibility information
* for the heap block, blkno. all_visible_according_to_vm is the saved
* visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * won't rely entirely on this status, as it may be out of date. These will be
+ * passed on to heap_page_prune_and_freeze() to use while setting the VM.
*
* *has_lpdead_items is set to true or false depending on whether, upon return
* from this function, any LP_DEAD items are still present on the page.
@@ -1978,6 +1978,7 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
+ * Then, if the page's visibility status has changed, update the VM.
*
* If the relation has no indexes, we can immediately mark would-be dead
* items LP_UNUSED.
@@ -1986,10 +1987,6 @@ lazy_scan_prune(LVRelState *vacrel,
* presult.ndeleted. It should not be confused with presult.lpdead_items;
* presult.lpdead_items's final value can be thought of as the number of
* tuples that were deleted from indexes.
- *
- * We will update the VM after collecting LP_DEAD items and freezing
- * tuples. Pruning will have determined whether or not the page is
- * all-visible.
*/
prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
if (vacrel->nindexes == 0)
@@ -2081,88 +2078,26 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(!presult.all_visible || !(*has_lpdead_items));
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * For the purposes of logging, count whether or not the page was newly
+ * set all-visible and, potentially, all-frozen.
*/
- if (presult.vm_corruption)
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- /* Don't update the VM if we just cleared corruption in it */
- }
-
- /*
- * If the page isn't yet marked all-visible in the VM or it is and needs
- * to me marked all-frozen, update the VM Note that all_frozen is only
- * valid if all_visible is true, so we must check both all_visible and
- * all_frozen.
- */
- else if (presult.all_visible &&
- (!all_visible_according_to_vm ||
- (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
- {
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- /*
- * If the page is all-frozen, we can pass InvalidTransactionId as our
- * cutoff_xid, since a snapshotConflictHorizon sufficient to make
- * everything safe for REDO was logged when the page's tuples were
- * frozen.
- */
- if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * If the heap page is all-visible but the VM bit is not set, we don't
- * need to dirty the heap page. However, if checksums are enabled, we
- * do need to make sure that the heap page is dirtied before passing
- * it to visibilitymap_set(), because it may be logged.
- */
- if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * Even if we are only setting the all-frozen bit, there is a small
- * chance that the VM was modified sometime between setting
- * all_visible_according_to_vm and checking the visibility during
- * pruning. Check the return value of old_vmbits to ensure the
- * visibility map counters used for logging are accurate.
- *
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ vacrel->vm_new_visible_pages++;
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- vacrel->vm_new_frozen_pages++;
+ vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ {
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
+ }
return presult.ndeleted;
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0c7eb5e46f4..b85648456e9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,20 +235,21 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
+ * all_visible and all_frozen indicate the status of the page as reflected
+ * in the visibility map after pruning, freezing, and setting any pages
+ * all-visible in the visibility map.
*
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
+ * vm_conflict_horizon is the newest xmin of live tuples on the page
+ * (older than OldestXmin). It will only be valid if we did not set the
+ * page all-frozen in the VM.
*
* These are only set if the HEAP_PRUNE_FREEZE option is set.
*/
bool all_visible;
bool all_frozen;
TransactionId vm_conflict_horizon;
- bool vm_corruption;
+ uint8 old_vmbits;
+ uint8 new_vmbits;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
--
2.43.0
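
Aside (not part of the patch set): to make the new counter bookkeeping in
lazy_scan_prune() easier to follow, here is a rough standalone C sketch of the
decision it now makes from the old/new VM bits returned in PruneFreezeResult.
The bit constants and helper names below are invented for illustration only;
the real values live in visibilitymapdefs.h.

#include <stdint.h>
#include <stdio.h>

#define ALL_VISIBLE 0x01
#define ALL_FROZEN  0x02

/* Model of the counting logic: compare VM bits before and after phase I. */
static void
count_vm_changes(uint8_t old_vmbits, uint8_t new_vmbits,
                 int *new_visible, int *new_visible_frozen, int *new_frozen)
{
    if ((old_vmbits & ALL_VISIBLE) == 0 && (new_vmbits & ALL_VISIBLE) != 0)
    {
        /* Page newly set all-visible, and possibly also newly all-frozen. */
        (*new_visible)++;
        if ((new_vmbits & ALL_FROZEN) != 0)
            (*new_visible_frozen)++;
    }
    else if ((old_vmbits & ALL_FROZEN) == 0 && (new_vmbits & ALL_FROZEN) != 0)
    {
        /* Page was already all-visible, but is newly all-frozen. */
        (*new_frozen)++;
    }
}

int
main(void)
{
    int vis = 0, visfrz = 0, frz = 0;

    count_vm_changes(0, ALL_VISIBLE | ALL_FROZEN, &vis, &visfrz, &frz);
    count_vm_changes(ALL_VISIBLE, ALL_VISIBLE | ALL_FROZEN, &vis, &visfrz, &frz);
    printf("newly visible: %d, newly visible+frozen: %d, newly frozen: %d\n",
           vis, visfrz, frz);
    return 0;
}

With the VM now set inside heap_page_prune_and_freeze(), the caller only needs
this comparison of before/after bits rather than re-reading the VM itself.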
v6-0010-Eliminate-xl_heap_visible-from-vacuum-phase-I-pru.patch (text/x-patch; charset=US-ASCII)
From 64f09710ba6c738870217aa7fcd34e50bd52b93e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:41:00 -0400
Subject: [PATCH v6 10/20] Eliminate xl_heap_visible from vacuum phase I
prune/freeze
Instead of emitting a separate WAL record for every block rendered
all-visible/frozen by vacuum's phase I, include the changes to the VM in
the xl_heap_prune record already emitted.
This is only enabled for vacuum's prune/freeze work, not for on-access
pruning.
---
src/backend/access/heap/pruneheap.c | 454 ++++++++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 30 --
src/include/access/heapam.h | 15 +-
3 files changed, 278 insertions(+), 221 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 05227ce0339..cf9e5215d6b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,13 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool freeze;
+
+ /*
+ * Whether or not to consider updating the VM. There is some bookkeeping
+ * that must be maintained if we would like to update the VM.
+ */
+ bool consider_update_vm;
+
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -108,8 +115,9 @@ typedef struct
*
* These fields are not used by pruning itself for the most part, but are
* used to collect information about what was pruned and what state the
- * page is in after pruning, for the benefit of the caller. They are
- * copied to the caller's PruneFreezeResult at the end.
+ * page is in after pruning to use when updating the visibility map and
+ * for the benefit of the caller. They are copied to the caller's
+ * PruneFreezeResult at the end.
* -------------------------------------------------------
*/
@@ -138,11 +146,10 @@ typedef struct
* bits. It is only valid if we froze some tuples, and all_frozen is
* true.
*
- * NOTE: all_visible and all_frozen don't include LP_DEAD items. That's
- * convenient for heap_page_prune_and_freeze(), to use them to decide
- * whether to freeze the page or not. The all_visible and all_frozen
- * values returned to the caller are adjusted to include LP_DEAD items at
- * the end.
+ * NOTE: all_visible and all_frozen don't include LP_DEAD items until
+ * directly before updating the VM. We ignore LP_DEAD items when deciding
+ * whether or not to opportunistically freeze and when determining the
+ * snapshot conflict horizon required when freezing tuples.
*/
bool all_visible;
bool all_frozen;
@@ -371,12 +378,15 @@ identify_and_fix_vm_corruption(Relation relation,
* If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
* required in order to advance relfrozenxid / relminmxid, or if it's
* considered advantageous for overall system performance to do so now. The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing. When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set. They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
+ * 'cutoffs', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments are required
+ * when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set and the visibility status of the page
+ * has changed, we will update the VM at the same time as pruning and freezing
+ * the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping. Note that new_vmbits and old_vmbits
+ * will be 0 if HEAP_PAGE_PRUNE_UPDATE_VM is not set.
*
* blk_known_av is the visibility status of the heap block as of the last call
* to find_next_unskippable_block(). vmbuffer is the buffer that may already
@@ -392,6 +402,8 @@ identify_and_fix_vm_corruption(Relation relation,
* FREEZE indicates that we will also freeze tuples, and will return
* 'all_visible', 'all_frozen' flags to the caller.
*
+ * UPDATE_VM indicates that we will set the page's status in the VM.
+ *
* cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
* of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
* cutoffs->OldestXmin is also used to determine if dead tuples are
@@ -436,18 +448,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
HeapTupleData tup;
bool do_freeze;
bool do_prune;
- bool do_hint;
+ bool do_hint_full_or_prunable;
+ bool do_set_vm;
uint8 vmflags = 0;
uint8 old_vmbits = 0;
bool hint_bit_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ bool all_frozen_except_lp_dead = false;
+ bool set_pd_all_visible = false;
/* Copy parameters to prstate */
prstate.vistest = vistest;
prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.consider_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate.cutoffs = cutoffs;
+ Assert(!prstate.consider_update_vm || vmbuffer);
+
/*
* Our strategy is to scan the page and make lists of items to change,
* then apply the changes within a critical section. This keeps as much
@@ -492,50 +510,57 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.deadoffsets = presult->deadoffsets;
/*
- * Caller may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Keep track of whether or not the page will be all-visible and
+ * all-frozen for use in opportunistic freezing and to update the VM if
+ * the caller requests it.
+ *
+ * Currently, only VACUUM attempts freezing and setting the VM bits. But
+ * other callers could do either one. The visibility bookkeeping is
+ * required for opportunistic freezing (in addition to setting the VM
+ * bits) because we only consider opportunistically freezing tuples if the
+ * whole page would become all-frozen or if the whole page will be frozen
+ * except for dead tuples that will be removed by vacuum.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * If only updating the VM, we must initialize all_frozen to false, as
+ * heap_prepare_freeze_tuple() will not be called for each tuple on the
+ * page and we will not end up correctly setting it to false later.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not clear
- * all_visible when we see LP_DEAD items. We fix that at the end of the
- * function, when we return the value to the caller, so that the caller
- * doesn't set the VM bit incorrectly.
+ * Dead tuples which will be removed by the end of vacuuming should not
+ * preclude us from opportunistically freezing, so we do not clear
+ * all_visible when we see LP_DEAD items. We fix that after determining
+ * whether or not to freeze but before deciding whether or not to update
+ * the VM so that we don't set the VM bit incorrectly.
+ *
+ * If not freezing or updating the VM, we avoid the extra bookkeeping.
+ * Initializing all_visible and all_frozen to false allows skipping the
+ * work to update them in heap_prune_record_unchanged_lp_normal().
*/
if (prstate.freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
}
+ else if (prstate.consider_update_vm)
+ {
+ prstate.all_visible = true;
+ prstate.all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate.all_visible = false;
prstate.all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is unmaintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon.
+ * This is most likely to happen when updating the VM and/or freezing all
+ * live tuples on the page. It is updated before returning to the caller
+ * because vacuum does assert-build only validation on the page using this
+ * field.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -733,10 +758,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* Even if we don't prune anything, if we found a new value for the
- * pd_prune_xid field or the page was marked full, we will update the hint
- * bit.
+ * pd_prune_xid field or the page was marked full, we will update those
+ * hint bits.
*/
- do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ do_hint_full_or_prunable =
+ ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
PageIsFull(page);
/*
@@ -784,7 +810,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (XLogCheckBufferNeedsBackup(buffer))
do_freeze = true;
}
- else if (do_hint)
+ else if (do_hint_full_or_prunable)
{
if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
do_freeze = true;
@@ -823,11 +849,84 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
}
+ /*
+ * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+ * make the choice of whether or not to freeze the page unaffected by the
+ * short-term presence of LP_DEAD items. These LP_DEAD items were
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that freezing has been finalized, unset all_visible if there are
+ * any LP_DEAD items on the page. It needs to reflect the present state of
+ * the page when using it to determine whether or not to update the VM.
+ *
+ * Keep track of whether or not the page was all-frozen except LP_DEAD
+ * items for the purposes of calculating the snapshot conflict horizon,
+ * though.
+ */
+ all_frozen_except_lp_dead = prstate.all_frozen;
+ if (prstate.lpdead_items > 0)
+ {
+ prstate.all_visible = false;
+ prstate.all_frozen = false;
+ }
+
Assert(!prstate.all_frozen || prstate.all_visible);
+
+ /*
+ * Handle setting visibility map bit based on information from the VM (as
+ * of last heap_vac_scan_next_block() call), and from all_visible and
+ * all_frozen variables.
+ */
+ if (prstate.consider_update_vm)
+ {
+ /*
+ * Clear any VM corruption. This does not need to be in a critical
+ * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
+ * we may mark the heap page buffer dirty here and could end up doing
+ * so again later. This is not a correctness issue, and since it only
+ * happens when the VM is corrupted, the extra performance overhead is
+ * not a concern.
+ */
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av, prstate.lpdead_items, vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /* Determine if we actually need to set the VM and which bits to set. */
+ else if (prstate.all_visible &&
+ (!blk_known_av ||
+ (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate.all_frozen)
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+ }
+
+ do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
+
+ /*
+ * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+ * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+ * set, we strongly prefer to keep them in sync.
+ *
+ * Prior to Postgres 19, it was possible for the page-level bit to be set
+ * and the VM bit to be clear. This could happen if we crashed after
+ * setting PD_ALL_VISIBLE but before setting bits in the VM.
+ */
+ set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+
+ /* Save these for the caller in case we later zero out vmflags */
+ presult->new_vmbits = vmflags;
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
- if (do_hint)
+ if (do_hint_full_or_prunable)
{
/*
* Update the page's pd_prune_xid field to either zero, or the lowest
@@ -843,15 +942,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
PageClearFull(page);
/*
- * If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * If we are _only_ setting the prune_xid or PD_PAGE_FULL hint, then
+ * this is a non-WAL-logged hint. If we are going to freeze or prune
+ * tuples on the page or set PD_ALL_VISIBLE, we will mark the buffer
+ * dirty and emit WAL below.
*/
- if (!do_freeze && !do_prune)
+ if (!do_prune && !do_freeze && !set_pd_all_visible)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -865,12 +965,47 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
- MarkBufferDirty(buffer);
+ if (set_pd_all_visible)
+ PageSetAllVisible(page);
/*
- * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
+ * We only set PD_ALL_VISIBLE if we also set the VM, and since setting
+ * the VM requires emitting WAL, MarkBufferDirtyHint() isn't
+ * appropriate here.
*/
- if (RelationNeedsWAL(relation))
+ if (do_prune || do_freeze || set_pd_all_visible)
+ MarkBufferDirty(buffer);
+
+ if (do_set_vm)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ old_vmbits = visibilitymap_set_vmbyte(relation, blockno,
+ vmbuffer, vmflags);
+
+ if (old_vmbits == vmflags)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+ do_set_vm = false;
+ /* 0 out vmflags so we don't emit WAL to update the VM */
+ vmflags = 0;
+ }
+ }
+
+ /*
+ * It should never be the case that the VM bit is set while PD_ALL_VISIBLE
+ * is not; if it were, we would have caught it earlier when finding and
+ * fixing VM corruption. So, if we found the VM already set above, we must
+ * have found PD_ALL_VISIBLE already set as well.
+ */
+ Assert(!set_pd_all_visible || do_set_vm);
+
+ /*
+ * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did. If
+ * we were only updating the VM and it turns out it was already set,
+ * we will have unset do_set_vm earlier. As such, check it again
+ * before emitting the record.
+ */
+ if (RelationNeedsWAL(relation) && (do_set_vm || do_prune || do_freeze))
{
/*
* The snapshotConflictHorizon for the whole record should be the
@@ -882,35 +1017,56 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* on the standby with xids older than the youngest tuple this
* record will freeze will conflict.
*/
- TransactionId frz_conflict_horizon = InvalidTransactionId;
- TransactionId conflict_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
/*
- * We can use the visibility_cutoff_xid as our cutoff for
- * conflicts when the whole page is eligible to become all-frozen
- * in the VM once we're done with it. Otherwise we generate a
- * conservative cutoff by stepping back from OldestXmin.
+ * If we are updating the VM, the conflict horizon is almost
+ * always the visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization,
+ * we can use the visibility_cutoff_xid as the conflict horizon if
+ * the page will be all-frozen. This is true even if there are
+ * LP_DEAD line pointers because we ignored those when maintaining
+ * the visibility_cutoff_xid.
*/
- if (do_freeze)
+ if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
+ conflict_xid = prstate.visibility_cutoff_xid;
+
+ /*
+ * Otherwise, if we are freezing but the page would not be
+ * all-frozen, we have to use the more pessimistic horizon of
+ * OldestXmin, which may be newer than the newest tuple we froze.
+ * We currently don't track the newest tuple we froze.
+ */
+ else if (do_freeze)
{
- if (prstate.all_visible && prstate.all_frozen)
- frz_conflict_horizon = prstate.visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = prstate.cutoffs->OldestXmin;
- TransactionIdRetreat(frz_conflict_horizon);
- }
+ conflict_xid = prstate.cutoffs->OldestXmin;
+ TransactionIdRetreat(conflict_xid);
}
- if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
- conflict_xid = frz_conflict_horizon;
- else
+ /*
+ * If we are removing tuples with an xmax newer than the conflict_xid
+ * calculated so far, we must use that as our horizon.
+ */
+ if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
conflict_xid = prstate.latest_xid_removed;
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning
+ * or freezing any tuples and are setting an already all-visible
+ * page all-frozen in the VM. In this case, all of the tuples on
+ * the page must already be visible to all MVCC snapshots on the
+ * standby.
+ */
+ if (!do_prune && !do_freeze && do_set_vm &&
+ blk_known_av && (vmflags & VISIBILITYMAP_ALL_FROZEN))
+ conflict_xid = InvalidTransactionId;
+
log_heap_prune_and_freeze(relation, buffer,
false,
- InvalidBuffer, 0, false,
+ vmbuffer,
+ vmflags,
+ set_pd_all_visible,
conflict_xid,
true, reason,
prstate.frozen, prstate.nfrozen,
@@ -922,124 +1078,55 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
END_CRIT_SECTION();
- /* Copy information back for caller */
- presult->ndeleted = prstate.ndeleted;
- presult->nnewlpdead = prstate.ndead;
- presult->nfrozen = prstate.nfrozen;
- presult->live_tuples = prstate.live_tuples;
- presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected for updating the visibility map.
- */
- if (prstate.all_visible && prstate.lpdead_items == 0)
- {
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
- }
- else
- {
- presult->all_visible = false;
- presult->all_frozen = false;
- }
-
- presult->hastup = prstate.hastup;
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
- /*
- * If updating the visibility map, the conflict horizon for that record
- * must be the newest xmin on the page. However, if the page is
- * completely frozen, there can be no conflict and the vm_conflict_horizon
- * should remain InvalidTransactionId. This includes the case that we
- * just froze all the tuples; the prune-freeze record included the
- * conflict XID already so the VM update record doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * VACUUM will call heap_page_is_all_visible() during the second pass over
+ * the heap to determine all_visible and all_frozen for the page -- this
+ * is a specialized version of the logic from this function. Now that
+ * we've finished pruning and freezing, make sure that we're in total
+ * agreement with heap_page_is_all_visible() using an assertion. We will
+ * have already set the page in the VM, so this assertion will only let
+ * you know that you've already done something wrong.
*/
- if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
{
- if (identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av,
- prstate.lpdead_items, vmbuffer))
- {
- /* If we fix corruption, don't update the VM further */
- }
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
- /*
- * If the page isn't yet marked all-visible in the VM or it is and
- * needs to me marked all-frozen, update the VM Note that all_frozen
- * is only valid if all_visible is true, so we must check both
- * all_visible and all_frozen.
- */
- else if (presult->all_visible &&
- (!blk_known_av ||
- (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
- {
- Assert(prstate.lpdead_items == 0);
- vmflags = VISIBILITYMAP_ALL_VISIBLE;
+ Assert(cutoffs);
- /*
- * If the page is all-frozen, we can pass InvalidTransactionId as
- * our cutoff_xid, since a snapshotConflictHorizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- if (presult->all_frozen)
- {
- Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
- vmflags |= VISIBILITYMAP_ALL_FROZEN;
- }
+ Assert(prstate.lpdead_items == 0);
- /*
- * It's possible for the VM bit to be clear and the page-level bit
- * to be set if checksums are not enabled.
- *
- * And even if we are just planning to update the frozen bit in
- * the VM, we shouldn't rely on all_visible_according_to_vm as a
- * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
- * might have become stale.
- *
- * If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be
- * logged.
- */
- if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buffer);
- }
+ if (!heap_page_is_all_visible(relation, buffer,
+ cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc))
+ Assert(false);
- old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
- vmbuffer, presult->vm_conflict_horizon,
- vmflags);
- }
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == prstate.visibility_cutoff_xid);
}
+#endif
+ /* Copy information back for caller */
+ presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+ presult->old_vmbits = old_vmbits;
+ /* new_vmbits was set above */
+ presult->hastup = prstate.hastup;
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
- presult->old_vmbits = old_vmbits;
- presult->new_vmbits = vmflags;
-
if (prstate.freeze)
{
if (presult->nfrozen > 0)
@@ -1621,7 +1708,12 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
break;
}
- /* Consider freezing any normal tuples which will not be removed */
+ /*
+ * Consider freezing any normal tuples which will not be removed.
+ * Regardless of whether or not we want to freeze the tuples, if we want
+ * to update the VM, we have to call heap_prepare_freeze_tuple() on every
+ * tuple to know whether or not the page will be totally frozen.
+ */
if (prstate->freeze)
{
bool totally_frozen;
@@ -2234,6 +2326,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+ Assert(do_prune || nfrozen > 0 || vmflags & VISIBILITYMAP_VALID_BITS);
+
xlrec.flags = 0;
regbuf_flags = REGBUF_STANDARD;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c49e81bc5dd..91e209901b8 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2014,34 +2014,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- /* Note that all_frozen value does not matter when !all_visible */
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- if (!heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum))
- Assert(false);
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -2075,8 +2047,6 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
-
/*
* For the purposes of logging, count whether or not the page was newly
* set all-visible and, potentially, all-frozen.
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b85648456e9..0b9bb1c9b13 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,19 +235,12 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate the status of the page as reflected
- * in the visibility map after pruning, freezing, and setting any pages
- * all-visible in the visibility map.
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
*
- * vm_conflict_horizon is the newest xmin of live tuples on the page
- * (older than OldestXmin). It will only be valid if we did not set the
- * page all-frozen in the VM.
- *
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
*/
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
uint8 old_vmbits;
uint8 new_vmbits;
--
2.43.0
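
Aside (not part of the patch): the conflict-horizon selection in the combined
prune/freeze/VM record is the subtlest part of this change, so here is a rough
standalone C model of it. The names and the plain integer comparisons are
simplifications; the real code uses TransactionIdFollows()/TransactionIdRetreat()
and lives in heap_page_prune_and_freeze().

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t xid_t;
#define INVALID_XID 0

/*
 * Simplified model of how the combined record picks the snapshot conflict
 * horizon it will carry.
 */
static xid_t
choose_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
                    bool all_frozen_except_lp_dead, bool blk_known_av,
                    bool setting_all_frozen,
                    xid_t visibility_cutoff_xid, xid_t oldest_xmin,
                    xid_t latest_xid_removed)
{
    xid_t conflict_xid = INVALID_XID;

    /* Setting the VM, or freezing a page that ends up all-frozen: the
     * newest xmin of live tuples (the visibility cutoff) is enough. */
    if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
        conflict_xid = visibility_cutoff_xid;
    /* Freezing a page that won't be all-frozen: fall back to the more
     * pessimistic OldestXmin - 1. */
    else if (do_freeze)
        conflict_xid = oldest_xmin - 1;

    /* Removed tuples may push the horizon further forward. */
    if (latest_xid_removed > conflict_xid)
        conflict_xid = latest_xid_removed;

    /* Only setting an already all-visible page all-frozen: nothing on the
     * page can conflict with any standby snapshot. */
    if (!do_prune && !do_freeze && do_set_vm && blk_known_av && setting_all_frozen)
        conflict_xid = INVALID_XID;

    return conflict_xid;
}

int
main(void)
{
    /* Vacuum froze the whole page and set it all-visible and all-frozen. */
    printf("conflict xid: %u\n",
           (unsigned) choose_conflict_xid(true, true, true, true, false, true,
                                          740, 750, 0));
    return 0;
}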
v6-0013-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisXi.patch (text/x-patch; charset=US-ASCII)
From 099097a6c0886bd7ac284ba4de6f26fde6f4fb5e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v6 13/20] Rename GlobalVisTestIsRemovableXid() to
GlobalVisXidVisibleToAll()
Currently, we only use GlobalVisTestIsRemovableXid() to check if a
tuple's xmax is visible to all, meaning we can remove it. But future
commits will use GlobalVisTestIsRemovableXid() to test if a tuple's xmin
is visible to all, for the purposes of determining whether the page can be
set all-visible in the VM. In that case, it makes more sense to call the
function GlobalVisXidVisibleToAll().
---
src/backend/access/heap/heapam_visibility.c | 6 +++---
src/backend/access/heap/pruneheap.c | 14 +++++++-------
src/backend/access/spgist/spgvacuum.c | 2 +-
src/backend/storage/ipc/procarray.c | 13 ++++++-------
src/include/utils/snapmgr.h | 4 ++--
5 files changed, 19 insertions(+), 20 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
{
Assert(TransactionIdIsValid(dead_after));
- if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
res = HEAPTUPLE_DEAD;
}
else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
return false;
/* Deleter committed, so tuple is dead if the XID is old enough. */
- return GlobalVisTestIsRemovableXid(vistest,
- HeapTupleHeaderGetRawXmax(tuple));
+ return GlobalVisXidVisibleToAll(vistest,
+ HeapTupleHeaderGetRawXmax(tuple));
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ffc12314b41..715dfc16ba7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -231,7 +231,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
vistest = GlobalVisTestFor(relation);
- if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+ if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
return;
/*
@@ -574,9 +574,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* Determining HTSV only once for each tuple is required for correctness,
* to deal with cases where running HTSV twice could result in different
* results. For example, RECENTLY_DEAD can turn to DEAD if another
- * checked item causes GlobalVisTestIsRemovableFullXid() to update the
- * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
- * transaction aborts.
+ * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+ * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+ * aborts.
*
* It's also good for performance. Most commonly tuples within a page are
* stored at decreasing offsets (while the items are stored at increasing
@@ -1172,11 +1172,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* Determine whether or not the tuple is considered dead when compared
* with the provided GlobalVisState. On-access pruning does not provide
* VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
- * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
- * if the GlobalVisState has been updated since the beginning of vacuuming
+ * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+ * the GlobalVisState has been updated since the beginning of vacuuming
* the relation.
*/
- if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
return HEAPTUPLE_DEAD;
return res;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 2678f7ab782..4b8e5747239 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
*/
if (dt->tupstate == SPGIST_REDIRECT &&
(!TransactionIdIsValid(dt->xid) ||
- GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+ GlobalVisXidVisibleToAll(vistest, dt->xid)))
{
dt->tupstate = SPGIST_PLACEHOLDER;
Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index bf987aed8d3..508bb379f87 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
* See comment for GlobalVisState for details.
*/
bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
- FullTransactionId fxid)
+GlobalVisFullXidVisible(GlobalVisState *state, FullTransactionId fxid)
{
/*
* If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4223,7 +4222,7 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
* relfrozenxid).
*/
bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
{
FullTransactionId fxid;
@@ -4237,7 +4236,7 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
*/
fxid = FullXidRelativeTo(state->definitely_needed, xid);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisible(state, fxid);
}
/*
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisible(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableXid(state, xid);
+ return GlobalVisXidVisibleToAll(state, xid);
}
/*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index d346be71642..aec0692b5db 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -97,8 +97,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
*/
typedef struct GlobalVisState GlobalVisState;
extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisible(GlobalVisState *state, FullTransactionId fxid);
extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
--
2.43.0
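
Aside (not part of the patch): a toy illustration of why the new name fits.
"Removable" (a deleter's xmax is visible to every snapshot) and "all-visible"
(an inserter's xmin is visible to every snapshot) are the same horizon test
applied to different XIDs. The helper and constants below are made up; the
real function consults the horizons tracked in a GlobalVisState.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t xid_t;

/* Toy stand-in for GlobalVisXidVisibleToAll(). */
static bool
xid_visible_to_all(xid_t xid, xid_t oldest_running_horizon)
{
    return xid < oldest_running_horizon;
}

int
main(void)
{
    xid_t horizon = 1000;

    /* xmax check: deleter committed long ago, so the tuple is removable. */
    printf("dead tuple removable: %d\n", xid_visible_to_all(900, horizon));
    /* xmin check: inserter committed long ago, so the tuple counts toward
     * the page being all-visible. */
    printf("tuple visible to all: %d\n", xid_visible_to_all(950, horizon));
    return 0;
}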
v6-0014-Use-GlobalVisState-to-determine-page-level-visibi.patch (text/x-patch; charset=US-ASCII)
From 84467961e150272593c85cdbc732d1311dd8ae74 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v6 14/20] Use GlobalVisState to determine page level
visibility
During pruning and during vacuum's third phase, we try to determine if
the whole page can be set all-visible in the visibility map. Instead of
using OldestXmin to determine if all the tuples on a page are visible to
everyone, use the GlobalVisState. This allows us to start setting the VM
during on-access pruning in a future commit.
It is possible for the GlobalVisState to change during the course of a
vacuum. In all but extraordinary cases, it moves forward, meaning more
pages could potentially be set in the VM.
Because comparing a transaction ID to the GlobalVisState requires more
operations than comparing it to another single transaction ID, we now
wait until after examining all the tuples on the page and, if we have
maintained the visibility_cutoff_xid, compare it to the GlobalVisState
just once per page. This works because if the page is
all-visible and has live, committed tuples on it, the
visibility_cutoff_xid will contain the newest xmin on the page. If
everyone can see it, the page is truly all-visible.
Doing this may mean we examine more tuples' xmins than before, since
previously we would have set all_visible to false (and stopped checking)
as soon as we encountered a live tuple newer than OldestXmin. However,
these extra comparisons were found not
to be significant in a profile.
---
src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++
src/backend/access/heap/pruneheap.c | 48 +++++++++------------
src/backend/access/heap/vacuumlazy.c | 17 ++++----
src/include/access/heapam.h | 4 +-
4 files changed, 59 insertions(+), 38 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+ Buffer buffer)
+{
+ TransactionId dead_after = InvalidTransactionId;
+ HTSV_Result res;
+
+ res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+ if (res == HEAPTUPLE_RECENTLY_DEAD)
+ {
+ Assert(TransactionIdIsValid(dead_after));
+
+ if (GlobalVisXidVisibleToAll(vistest, dead_after))
+ res = HEAPTUPLE_DEAD;
+ }
+ else
+ Assert(!TransactionIdIsValid(dead_after));
+
+ return res;
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 715dfc16ba7..ab79d8a3ed9 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -141,10 +141,9 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon when setting the VM or when
+ * freezing all the live tuples on the page.
*
* NOTE: all_visible and all_frozen don't include LP_DEAD items until
* directly before updating the VM. We ignore LP_DEAD items when deciding
@@ -553,14 +552,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon.
- * This is most likely to happen when updating the VM and/or freezing all
- * live tuples on the page. It is updated before returning to the caller
- * because vacuum does assert-build only validation on the page using this
- * field.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState.
+ *
+ * If we encounter an uncommitted tuple, this field is unmaintained. If
+ * the page is being set all-visible or all live tuples on the page are
+ * being frozen, it is used to calculate the snapshot conflict horizon.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -756,6 +753,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them is not visible to everyone, the page cannot be
+ * all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update those
@@ -1098,12 +1105,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(cutoffs);
-
Assert(prstate.lpdead_items == 0);
if (!heap_page_is_all_visible(relation, buffer,
- cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc))
Assert(false);
@@ -1628,19 +1633,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' when freezing is requested. We
- * could use GlobalVisTestIsRemovableXid instead, if a
- * non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6a0fa371a06..777ec30eb82 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -465,7 +465,7 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int allowed_num_offsets,
bool *all_frozen,
@@ -2716,7 +2716,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
InvalidOffsetNumber);
if (heap_page_is_all_visible_except_lpdead(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3459,13 +3459,13 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
- return heap_page_is_all_visible_except_lpdead(rel, buf, OldestXmin,
+ return heap_page_is_all_visible_except_lpdead(rel, buf, vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -3500,7 +3500,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
static bool
heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int allowed_num_offsets,
bool *all_frozen,
@@ -3555,8 +3555,8 @@ heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
tuple.t_len = ItemIdGetLength(itemid);
tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
- buf))
+ switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest,
+ buf))
{
case HEAPTUPLE_LIVE:
{
@@ -3575,8 +3575,7 @@ heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin,
- OldestXmin))
+ if (!GlobalVisXidVisibleToAll(vistest, xmin))
{
all_visible = false;
*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0b9bb1c9b13..4278f351bdf 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -342,7 +342,7 @@ extern void heap_inplace_unlock(Relation relation,
HeapTuple oldtup, Buffer buffer);
extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum);
@@ -415,6 +415,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+ GlobalVisState *vistest, Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
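
Aside (not part of the patch): a rough standalone sketch of the once-per-page
test described in the commit message above. Instead of comparing every live
tuple's xmin to OldestXmin during the scan, we only track the newest xmin
(visibility_cutoff_xid) and test it against the GlobalVisState horizon a
single time at the end. The XIDs and the horizon below are made-up stand-ins.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t xid_t;
#define INVALID_XID 0

/* Toy stand-in for GlobalVisXidVisibleToAll(). */
static bool
xid_visible_to_all(xid_t xid, xid_t horizon)
{
    return xid < horizon;
}

int
main(void)
{
    xid_t xmins[] = {120, 350, 290};    /* xmins of live, committed tuples */
    xid_t visibility_cutoff_xid = INVALID_XID;
    bool all_visible = true;
    xid_t horizon = 400;                /* GlobalVisState stand-in */

    /* The per-tuple loop only tracks the newest xmin seen. */
    for (int i = 0; i < 3; i++)
        if (xmins[i] > visibility_cutoff_xid)
            visibility_cutoff_xid = xmins[i];

    /* Single per-page horizon test, done after examining all tuples. */
    if (visibility_cutoff_xid != INVALID_XID &&
        !xid_visible_to_all(visibility_cutoff_xid, horizon))
        all_visible = false;

    printf("visibility_cutoff_xid=%u all_visible=%d\n",
           (unsigned) visibility_cutoff_xid, all_visible);
    return 0;
}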
v6-0012-Remove-xl_heap_visible-entirely.patch (text/x-patch; charset=US-ASCII)
From a9341bda057d50769b6fbb109d847324ab837de9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:30:42 -0400
Subject: [PATCH v6 12/20] Remove xl_heap_visible entirely
There are now no users of this, so eliminate it entirely.
---
src/backend/access/common/bufmask.c | 3 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 154 +----------------------
src/backend/access/heap/pruneheap.c | 4 +-
src/backend/access/heap/vacuumlazy.c | 10 +-
src/backend/access/heap/visibilitymap.c | 106 +---------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +-
src/include/access/heapam_xlog.h | 20 ---
src/include/access/visibilitymap.h | 11 +-
src/include/access/visibilitymapdefs.h | 9 --
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 30 insertions(+), 365 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..1fff01383b3 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,7 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 68db4325285..48f7b84156a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -40,6 +40,7 @@
#include "access/valid.h"
#include "access/visibilitymap.h"
#include "access/xloginsert.h"
+#include "access/xlogutils.h"
#include "catalog/pg_database.h"
#include "catalog/pg_database_d.h"
#include "commands/vacuum.h"
@@ -2512,11 +2513,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
{
PageSetAllVisible(page);
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
- visibilitymap_set_vmbyte(relation,
- BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set(relation,
+ BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
}
/*
@@ -8784,49 +8785,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
/*
* Perform XLogInsert for a heap-update operation. Caller must already
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 14541e2e94f..64f06d46bf1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -82,10 +82,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
memcpy(&vmflags, maindataptr, sizeof(uint8));
maindataptr += sizeof(uint8);
- /*
- * We don't set VISIBILITYMAP_XLOG_CATALOG_REL in the combined record
- * because we already have XLHP_IS_CATALOG_REL.
- */
Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
/* Must never set all_frozen bit without also setting all_visible bit */
Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
@@ -267,7 +263,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Relation reln = CreateFakeRelcacheEntry(rlocator);
visibilitymap_pin(reln, blkno, &vmbuffer);
- old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+ old_vmbits = visibilitymap_set(reln, blkno, vmbuffer, vmflags);
/* Only set VM page LSN if we modified the page */
if (old_vmbits != vmflags)
PageSetLSN(BufferGetPage(vmbuffer), lsn);
@@ -278,143 +274,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
UnlockReleaseBuffer(vmbuffer);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
- visibilitymap_pin(reln, blkno, &vmbuffer);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -791,16 +650,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
Relation reln = CreateFakeRelcacheEntry(rlocator);
visibilitymap_pin(reln, blkno, &vmbuffer);
- visibilitymap_set_vmbyte(reln, blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
/*
* It is not possible that the VM was already set for this heap page,
* so the vmbuffer must have been modified and marked dirty.
*/
Assert(BufferIsDirty(vmbuffer));
+ visibilitymap_set(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
PageSetLSN(BufferGetPage(vmbuffer), lsn);
FreeFakeRelcacheEntry(reln);
}
@@ -1380,9 +1239,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 82127e8728b..ffc12314b41 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -979,8 +979,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_set_vm)
{
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
- old_vmbits = visibilitymap_set_vmbyte(relation, blockno,
- vmbuffer, vmflags);
+ old_vmbits = visibilitymap_set(relation, blockno,
+ vmbuffer, vmflags);
if (old_vmbits == vmflags)
{
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 91e209901b8..6a0fa371a06 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1887,8 +1887,8 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
MarkBufferDirty(buf);
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
- visibilitymap_set_vmbyte(vacrel->rel, blkno,
- vmbuffer, new_vmbits);
+ visibilitymap_set(vacrel->rel, blkno,
+ vmbuffer, new_vmbits);
if (RelationNeedsWAL(vacrel->rel))
{
@@ -2754,9 +2754,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
set_pd_all_vis = true;
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
PageSetAllVisible(page);
- visibilitymap_set_vmbyte(vacrel->rel,
- blkno,
- vmbuffer, vmflags);
+ visibilitymap_set(vacrel->rel,
+ blkno,
+ vmbuffer, vmflags);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 0bc64203959..5ed54e06dd4 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set a bit in a previously pinned page and log
- * visibilitymap_set_vmbyte - set a bit in a pinned page
+ * visibilitymap_set - set a bit in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,105 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
- Assert(InRecovery || PageIsAllVisible((Page) BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (XLogRecPtrIsInvalid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
-}
-
/*
* Set flags in the VM block contained in the passed in vmBuf.
*
@@ -338,8 +238,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* making any changes needed to the associated heap page.
*/
uint8
-visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags)
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index d6c86ccac20..f7880a4ed81 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -351,13 +351,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -462,9 +455,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ceae9c083ff..a64677b7bca 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -438,20 +437,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -495,11 +480,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 977566f6b98..c5b1e1f7adb 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "utils/relcache.h"
@@ -31,14 +30,8 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags);
+extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e6f2e93b2d6..98b1adc4e9e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4274,7 +4274,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
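For anyone who wants to compare WAL volume on their own setup, this is one way to
measure the WAL generated by a single operation (a sketch only -- pg_stat_wal is
cluster-wide, so run it on an otherwise idle cluster, and substitute whatever table
you are testing; exact figures will vary with configuration):
-- measure WAL generated by one vacuum
select pg_stat_reset_shared('wal');
vacuum (verbose, process_toast false) test_table;  -- your table here
select wal_records, wal_fpi, wal_bytes from pg_stat_wal;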
v6-0011-Rename-PruneState.freeze-to-attempt_freeze.patch (text/x-patch; charset=US-ASCII)

From 07b7bc0bccd41d93e92e2dee4f5a020dbf3e5b0c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 31 Jul 2025 14:07:51 -0400
Subject: [PATCH v6 11/20] Rename PruneState.freeze to attempt_freeze
This makes it clearer that the flag indicates the caller would like
heap_page_prune_and_freeze() to consider freezing tuples -- not that we
will ultimately end up freezing them.
Also rename the local variable hint_bit_fpi to did_tuple_hint_fpi. This
makes it clear that it refers to tuple hints rather than page hints, and
that it records something that happened rather than something that could happen.
---
src/backend/access/heap/pruneheap.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index cf9e5215d6b..82127e8728b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -43,7 +43,7 @@ typedef struct
/* whether or not dead items can be set LP_UNUSED during pruning */
bool mark_unused_now;
/* whether to attempt freezing tuples */
- bool freeze;
+ bool attempt_freeze;
/*
* Whether or not to consider updating the VM. There is some bookkeeping
@@ -452,7 +452,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_set_vm;
uint8 vmflags = 0;
uint8 old_vmbits = 0;
- bool hint_bit_fpi;
+ bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
bool all_frozen_except_lp_dead = false;
bool set_pd_all_visible = false;
@@ -460,7 +460,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Copy parameters to prstate */
prstate.vistest = vistest;
prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
- prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
prstate.consider_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate.cutoffs = cutoffs;
@@ -485,7 +485,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* initialize page freezing working state */
prstate.pagefrz.freeze_required = false;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
Assert(new_relfrozen_xid && new_relmin_mxid);
prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
@@ -535,7 +535,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* bookkeeping. Initializing all_visible to false allows skipping the work
* to update them in heap_prune_record_unchanged_lp_normal().
*/
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
@@ -653,7 +653,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
* an FPI to be emitted.
*/
- hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+ did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
/*
* Process HOT chains.
@@ -770,7 +770,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* plans we prepared, or not.
*/
do_freeze = false;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
if (prstate.pagefrz.freeze_required)
{
@@ -803,7 +803,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (RelationNeedsWAL(relation))
{
- if (hint_bit_fpi)
+ if (did_tuple_hint_fpi)
do_freeze = true;
else if (do_prune)
{
@@ -1127,7 +1127,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
if (presult->nfrozen > 0)
{
@@ -1714,7 +1714,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* to update the VM, we have to call heap_prepare_freeze_tuple() on every
* tuple to know whether or not the page will be totally frozen.
*/
- if (prstate->freeze)
+ if (prstate->attempt_freeze)
{
bool totally_frozen;
--
2.43.0
v6-0015-Inline-TransactionIdFollows-Precedes.patch (text/x-patch; charset=US-ASCII)
From 536d921a94bb3242583c97944e351b6f6a17d600 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 09:57:13 -0400
Subject: [PATCH v6 15/20] Inline TransactionIdFollows/Precedes()
Calling these from on-access pruning code had noticeable overhead in a
profile. There does not seem to be a reason not to inline them.
---
src/backend/access/transam/transam.c | 64 -------------------------
src/include/access/transam.h | 70 ++++++++++++++++++++++++++--
2 files changed, 66 insertions(+), 68 deletions(-)
diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index 9a39451a29a..553d6756cb3 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -273,70 +273,6 @@ TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
}
-/*
- * TransactionIdPrecedes --- is id1 logically < id2?
- */
-bool
-TransactionIdPrecedes(TransactionId id1, TransactionId id2)
-{
- /*
- * If either ID is a permanent XID then we can just do unsigned
- * comparison. If both are normal, do a modulo-2^32 comparison.
- */
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 < id2);
-
- diff = (int32) (id1 - id2);
- return (diff < 0);
-}
-
-/*
- * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
- */
-bool
-TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 <= id2);
-
- diff = (int32) (id1 - id2);
- return (diff <= 0);
-}
-
-/*
- * TransactionIdFollows --- is id1 logically > id2?
- */
-bool
-TransactionIdFollows(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 > id2);
-
- diff = (int32) (id1 - id2);
- return (diff > 0);
-}
-
-/*
- * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
- */
-bool
-TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 >= id2);
-
- diff = (int32) (id1 - id2);
- return (diff >= 0);
-}
-
/*
* TransactionIdLatest --- get latest XID among a main xact and its children
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 7d82cd2eb56..c9e20418275 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -255,6 +255,72 @@ typedef struct TransamVariablesData
} TransamVariablesData;
+
+/*
+ * TransactionIdPrecedes --- is id1 logically < id2?
+ */
+static inline bool
+TransactionIdPrecedes(TransactionId id1, TransactionId id2)
+{
+ /*
+ * If either ID is a permanent XID then we can just do unsigned
+ * comparison. If both are normal, do a modulo-2^32 comparison.
+ */
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 < id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff < 0);
+}
+
+/*
+ * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
+ */
+static inline bool
+TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 <= id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff <= 0);
+}
+
+/*
+ * TransactionIdFollows --- is id1 logically > id2?
+ */
+static inline bool
+TransactionIdFollows(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 > id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff > 0);
+}
+
+/*
+ * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
+ */
+static inline bool
+TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 >= id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff >= 0);
+}
+
+
/* ----------------
* extern declarations
* ----------------
@@ -274,10 +340,6 @@ extern bool TransactionIdDidAbort(TransactionId transactionId);
extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
-extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
-extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2);
extern TransactionId TransactionIdLatest(TransactionId mainxid,
int nxids, const TransactionId *xids);
extern XLogRecPtr TransactionIdGetCommitLSN(TransactionId xid);
--
2.43.0
v6-0017-Allow-on-access-pruning-to-set-pages-all-visible.patch (text/x-patch; charset=US-ASCII)
From 17dabfb6dade53ab1a73272edc383ed482989329 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v6 17/20] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum marked pages as all-visible or all-frozen.
Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
---
src/backend/access/heap/heapam.c | 15 ++++-
src/backend/access/heap/heapam_handler.c | 15 ++++-
src/backend/access/heap/pruneheap.c | 63 ++++++++++++++-----
src/backend/access/index/indexam.c | 46 ++++++++++++++
src/backend/access/table/tableam.c | 39 ++++++++++--
src/backend/executor/execMain.c | 4 ++
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeBitmapHeapscan.c | 7 ++-
src/backend/executor/nodeIndexscan.c | 18 ++++--
src/backend/executor/nodeSeqscan.c | 24 +++++--
src/include/access/genam.h | 11 ++++
src/include/access/heapam.h | 24 ++++++-
src/include/access/relscan.h | 6 ++
src/include/access/tableam.h | 30 ++++++++-
src/include/nodes/execnodes.h | 6 ++
.../t/035_standby_logical_decoding.pl | 8 ++-
16 files changed, 278 insertions(+), 40 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 48f7b84156a..f90b014a9b0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -556,6 +556,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -570,7 +571,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1236,6 +1239,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1274,6 +1278,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1306,6 +1316,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index cb4bc35c93e..c68283de6f2 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
return &hscan->xs_base;
}
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 80d055e5376..dad341cb265 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -198,9 +198,13 @@ static bool identify_and_fix_vm_corruption(Relation relation,
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -264,6 +268,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
{
OffsetNumber dummy_off_loc;
PruneFreezeResult presult;
+ int options = 0;
+
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ options = HEAP_PAGE_PRUNE_UPDATE_VM;
+ }
/*
* For now, pass mark_unused_now as false regardless of whether or
@@ -271,9 +282,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* that during on-access pruning with the current implementation.
*/
heap_page_prune_and_freeze(relation, buffer, false,
- InvalidBuffer,
- vistest, 0,
- NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+ vmbuffer ? *vmbuffer : InvalidBuffer,
+ vistest, options,
+ NULL, &presult, PRUNE_ON_ACCESS,
+ &dummy_off_loc, NULL, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -513,12 +525,17 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* all-frozen for use in opportunistic freezing and to update the VM if
* the caller requests it.
*
- * Currently, only VACUUM attempts freezing and setting the VM bits. But
- * other callers could do either one. The visibility bookkeeping is
- * required for opportunistic freezing (in addition to setting the VM
- * bits) because we only consider opportunistically freezing tuples if the
- * whole page would become all-frozen or if the whole page will be frozen
- * except for dead tuples that will be removed by vacuum.
+ * Currently, only VACUUM attempts freezing. But other callers could. The
+ * visibility bookkeeping is required for opportunistic freezing (in
+ * addition to setting the VM bits) because we only consider
+ * opportunistically freezing tuples if the whole page would become
+ * all-frozen or if the whole page will be frozen except for dead tuples
+ * that will be removed by vacuum. But if consider_update_vm is false,
+ * we'll not set the VM even if the page is discovered to be all-visible.
+ *
+ * If only HEAP_PAGE_PRUNE_UPDATE_VM is passed and not
+ * HEAP_PAGE_PRUNE_FREEZE, prstate.all_frozen must be initialized to false
+ * because we will not call heap_prepare_freeze_tuple() on each tuple.
*
* If only updating the VM, we must initialize all_frozen to false, as
* heap_prepare_freeze_tuple() will not be called for each tuple on the
@@ -530,7 +547,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* whether or not to freeze but before deciding whether or not to update
* the VM so that we don't set the VM bit incorrectly.
*
- * If not freezing or updating the VM, we otherwise avoid the extra
+ * If not freezing and not updating the VM, we avoid the extra
* bookkeeping. Initializing all_visible to false allows skipping the work
* to update them in heap_prune_record_unchanged_lp_normal().
*/
@@ -879,12 +896,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.all_frozen = false;
}
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate.consider_update_vm &&
+ prstate.all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+ {
+ prstate.consider_update_vm = false;
+ prstate.all_visible = prstate.all_frozen = false;
+ }
+
Assert(!prstate.all_frozen || prstate.all_visible);
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * Handle setting visibility map bit based on information from the VM (if
+ * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
+ * call), and from all_visible and all_frozen variables.
*/
if (prstate.consider_update_vm)
{
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 219df1971da..d803c307517 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -279,6 +279,32 @@ index_beginscan(Relation heapRelation,
return scan;
}
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan(heapRelation,
+ indexRelation,
+ snapshot,
+ instrument,
+ nkeys, norderbys);
+
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+ return scan;
+}
+
/*
* index_beginscan_bitmap - start a scan of an index with amgetbitmap
*
@@ -610,6 +636,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
return scan;
}
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan_parallel(heaprel, indexrel,
+ instrument,
+ nkeys, norderbys,
+ pscan);
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+ return scan;
+}
+
/* ----------------
* index_getnext_tid - get the next TID from a scan
*
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index a56c5eceb14..67dbf99f5b5 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
char *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
bool synchronize_seqscans = true;
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags);
+
/* ----------------------------------------------------------------------------
* Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
}
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
- SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
pscan, flags);
}
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+ bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
/* ----------------------------------------------------------------------------
* Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0391798dd2c..065676eb7cf 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -917,6 +917,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation is modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ modifies_rel);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..c2ffbd3b08e 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+
+ bool modifies_base_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
*/
- scandesc = index_beginscan(node->ss.ss_currentRelation,
- node->iss_RelationDesc,
- estate->es_snapshot,
- &node->iss_Instrument,
- node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+ node->iss_RelationDesc,
+ estate->es_snapshot,
+ &node->iss_Instrument,
+ node->iss_NumScanKeys,
+ node->iss_NumOrderByKeys,
+ modifies_base_rel);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index ed35c58c2c3..15e1853027b 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
- scandesc = table_beginscan(node->ss.ss_currentRelation,
- estate->es_snapshot,
- 0, NULL);
+ scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+ estate->es_snapshot,
+ 0, NULL, modifies_rel);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -362,6 +367,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
ParallelContext *pcxt)
{
EState *estate = node->ss.ps.state;
+ bool modifies_rel;
ParallelTableScanDesc pscan;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -369,8 +375,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+ modifies_rel);
}
/* ----------------------------------------------------------------
@@ -400,8 +409,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+ pscan,
+ modifies_rel);
}
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 5b2ab181b5f..bf272c2c37f 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_heap_rel);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
@@ -206,6 +211,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys,
ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_rel);
extern ItemPointer index_getnext_tid(IndexScanDesc scan,
ScanDirection direction);
struct TupleTableSlot;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4278f351bdf..16f7904a21e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read in the current heap page's
+ * corresponding VM block to this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -116,8 +123,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read in the current heap page's corresponding VM block to
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
@@ -374,7 +391,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool blk_known_av,
Buffer vmbuffer,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
typedef struct IndexFetchTableData
{
Relation rel;
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchTableData;
struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 1c9e802a6b1..0e986d8ef72 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -62,6 +62,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* whether or not scan should attempt to set the VM */
+ SO_ALLOW_VM_SET = 1 << 10,
} ScanOptions;
/*
@@ -876,6 +878,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
}
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
/*
* Like table_beginscan(), but for scanning catalog. It'll automatically use a
* snapshot appropriate for scanning catalog relations.
@@ -913,10 +934,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, struct ScanKeyData *key)
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
{
uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
}
@@ -1125,6 +1149,10 @@ extern void table_parallelscan_initialize(Relation rel,
extern TableScanDesc table_beginscan_parallel(Relation relation,
ParallelTableScanDesc pscan);
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+ ParallelTableScanDesc pscan,
+ bool modifies_rel);
+
/*
* Restart a parallel scan. Call this in the leader process. Caller is
* responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e107d6e5f81..326d7d78860 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -680,6 +680,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query either through
+ * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index 921813483e3..5d0863a7933 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -9,6 +9,7 @@ use warnings FATAL => 'all';
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;
+use Time::HiRes qw(usleep);
if ($ENV{enable_injection_points} ne 'yes')
{
@@ -295,6 +296,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -744,7 +746,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
@@ -754,12 +756,12 @@ wait_until_vacuum_can_remove(
# message should not be issued
ok( !$node_standby->log_contains(
- "invalidating obsolete slot \"no_conflict_inactiveslot\"", $logstart),
+ "invalidating obsolete replication slot \"no_conflict_inactiveslot\"", $logstart),
'inactiveslot slot invalidation is not logged with vacuum on conflict_test'
);
ok( !$node_standby->log_contains(
- "invalidating obsolete slot \"no_conflict_activeslot\"", $logstart),
+ "invalidating obsolete replication slot \"no_conflict_activeslot\"", $logstart),
'activeslot slot invalidation is not logged with vacuum on conflict_test'
);
--
2.43.0
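To make 0017 concrete at the SQL level, here is a rough sketch of a read-only scan
that should now be able to mark pages all-visible via on-access pruning (a sketch,
not a test from the patch set: it assumes an otherwise idle cluster so the dead
tuples are prunable, and uses the pg_visibility extension purely for inspection):
-- on-access VM setting, illustrative only
create extension if not exists pg_visibility;
create table t(a int) with (autovacuum_enabled = false);
insert into t select generate_series(1, 100000);
delete from t where a % 10 = 0;
-- a plain SELECT prunes the dead tuples it encounters and, with 0017 applied,
-- may also set the all-visible bit in the VM for those pages
select count(*) from t;
select count(*) from pg_visibility_map('t') where all_visible;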
v6-0019-Reorder-heap_page_prune_and_freeze-parameters.patch (text/x-patch; charset=US-ASCII)
From d0d3520b2ee064b93449dede1e8ff88b5dc35510 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 31 Jul 2025 12:08:18 -0400
Subject: [PATCH v6 19/20] Reorder heap_page_prune_and_freeze parameters
Reorder parameters so that all of the output parameters are together at
the end of the parameter list.
---
src/backend/access/heap/pruneheap.c | 38 ++++++++++++++--------------
src/backend/access/heap/vacuumlazy.c | 6 ++---
src/include/access/heapam.h | 4 +--
3 files changed, 24 insertions(+), 24 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5d943b0c64f..20f4a62fb16 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -297,10 +297,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, false,
+ heap_page_prune_and_freeze(relation, buffer, options, false,
vmbuffer ? *vmbuffer : InvalidBuffer,
- vistest, options,
- NULL, &presult, PRUNE_ON_ACCESS,
+ vistest,
+ NULL, PRUNE_ON_ACCESS, &presult,
&dummy_off_loc, NULL, NULL);
/*
@@ -645,6 +645,15 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* also need to account for a reduction in the length of the line pointer
* array following array truncation by us.
*
+ * options:
+ * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+ * pruning.
+ *
+ * FREEZE indicates that we will also freeze tuples, and will return
+ * 'all_visible', 'all_frozen' flags to the caller.
+ *
+ * UPDATE_VM indicates that we will set the page's status in the VM.
+ *
* If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
* required in order to advance relfrozenxid / relminmxid, or if it's
* considered advantageous for overall system performance to do so now. The
@@ -663,30 +672,21 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* contain the required block of the visibility map.
*
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
- *
- * options:
- * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- * pruning.
- *
- * FREEZE indicates that we will also freeze tuples, and will return
- * 'all_visible', 'all_frozen' flags to the caller.
- *
- * UPDATE_VM indicates that we will set the page's status in the VM.
+ * (see heap_prune_satisfies_vacuum). It is an input parameter.
*
* cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
* of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
* cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD. It is an input parameter.
+ *
+ * reason indicates why the pruning is performed. It is included in the WAL
+ * record for debugging and analysis purposes, but otherwise has no effect.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
* heap_page_prune_and_freeze() is responsible for initializing it. Required
* by all callers.
*
- * reason indicates why the pruning is performed. It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
* off_loc is the offset location required by the caller to use in error
* callback.
*
@@ -699,13 +699,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ int options,
bool blk_known_av,
Buffer vmbuffer,
GlobalVisState *vistest,
- int options,
struct VacuumCutoffs *cutoffs,
- PruneFreezeResult *presult,
PruneReason reason,
+ PruneFreezeResult *presult,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 777ec30eb82..120782fd8ec 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1992,11 +1992,11 @@ lazy_scan_prune(LVRelState *vacrel,
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf,
+ heap_page_prune_and_freeze(rel, buf, prune_options,
all_visible_according_to_vm,
vmbuffer,
- vacrel->vistest, prune_options,
- &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+ vacrel->vistest,
+ &vacrel->cutoffs, PRUNE_VACUUM_SCAN, &presult,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 16f7904a21e..0c4e5607627 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,13 +394,13 @@ struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer,
Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ int options,
bool blk_known_av,
Buffer vmbuffer,
struct GlobalVisState *vistest,
- int options,
struct VacuumCutoffs *cutoffs,
- PruneFreezeResult *presult,
PruneReason reason,
+ PruneFreezeResult *presult,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid);
--
2.43.0
v6-0016-Unset-all-visible-sooner-if-not-freezing.patchtext/x-patch; charset=US-ASCII; name=v6-0016-Unset-all-visible-sooner-if-not-freezing.patchDownload
From 2f2bfc3d3b436f460ae91e5cbdc8404063b90936 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:35:13 -0400
Subject: [PATCH v6 16/20] Unset all-visible sooner if not freezing
In prune/freeze code, we delay unsetting all-visible/all-frozen in the
presence of dead items to allow opportunistically freezing tuples if the
whole page would be frozen except for those dead items -- which are
removed later in vacuum's third phase.
Future commits will allow on-access pruning to set the VM, which means
all-visible will be initialized to true instead of false and we will do
extra bookkeeping in heap_prune_unchanged_lp_normal() to keep track of
whether or not the page is all-visible.
Because on-access pruning will not freeze tuples, it makes sense to
unset all-visible as soon as we encounter an LP_DEAD item and
avoid continued bookkeeping since we know the page is not all-visible
and we won't be able to remove those dead items.
---
src/backend/access/heap/pruneheap.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ab79d8a3ed9..80d055e5376 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1493,8 +1493,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible until later during pruning.
- * Removable dead tuples shouldn't preclude freezing the page.
+ * Removable dead tuples shouldn't preclude freezing the page. If we won't
+ * attempt freezing, just unset all-visible now, though.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1752,8 +1755,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible until later, at the end of
* heap_page_prune_and_freeze(). This will allow us to attempt to freeze
* the page after pruning. As long as we unset it before updating the
- * visibility map, this will be correct.
+ * visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all-visible now.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
--
2.43.0
v6-0018-Add-helper-functions-to-heap_page_prune_and_freez.patchtext/x-patch; charset=US-ASCII; name=v6-0018-Add-helper-functions-to-heap_page_prune_and_freez.patchDownload
From e80c4241826a58f212601798cd398c4a318a6511 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 30 Jul 2025 18:51:43 -0400
Subject: [PATCH v6 18/20] Add helper functions to heap_page_prune_and_freeze
heap_page_prune_and_freeze() has gotten rather long. It has several
stages:
1) setup -- where the PruneState is set up
2) tuple examination -- where tuples and line pointers are examined to
determine what needs to be pruned and what could be frozen
3) evaluation -- where we determine, based on caller-provided options,
heuristics, and state gathered during stage 2, whether or not to
freeze tuples and set the page in the VM
4) execution -- where the page changes are actually made and logged
This commit refactors the evaluation stage into helpers which return
whether or not to freeze and set the VM.
For the purposes of committing, this likely shouldn't be a separate
commit, but I'm not sure yet whether it would make more sense to do this
refactoring earlier in the set for the reviewer's benefit.
---
src/backend/access/heap/pruneheap.c | 471 +++++++++++++++++-----------
1 file changed, 295 insertions(+), 176 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index dad341cb265..5d943b0c64f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -179,6 +179,22 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
static void page_verify_redirects(Page page);
+static bool heap_page_will_freeze(Relation relation, Buffer buffer,
+ bool do_prune,
+ bool do_hint_full_or_prunable,
+ bool did_tuple_hint_fpi,
+ PruneState *prstate,
+ bool *all_frozen_except_lp_dead);
+
+static bool heap_page_will_update_vm(Relation relation,
+ Buffer buffer, BlockNumber blockno, Page page,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ bool blk_known_av,
+ PruneState *prstate,
+ Buffer *vmbuffer, uint8 *vmflags,
+ bool *set_pd_all_visible);
+
static bool identify_and_fix_vm_corruption(Relation relation,
BlockNumber heap_blk,
Buffer heap_buffer, Page heap_page,
@@ -376,6 +392,249 @@ identify_and_fix_vm_corruption(Relation relation,
return false;
}
+
+/*
+ * Determine whether to set the visibility map bits based on information from
+ * the PruneState and blk_known_av, which some callers will provide after
+ * previously examining this heap page's VM bits (e.g. vacuum from the last
+ * heap_vac_scan_next_block() call).
+ *
+ * We pass in blockno and page even though these can be derived from buffer to avoid
+ * extra BufferGetBlock() and BufferGetBlockNumber() calls.
+ *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
+ * prstate and vmbuffer are input/output fields. vmflags and
+ * set_pd_all_visible are output fields.
+ *
+ * Returns true if the caller should set one or both of the VM bits and false
+ * otherwise.
+ */
+static bool
+heap_page_will_update_vm(Relation relation,
+ Buffer buffer, BlockNumber blockno, Page page,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ bool blk_known_av,
+ PruneState *prstate,
+ Buffer *vmbuffer, uint8 *vmflags,
+ bool *set_pd_all_visible)
+{
+ bool do_set_vm = false;
+
+ /*
+ * If the caller specified not to update the VM, validate everything is in
+ * the right state and exit.
+ */
+ if (!prstate->consider_update_vm)
+ {
+ Assert(!prstate->all_visible && !prstate->all_frozen);
+ /* We don't set only the page level visibility hint */
+ Assert(!(*set_pd_all_visible));
+ Assert(*vmflags == 0);
+ return false;
+ }
+
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->consider_update_vm &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+ {
+ prstate->consider_update_vm = false;
+ prstate->all_visible = prstate->all_frozen = false;
+ }
+
+ Assert(!prstate->all_frozen || prstate->all_visible);
+
+ /*
+ * Clear any VM corruption. This does not need to be in a critical
+ * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set, we
+ * may mark the heap page buffer dirty here and could end up doing so
+ * again later. This is not a correctness issue and is in the path of VM
+ * corruption, so we don't have to worry about the extra performance
+ * overhead.
+ */
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av, prstate->lpdead_items,
+ *vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /* Determine if we actually need to set the VM and which bits to set. */
+ else if (prstate->all_visible &&
+ (!blk_known_av ||
+ (prstate->all_frozen && !VM_ALL_FROZEN(relation, blockno, vmbuffer))))
+ {
+ *vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate->all_frozen)
+ *vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
+ do_set_vm = *vmflags & VISIBILITYMAP_VALID_BITS;
+
+ /*
+ * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+ * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+ * set, we strongly prefer to keep them in sync.
+ *
+ * Prior to Postgres 19, it was possible for the page-level bit to be set
+ * and the VM bit to be clear. This could happen if we crashed after
+ * setting PD_ALL_VISIBLE but before setting bits in the VM.
+ */
+ *set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+ return do_set_vm;
+}
+
+/*
+ * Decide if we want to go ahead with freezing according to the freeze plans we
+ * prepared for the given buffer or not. If the caller specified we should not
+ * freeze tuples, it exits early.
+ *
+ * do_prune, do_hint_full_or_prunable, and did_tuple_hint_fpi must all have
+ * been decided before calling this function.
+ *
+ * prstate is an input/output parameter. all_frozen_except_lp_dead is set and
+ * used later to determine the snapshot conflict horizon for the record.
+ *
+ * Returns true if we should use our freeze plans and freeze tuples on the page
+ * and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+ bool do_prune,
+ bool do_hint_full_or_prunable,
+ bool did_tuple_hint_fpi,
+ PruneState *prstate,
+ bool *all_frozen_except_lp_dead)
+{
+ bool do_freeze = false;
+
+ /*
+ * If the caller specified we should not attempt to freeze any tuples,
+ * validate that everything is in the right state and exit.
+ */
+ if (!prstate->attempt_freeze)
+ {
+ Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+ Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+ Assert(!(*all_frozen_except_lp_dead));
+ return false;
+ }
+
+ if (prstate->pagefrz.freeze_required)
+ {
+ /*
+ * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+ * before FreezeLimit/MultiXactCutoff is present. Must freeze to
+ * advance relfrozenxid/relminmxid.
+ */
+ do_freeze = true;
+ }
+ else
+ {
+ /*
+ * Opportunistically freeze the page if we are generating an FPI
+ * anyway and if doing so means that we can set the page all-frozen
+ * afterwards (might not happen until VACUUM's final heap pass).
+ *
+ * XXX: Previously, we knew if pruning emitted an FPI by checking
+ * pgWalUsage.wal_fpi before and after pruning. Once the freeze and
+ * prune records were combined, this heuristic couldn't be used
+ * anymore. The opportunistic freeze heuristic must be improved;
+ * however, for now, try to approximate the old logic.
+ */
+ if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+ {
+ /*
+ * Freezing would make the page all-frozen. Have already emitted
+ * an FPI or will do so anyway?
+ */
+ if (RelationNeedsWAL(relation))
+ {
+ if (did_tuple_hint_fpi)
+ do_freeze = true;
+ else if (do_prune)
+ {
+ if (XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ else if (do_hint_full_or_prunable)
+ {
+ if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ }
+ }
+ }
+
+ if (do_freeze)
+ {
+ /*
+ * Validate the tuples we will be freezing before entering the
+ * critical section.
+ */
+ heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+ }
+ else if (prstate->nfrozen > 0)
+ {
+ /*
+ * The page contained some tuples that were not already frozen, and we
+ * chose not to freeze them now. The page won't be all-frozen then.
+ */
+ Assert(!prstate->pagefrz.freeze_required);
+
+ prstate->all_frozen = false;
+ prstate->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+ else
+ {
+ /*
+ * We have no freeze plans to execute. The page might already be
+ * all-frozen (perhaps only following pruning), though. Such pages
+ * can be marked all-frozen in the VM by our caller, even though none
+ * of its tuples were newly frozen here.
+ */
+ }
+
+ /*
+ * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+ * make the choice of whether or not to freeze the page unaffected by the
+ * short-term presence of LP_DEAD items. These LP_DEAD items were
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that freezing has been finalized, unset all_visible if there are
+ * any LP_DEAD items on the page. It needs to reflect the present state of
+ * the page when using it to determine whether or not to update the VM.
+ *
+ * Keep track of whether or not the page was all-frozen except LP_DEAD
+ * items for the purposes of calculating the snapshot conflict horizon,
+ * though.
+ */
+ *all_frozen_except_lp_dead = prstate->all_frozen;
+ if (prstate->lpdead_items > 0)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
+
+ return do_freeze;
+}
+
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
* specified page. If the page's visibility status has changed, update it in
@@ -766,20 +1025,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Clear the offset information once we have processed the given page. */
*off_loc = InvalidOffsetNumber;
- do_prune = prstate.nredirected > 0 ||
- prstate.ndead > 0 ||
- prstate.nunused > 0;
-
/*
* After processing all the live tuples on the page, if the newest xmin
* amongst them is not visible to everyone, the page cannot be
- * all-visible.
+ * all-visible. This must be done before we decide whether or not to
+ * opportunistically freeze below because we do not want to
+ * opportunistically freeze the page if there are live tuples not visible
+ * to everyone, which would prevent setting the page frozen in the VM.
*/
if (prstate.all_visible &&
TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
prstate.all_visible = prstate.all_frozen = false;
+ /*
+ * Now decide based on information collected while examining every tuple
+ * which actions to take. If there are any prunable tuples, we'll prune
+ * them. However, we will decide based on options specified by the caller
+ * and various heuristics whether or not to freeze any tuples and whether
+ * or not the page should be set all-visible/all-frozen in the VM.
+ */
+ do_prune = prstate.nredirected > 0 ||
+ prstate.ndead > 0 ||
+ prstate.nunused > 0;
+
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update those
@@ -790,182 +1059,32 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
PageIsFull(page);
/*
- * Decide if we want to go ahead with freezing according to the freeze
- * plans we prepared, or not.
- */
- do_freeze = false;
- if (prstate.attempt_freeze)
- {
- if (prstate.pagefrz.freeze_required)
- {
- /*
- * heap_prepare_freeze_tuple indicated that at least one XID/MXID
- * from before FreezeLimit/MultiXactCutoff is present. Must
- * freeze to advance relfrozenxid/relminmxid.
- */
- do_freeze = true;
- }
- else
- {
- /*
- * Opportunistically freeze the page if we are generating an FPI
- * anyway and if doing so means that we can set the page
- * all-frozen afterwards (might not happen until VACUUM's final
- * heap pass).
- *
- * XXX: Previously, we knew if pruning emitted an FPI by checking
- * pgWalUsage.wal_fpi before and after pruning. Once the freeze
- * and prune records were combined, this heuristic couldn't be
- * used anymore. The opportunistic freeze heuristic must be
- * improved; however, for now, try to approximate the old logic.
- */
- if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
- {
- /*
- * Freezing would make the page all-frozen. Have already
- * emitted an FPI or will do so anyway?
- */
- if (RelationNeedsWAL(relation))
- {
- if (did_tuple_hint_fpi)
- do_freeze = true;
- else if (do_prune)
- {
- if (XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- else if (do_hint_full_or_prunable)
- {
- if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- }
- }
- }
- }
-
- if (do_freeze)
- {
- /*
- * Validate the tuples we will be freezing before entering the
- * critical section.
- */
- heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
- }
- else if (prstate.nfrozen > 0)
- {
- /*
- * The page contained some tuples that were not already frozen, and we
- * chose not to freeze them now. The page won't be all-frozen then.
- */
- Assert(!prstate.pagefrz.freeze_required);
-
- prstate.all_frozen = false;
- prstate.nfrozen = 0; /* avoid miscounts in instrumentation */
- }
- else
- {
- /*
- * We have no freeze plans to execute. The page might already be
- * all-frozen (perhaps only following pruning), though. Such pages
- * can be marked all-frozen in the VM by our caller, even though none
- * of its tuples were newly frozen here.
- */
- }
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state of
- * the page when using it to determine whether or not to update the VM.
- *
- * Keep track of whether or not the page was all-frozen except LP_DEAD
- * items for the purposes of calculating the snapshot conflict horizon,
- * though.
+ * We must decide whether or not to freeze before deciding if and what to
+ * set in the VM.
*/
- all_frozen_except_lp_dead = prstate.all_frozen;
- if (prstate.lpdead_items > 0)
- {
- prstate.all_visible = false;
- prstate.all_frozen = false;
- }
-
- /*
- * If this is an on-access call and we're not actually pruning, avoid
- * setting the visibility map if it would newly dirty the heap page or, if
- * the page is already dirty, if doing so would require including a
- * full-page image (FPI) of the heap page in the WAL. This situation
- * should be rare, as on-access pruning is only attempted when
- * pd_prune_xid is valid.
- */
- if (reason == PRUNE_ON_ACCESS &&
- prstate.consider_update_vm &&
- prstate.all_visible &&
- !do_prune && !do_freeze &&
- (!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
- {
- prstate.consider_update_vm = false;
- prstate.all_visible = prstate.all_frozen = false;
- }
-
- Assert(!prstate.all_frozen || prstate.all_visible);
-
- /*
- * Handle setting visibility map bit based on information from the VM (if
- * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
- * call), and from all_visible and all_frozen variables.
- */
- if (prstate.consider_update_vm)
- {
- /*
- * Clear any VM corruption. This does not need to be in a critical
- * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
- * we may mark the heap page buffer dirty here and could end up doing
- * so again later. This is not a correctness issue and is in the path
- * of VM corruption, so we don't have to worry about the extra
- * performance overhead.
- */
- if (identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av, prstate.lpdead_items, vmbuffer))
- {
- /* If we fix corruption, don't update the VM further */
- }
-
- /* Determine if we actually need to set the VM and which bits to set. */
- else if (prstate.all_visible &&
- (!blk_known_av ||
- (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
- {
- vmflags |= VISIBILITYMAP_ALL_VISIBLE;
- if (prstate.all_frozen)
- vmflags |= VISIBILITYMAP_ALL_FROZEN;
- }
- }
-
- do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
- * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
- * set, we strongly prefer to keep them in sync.
- *
- * Prior to Postgres 19, it was possible for the page-level bit to be set
- * and the VM bit to be clear. This could happen if we crashed after
- * setting PD_ALL_VISIBLE but before setting bits in the VM.
- */
- set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+ do_freeze = heap_page_will_freeze(relation, buffer,
+ do_prune,
+ do_hint_full_or_prunable,
+ did_tuple_hint_fpi,
+ &prstate,
+ &all_frozen_except_lp_dead);
+
+ do_set_vm = heap_page_will_update_vm(relation,
+ buffer, blockno, page,
+ reason,
+ do_prune, do_freeze,
+ blk_known_av,
+ &prstate,
+ &vmbuffer,
+ &vmflags, &set_pd_all_visible);
/* Save these for the caller in case we later zero out vmflags */
presult->new_vmbits = vmflags;
- /* Any error while applying the changes is critical */
+ /*
+ * Time to actually make the changes to the page and log them. Any error
+ * while applying the changes is critical.
+ */
START_CRIT_SECTION();
if (do_hint_full_or_prunable)
--
2.43.0
v6-0020-Set-pd_prune_xid-on-insert.patchtext/x-patch; charset=US-ASCII; name=v6-0020-Set-pd_prune_xid-on-insert.patchDownload
From 40bcb601af134bfa13af29baecf5d6a6f299e5d7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v6 20/20] Set pd_prune_xid on insert
Now that we can set the VM during read-only queries, it makes sense to
start setting the page prunable hint on insert. This will allow
heap_page_prune_and_freeze() to be called when the page is full or
mostly full.
For years there has been a note in heap_insert() and heap_multi_insert()
pointing out that setting pd_prune_xid would help clean up aborted
inserted tuples that would otherwise not be cleaned up until vacuum.
So, that's another benefit of setting it.
---
src/backend/access/heap/heapam.c | 25 +++++++++++++++++--------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++++++-
2 files changed, 31 insertions(+), 9 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f90b014a9b0..e0f2245052c 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2094,6 +2094,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2153,15 +2154,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2171,7 +2176,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2534,8 +2538,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 64f06d46bf1..234e9a401b9 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -473,6 +473,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -622,9 +628,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
--
2.43.0
On Sat, 2 Aug 2025 at 02:36, Melanie Plageman <melanieplageman@gmail.com> wrote:
On Thu, Jul 31, 2025 at 6:58 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
The patch "Set-pd_prune_xid-on-insert.txt" can be applied as the last
patch in the set. It sets pd_prune_xid on insert (so pages filled by
COPY or insert can also be set all-visible in the VM before they are
vacuumed). I gave it a .txt extension because it currently fails
035_standby_logical_decoding due to a recovery conflict. I need to
investigate more to see if this is a bug in my patch set or elsewhere
in Postgres.
I figured out that if we set the VM on-access, we need to enable
hot_standby_feedback in more places in 035_standby_logical_decoding.pl
to avoid recovery conflicts. I've done that in the attached updated
version 6. There are a few other issues in
035_standby_logical_decoding.pl that I reported here [1]. With these
changes, setting pd_prune_xid on insert passes tests. Whether or not
we want to do it (and what the heuristic should be for deciding when
to do it) is another question.
- Melanie
[1] /messages/by-id/CAAKRu_YO2mEm=ZWZKPjTMU=gW5Y83_KMi_1cr51JwavH0ctd7w@mail.gmail.com
Hi!
Andrey told me off-list about this thread and I decided to take a look.
I tried to play with each patch in this patchset and find a
corruption, but I was unsuccessful. I will conduct further tests
later. I am not implying that I suspect this patchset causes any
corruption; I am merely attempting to verify it.
I also have a few comments and questions. Here is my (very limited)
review of 0001, because I believe that removing xl_heap_visible from
COPY FREEZE is a pure win, so this patch can be very beneficial by
itself.
visibilitymap_set_vmbyte is introduced in 0001 and removed in 0012.
This seems strange to me; maybe we can avoid introducing
visibilitymap_set_vmbyte in the first place?
In 0001:
1)
should we add "Assert(LWLockHeldByMeInMode(BufferDescriptorGetContentLock(bufHdr),
LW_EXCLUSIVE));" in visibilitymap_set_vmbyte?
Also, `Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer),
vmbuffer));` could be beneficial here:
/*
+ * If we're only adding already frozen rows to a previously empty
+ * page, mark it as all-frozen and update the visibility map. We're
+ * already holding a pin on the vmbuffer.
+ */
else if (all_frozen_set)
+ {
PageSetAllVisible(page);
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ visibilitymap_set_vmbyte(relation,
+ BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+ }
2)
in heap_xlog_multi_insert:
+
+ visibilitymap_pin(reln, blkno, &vmbuffer);
+ visibilitymap_set_vmbyte(....)
Do we need to pin vmbuffer here? Looks like
XLogReadBufferForRedoExtended already pins vmbuffer. I verified this
with CheckBufferIsPinnedOnce(vmbuffer) just before visibilitymap_pin
and COPY ... WITH (FREEZE true) test.
3)
+
+#ifdef TRACE_VISIBILITYMAP
+ elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
+#endif
I can see this is merely copy-pasted from visibilitymap_set, but maybe
also display "flags"?
4) visibilitymap_set receives an XLogRecPtr recptr parameter, which is
set to the WAL record LSN during recovery and to InvalidXLogRecPtr
otherwise. visibilitymap_set manages the VM page LSN based on this recptr
value (inside the function logic). visibilitymap_set_vmbyte behaves the
other way around and makes its caller responsible for page LSN management.
Maybe we should keep these two functions similar to each other?
--
Best regards,
Kirill Reshke
On Sat, 2 Aug 2025 at 02:36, Melanie Plageman <melanieplageman@gmail.com> wrote:
0002: No comments from me. Looks straightforward.
A few comments on 0003:
1) This patch introduces XLHP_HAS_VMFLAGS. However, it lacks some
helpful comments about this new status bit.
a) In heapam_xlog.h, in xl_heap_prune struct definition:
/*
* If XLHP_HAS_CONFLICT_HORIZON is set, the conflict horizon XID follows,
* unaligned
*/
+ /* If XLHP_HAS_VMFLAGS is set, the newly set visibility map bits come,
unaligned */
b)
we can add a comment here explaining why we use a memcpy assignment, akin to /*
memcpy() because snapshot_conflict_horizon is stored unaligned */
+ /* Next are the optionally included vmflags. Copy them out for later use. */
+ if ((xlrec.flags & XLHP_HAS_VMFLAGS) != 0)
+ {
+ memcpy(&vmflags, maindataptr, sizeof(uint8));
+ maindataptr += sizeof(uint8);
2) Should we move the conflict_xid = visibility_cutoff_xid; assignment
to just after the heap_page_is_all_visible_except_lpdead() call in
lazy_vacuum_heap_page?
3) Looking at this diff, there is one thing I do not understand: how are we
protected from passing an all-visible page to lazy_vacuum_heap_page()? I
did not manage to reproduce such behaviour, though.
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+ {
+ Assert(!PageIsAllVisible(page));
+ set_pd_all_vis = true;
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ PageSetAllVisible(page);
+ visibilitymap_set_vmbyte(vacrel->rel,
+ blkno,
+
--
Best regards,
Kirill Reshke
On Sat, 2 Aug 2025 at 02:36, Melanie Plageman <melanieplageman@gmail.com> wrote:
v6-0015:
I chose to verify whether this single modification would be beneficial
on its own, applied to HEAD.
The benchmark I did:
```
\timing
CREATE TABLE zz(i int);
alter table zz set (autovacuum_enabled = false);
TRUNCATE zz;
copy zz from program 'yes 2 | head -n 180000000';
copy zz from program 'yes 2 | head -n 180000000';
delete from zz where (REPLACE(REPLACE(ctid::text, '(', '{'), ')',
'}')::int[])[2] = 7 ;
VACUUM FREEZE zz;
```
Then I checked the perf top footprint for the last statement (the VACUUM). My
detailed results are attached; it is a HEAD vs. HEAD+v6-0015 benchmark.
TL;DR: function inlining is indeed beneficial; the TransactionIdPrecedes
function disappears from the perf top footprint, though query runtime does
not change much. So, while not resulting in a query speedup, this can
save CPU.
Maybe we can devise an artificial benchmark that shows a query
speedup, but for now I don't have one.
--
Best regards,
Kirill Reshke
Attachments:
Thanks for all the reviews. I'm working on responding to your previous
mails with a new version.
On Wed, Aug 27, 2025 at 8:55 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
v6-0015:
I chose to verify whether this single modification would be beneficial
on the HEAD.
Benchmark I did:
```
\timing
CREATE TABLE zz(i int);
alter table zz set (autovacuum_enabled = false);
TRUNCATE zz;
copy zz from program 'yes 2 | head -n 180000000';
copy zz from program 'yes 2 | head -n 180000000';
delete from zz where (REPLACE(REPLACE(ctid::text, '(', '{'), ')',
'}')::int[])[2] = 7 ;
VACUUM FREEZE zz;
```
And I checked perf top footprint for last statement (vacuum). My
detailed results are attached. It is a HEAD vs HEAD+v6-0015 benchmark.
TLDR: function inlining is indeed beneficial, TransactionIdPrecedes
function disappears from perf top footprint, though query runtime is
not changed much. So, while not resulting in query speedup, this can
save CPU.
Maybe we can derive an artificial benchmark, which will show query
speed up, but for now I dont have one.
I'm not surprised that vacuum freeze does not show a speedup from the
function inlining.
This patch was key for avoiding a regression in the most contrived
worst-case scenario for setting the VM on-access. That is, if
you are pruning only a single tuple on the page as part of a SELECT
query that returns no tuples (think SELECT * FROM foo OFFSET N where N
is greater than the number of rows in the table), and we add the work of
determining whether the page is all-visible, then the overhead of these
extra function calls in heap_prune_record_unchanged_lp_normal() is
noticeable.
We might be able to come up with a similar example in vacuum without
freeze since it will try to determine if the page is all-visible. Your
example is still running on my machine, though, so I haven't verified
this yet :)
- Melanie
Thanks for the review! Updates are in the attached v7.
One note on 0022 in the set, which sets pd_prune_xid on insert: the
recently added index-killtuples isolation test was failing with this
patch applied. With the patch, the "access" step reports more heap
page hits than before. After some analysis, it seems one of the heap
pages in the kill_prior_tuples table is now being pruned in an earlier
step. Somehow this leads to 4 hits counted instead of 3 (even though
there are only 4 blocks in the relation). I recall Bertrand mentioning
something in some other thread about hits being double-counted with
AIO reads, so I'm going to try to dig that up. But, for now, I've
modified the test -- I believe the patch is only revealing an issue
with instrumentation, not causing a bug.
On Tue, Aug 26, 2025 at 5:58 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
visibilitymap_set_vmbyte is introduced in 0001 and removed in 0012.
This is strange to me, maybe we can avoid visibilitymap_set_vmbyte in
first place?
The reason for this is that the earlier patch introduces
visibilitymap_set_vmbyte() for one user while other users still use
visibilitymap_set(). But, by the end of the set, all users use
visibilitymap_set_vmbyte(). So I think it makes the most sense for it to
be named visibilitymap_set() at that point. Until all users are
committed, the two functions both have to exist and need different
names.
In 0001:
should we add "Assert(LWLockHeldByMeInMode(BufferDescriptorGetContentLock(bufHdr),
LW_EXCLUSIVE));" in visibilitymap_set_vmbyte?
I don't want any operations on the heap buffer (including asserts) in
visibilitymap_set_vmbyte(). The heap block is only provided to look up
the VM bits.
I think your idea is a good one for the existing visibilitymap_set(),
though, so I've added a new patch to the set (0002) which does this. I
also added a similar assertion for the vmbuffer to
visibilitymap_set_vmbyte().
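To make that concrete, here is a rough sketch of the kind of assertion
being discussed; the helper name, variable names, and exact placement are
mine, not necessarily what the new 0002 patch does:

```c
#include "postgres.h"

#include "storage/buf_internals.h"	/* GetBufferDescriptor(), content locks */
#include "storage/bufmgr.h"
#include "storage/lwlock.h"

/*
 * Sketch only: assert that this backend holds the content lock on 'buf' in
 * exclusive mode.  Something like this could be asserted on the heap buffer
 * in visibilitymap_set() and on the vmbuffer in visibilitymap_set_vmbyte()
 * (local buffers ignored for brevity).
 */
static inline void
AssertBufferLockedExclusively(Buffer buf)
{
	BufferDesc *bufHdr = GetBufferDescriptor(buf - 1);

	Assert(BufferIsValid(buf));
	Assert(LWLockHeldByMeInMode(BufferDescriptorGetContentLock(bufHdr),
								LW_EXCLUSIVE));
}
```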
Also here `Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer),
vmbuffer));` can be beneficial:
I had omitted this because the same logic is checked inside of
visibilitymap_set_vmbyte(), with an error occurring if the conditions are
not met. However, since the same is true in visibilitymap_set() and
heap_multi_insert() still asserted visibilitymap_pin_ok(), I've added
it back to my patch set.
in heap_xlog_multi_insert:
+
+ visibilitymap_pin(reln, blkno, &vmbuffer);
+ visibilitymap_set_vmbyte(....)
Do we need to pin vmbuffer here? Looks like
XLogReadBufferForRedoExtended already pins vmbuffer. I verified this
with CheckBufferIsPinnedOnce(vmbuffer) just before visibilitymap_pin
and COPY ... WITH (FREEZE true) test.
I thought the reason visibilitymap_set() did it was that it was
possible for the block of the VM corresponding to the block of the
heap to be different during recovery than it was when emitting the
record, and thus we needed the part of visibilitymap_pin() that
released the old vmbuffer and got the new one corresponding to the
heap block.
I can't quite think of how this could happen though.
Assuming it can't happen, we can get rid of visibilitymap_pin()
(and add visibilitymap_pin_ok()) in both visibilitymap_set_vmbyte() and
visibilitymap_set(). I've done this to visibilitymap_set() in a
separate patch 0001. I would like other opinions/confirmation that the
block of the VM corresponding to the heap block cannot differ during
recovery from what it was when the record was emitted during
normal operation, though.
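For readers following along, the part of visibilitymap_pin() at issue is
roughly the following (paraphrased, not a verbatim copy of visibilitymap.c;
HEAPBLK_TO_MAPBLOCK and vm_readbuf are internal to that file):

```c
	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);

	/* Reuse the caller's pin only if it already covers the right VM page */
	if (BufferIsValid(*vmbuf))
	{
		if (BufferGetBlockNumber(*vmbuf) == mapBlock)
			return;
		ReleaseBuffer(*vmbuf);
	}
	*vmbuf = vm_readbuf(rel, mapBlock, true);
```

So, if the VM block corresponding to a heap block really cannot change
between WAL insertion and redo, the pin call during redo only re-confirms
what XLogReadBufferForRedoExtended() has already pinned.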
+#ifdef TRACE_VISIBILITYMAP
+ elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
+#endif
I can see this is merely copy-pasted from visibilitymap_set, but maybe
also display "flags"?
Done in attached.
4) visibilitymap_set receives an XLogRecPtr recptr parameter, which is
set to the WAL record LSN during recovery and to InvalidXLogRecPtr
otherwise. visibilitymap_set manages the VM page LSN based on this recptr
value (inside the function logic). visibilitymap_set_vmbyte behaves the
other way around and makes its caller responsible for page LSN management.
Maybe we should keep these two functions similar to each other?
So, with visibilitymap_set_vmbyte(), the whole idea is to just update
the VM and then leave the WAL logging and other changes to the caller
(like marking the buffer dirty, setting the page LSN, etc.). The series
of operations needed to make a persistent change is up to the caller.
visibilitymap_set() is meant to just make sure that the correct bits
in the VM are set for the given heap block.
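Concretely, the caller-side sequence ends up looking roughly like the
lazy_vacuum_heap_page() changes in the attached 0005 (heavily condensed;
the argument list follows the shape of the existing
log_heap_prune_and_freeze() call and may not match the final patch exactly):

```c
	START_CRIT_SECTION();

	/* ... remove the LP_DEAD items and truncate the line pointer array ... */

	if (vmflags & VISIBILITYMAP_VALID_BITS)
	{
		set_pd_all_vis = true;
		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
		PageSetAllVisible(page);
		visibilitymap_set_vmbyte(vacrel->rel, blkno, vmbuffer, vmflags);
		conflict_xid = visibility_cutoff_xid;
	}

	MarkBufferDirty(buffer);

	if (RelationNeedsWAL(vacrel->rel))
		log_heap_prune_and_freeze(vacrel->rel, buffer,
								  vmbuffer, vmflags, set_pd_all_vis,
								  conflict_xid,
								  false,	/* no cleanup lock required */
								  PRUNE_VACUUM_CLEANUP,
								  NULL, 0,	/* frozen */
								  NULL, 0,	/* redirected */
								  NULL, 0,	/* dead */
								  unused, nunused);

	END_CRIT_SECTION();

	if (vmflags & VISIBILITYMAP_ALL_VISIBLE)
		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
```

The VM page LSN (and, when needed, the heap page LSN) is then set inside
log_heap_prune_and_freeze(), which is the caller-side LSN management
mentioned above.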
I looked at ways of making the current visibilitymap_set() API cleaner
-- like setting the heap page LSN with the VM recptr in the caller of
visibilitymap_set() instead. There wasn't a way of doing it that
seemed like enough of an improvement to merit the change.
Not to mention, the goal of the patchset is to remove the current
visibilitymap_set(), so I'm not too worried about parity between the
two functions. They may coexist for a while, but hopefully today's
visibilitymap_set() will eventually be removed.
- Melanie
Attachments:
v7-0005-Eliminate-xl_heap_visible-from-vacuum-phase-III.patchtext/x-patch; charset=US-ASCII; name=v7-0005-Eliminate-xl_heap_visible-from-vacuum-phase-III.patchDownload
From b980fc377a66d28acaf12a217e0fcd48a422ca69 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:42:13 -0400
Subject: [PATCH v7 05/22] Eliminate xl_heap_visible from vacuum phase III
Instead of emitting a separate xl_heap_visible record for each page that
is rendered all-visible by vacuum's third phase, include the updates to
the VM in the already emitted xl_heap_prune record.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam_xlog.c | 143 +++++++++++++++++++++---
src/backend/access/heap/pruneheap.c | 48 +++++++-
src/backend/access/heap/vacuumlazy.c | 149 +++++++++++++++++--------
src/backend/access/rmgrdesc/heapdesc.c | 13 ++-
src/include/access/heapam.h | 9 ++
src/include/access/heapam_xlog.h | 7 +-
6 files changed, 300 insertions(+), 69 deletions(-)
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index cf437b14eeb..05dce829eae 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Buffer buffer;
RelFileLocator rlocator;
BlockNumber blkno;
- XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
+ uint8 vmflags = 0;
XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -51,10 +52,15 @@ heap_xlog_prune_freeze(XLogReaderState *record)
(xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
/*
- * We are about to remove and/or freeze tuples. In Hot Standby mode,
- * ensure that there are no queries running for which the removed tuples
- * are still visible or which still consider the frozen xids as running.
- * The conflict horizon XID comes after xl_heap_prune.
+ * After xl_heap_prune is the optional snapshot conflict horizon.
+ *
+ * In Hot Standby mode, we must ensure that there are no running queries
+ * which would conflict with the changes in this record. If pruning, that
+ * means we cannot remove tuples still visible to transactions on the
+ * standby. If freezing, that means we cannot freeze tuples with xids that
+ * are still considered running on the standby. And for setting the VM, we
+ * cannot do so if the page isn't all-visible to all transactions on the
+ * standby.
*/
if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
{
@@ -70,13 +76,29 @@ heap_xlog_prune_freeze(XLogReaderState *record)
rlocator);
}
+ /* Next are the optionally included vmflags. Copy them out for later use. */
+ if ((xlrec.flags & XLHP_HAS_VMFLAGS) != 0)
+ {
+ /* memcpy because vmflags is stored unaligned */
+ memcpy(&vmflags, maindataptr, sizeof(uint8));
+ maindataptr += sizeof(uint8);
+
+ /*
+ * We don't set VISIBILITYMAP_XLOG_CATALOG_REL in the combined record
+ * because we already have XLHP_IS_CATALOG_REL.
+ */
+ Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+ /* Must never set all_frozen bit without also setting all_visible bit */
+ Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
+ }
+
/*
- * If we have a full-page image, restore it and we're done.
+ * If we have a full-page image of the heap block, restore it and we're
+ * done with the heap block.
*/
- action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
- (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
- &buffer);
- if (action == BLK_NEEDS_REDO)
+ if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+ (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+ &buffer) == BLK_NEEDS_REDO)
{
Page page = (Page) BufferGetPage(buffer);
OffsetNumber *redirected;
@@ -89,6 +111,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Size datalen;
xlhp_freeze_plan *plans;
OffsetNumber *frz_offsets;
+ bool do_prune;
+ bool mark_buffer_dirty;
+ bool set_heap_lsn;
char *dataptr = XLogRecGetBlockData(record, 0, &datalen);
heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
@@ -97,11 +122,18 @@ heap_xlog_prune_freeze(XLogReaderState *record)
&ndead, &nowdead,
&nunused, &nowunused);
+ do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+ set_heap_lsn = mark_buffer_dirty = do_prune || nplans > 0;
+
+ /* Ensure the record does something */
+ Assert(do_prune || nplans > 0 ||
+ vmflags & VISIBILITYMAP_VALID_BITS);
+
/*
* Update all line pointers per the record, and repair fragmentation
* if needed.
*/
- if (nredirected > 0 || ndead > 0 || nunused > 0)
+ if (do_prune)
heap_page_prune_execute(buffer,
(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
redirected, nredirected,
@@ -138,26 +170,72 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* There should be no more data */
Assert((char *) frz_offsets == dataptr + datalen);
+ Assert(BufferIsValid(buffer) &&
+ BufferGetBlockNumber(buffer) == blkno);
+
+ /*
+ * Now set PD_ALL_VISIBLE, if required. We'll only do this if we are
+ * also going to set bits in the VM later.
+ *
+ * We must never end up with the VM bit set and the page-level
+ * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+ * modification would fail to clear the VM bit.
+ */
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+ {
+ PageSetAllVisible(page);
+
+ /*
+ * If the only change to the heap page is setting PD_ALL_VISIBLE,
+ * we can avoid setting the page LSN unless checksums or
+ * wal_log_hints are enabled.
+ */
+ set_heap_lsn = XLogHintBitIsNeeded() ? true : set_heap_lsn;
+ mark_buffer_dirty = true;
+ }
+
/*
* Note: we don't worry about updating the page's prunability hints.
* At worst this will cause an extra prune cycle to occur soon.
*/
- PageSetLSN(page, lsn);
- MarkBufferDirty(buffer);
+ if (mark_buffer_dirty)
+ MarkBufferDirty(buffer);
+ if (set_heap_lsn)
+ PageSetLSN(page, lsn);
}
/*
- * If we released any space or line pointers, update the free space map.
+ * If we released any space or line pointers or will be setting a page in
+ * the visibility map, update the free space map.
+ *
+ * Even if we are just updating the VM (and thus not freeing up any
+ * space), we'll still update the FSM for this page. Since FSM is not
+ * WAL-logged and only updated heuristically, it easily becomes stale in
+ * standbys. If the standby is later promoted and runs VACUUM, it will
+ * skip updating individual free space figures for pages that became
+ * all-visible (or all-frozen, depending on the vacuum mode,) which is
+ * troublesome when FreeSpaceMapVacuum propagates too optimistic free
+ * space values to upper FSM layers; later inserters try to use such pages
+ * only to find out that they are unusable. This can cause long stalls
+ * when there are many such pages.
+ *
+ * Forestall those problems by updating FSM's idea about a page that is
+ * becoming all-visible or all-frozen.
*
* Do this regardless of a full-page image being applied, since the FSM
* data is not in the page anyway.
+ *
+ * We want to avoid holding an exclusive lock on the heap buffer while
+ * doing IO (either of the FSM or the VM), so we'll release the lock on
+ * the heap buffer before doing either.
*/
if (BufferIsValid(buffer))
{
- if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
- XLHP_HAS_DEAD_ITEMS |
- XLHP_HAS_NOW_UNUSED_ITEMS))
+ if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
+ XLHP_HAS_DEAD_ITEMS |
+ XLHP_HAS_NOW_UNUSED_ITEMS)) ||
+ vmflags & VISIBILITYMAP_VALID_BITS)
{
Size freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
@@ -168,6 +246,37 @@ heap_xlog_prune_freeze(XLogReaderState *record)
else
UnlockReleaseBuffer(buffer);
}
+
+ /*
+ * Read and update the VM block. Even if we skipped updating the heap page
+ * due to the file being dropped or truncated later in recovery, it's
+ * still safe to update the visibility map. Any WAL record that clears
+ * the visibility map bit does so before checking the page LSN, so any
+ * bits that need to be cleared will still be cleared.
+ *
+ * Note that it is *only* okay that we do not hold a lock on the heap page
+ * because we are in recovery and can expect no other writers to clear
+ * PD_ALL_VISIBLE before we are able to update the VM.
+ */
+ if (vmflags & VISIBILITYMAP_VALID_BITS &&
+ XLogReadBufferForRedoExtended(record, 1,
+ RBM_ZERO_ON_ERROR,
+ false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ uint8 old_vmbits = 0;
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
+
+ visibilitymap_pin(reln, blkno, &vmbuffer);
+ old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+ /* Only set VM page LSN if we modified the page */
+ if (old_vmbits != vmflags)
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
+ FreeFakeRelcacheEntry(reln);
+ }
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a8025889be0..d9ba0f96e34 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,6 +21,7 @@
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
+#include "access/visibilitymapdefs.h"
#include "commands/vacuum.h"
#include "executor/instrument.h"
#include "miscadmin.h"
@@ -835,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
+ InvalidBuffer, 0, false,
conflict_xid,
true, reason,
prstate.frozen, prstate.nfrozen,
@@ -2045,12 +2047,23 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* replaying 'unused' items depends on whether they were all previously marked
* as dead.
*
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
* Note: This function scribbles on the 'frozen' array.
*
* Note: This is called in a critical section, so careful what you do here.
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer,
+ uint8 vmflags,
+ bool set_pd_all_vis,
TransactionId conflict_xid,
bool cleanup_lock,
PruneReason reason,
@@ -2062,6 +2075,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xl_heap_prune xlrec;
XLogRecPtr recptr;
uint8 info;
+ uint8 regbuf_flags;
/* The following local variables hold data registered in the WAL record: */
xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2070,8 +2084,19 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xlhp_prune_items dead_items;
xlhp_prune_items unused_items;
OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+ bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
xlrec.flags = 0;
+ regbuf_flags = REGBUF_STANDARD;
+
+ /*
+ * We can avoid an FPI if the only modification we are making to the heap
+ * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+ */
+ if (!do_prune &&
+ nfrozen == 0 &&
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
+ regbuf_flags |= REGBUF_NO_IMAGE;
/*
* Prepare data for the buffer. The arrays are not actually in the
@@ -2079,7 +2104,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* page image, the arrays can be omitted.
*/
XLogBeginInsert();
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+ XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ XLogRegisterBuffer(1, vmbuffer, 0);
+
if (nfrozen > 0)
{
int nplans;
@@ -2136,6 +2165,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* Prepare the main xl_heap_prune record. We already set the XLHP_HAS_*
* flag above.
*/
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ xlrec.flags |= XLHP_HAS_VMFLAGS;
if (RelationIsAccessibleInLogicalDecoding(relation))
xlrec.flags |= XLHP_IS_CATALOG_REL;
if (TransactionIdIsValid(conflict_xid))
@@ -2150,6 +2181,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
XLogRegisterData(&xlrec, SizeOfHeapPrune);
if (TransactionIdIsValid(conflict_xid))
XLogRegisterData(&conflict_xid, sizeof(TransactionId));
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ XLogRegisterData(&vmflags, sizeof(uint8));
switch (reason)
{
@@ -2168,5 +2201,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
}
recptr = XLogInsert(RM_HEAP2_ID, info);
- PageSetLSN(BufferGetPage(buffer), recptr);
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
+
+ /*
+ * If pruning or freezing tuples or setting the page all-visible when
+ * checksums or wal_hint_bits are enabled, we must bump the LSN. Torn
+ * pages are possible if we update PD_ALL_VISIBLE without bumping the LSN,
+ * but this is deemed okay for page hint updates.
+ */
+ if (do_prune || nfrozen > 0 ||
+ (set_pd_all_vis && XLogHintBitIsNeeded()))
+ PageSetLSN(BufferGetPage(buffer), recptr);
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8a62a93eee5..460cdbd8417 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,11 +464,13 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
+static bool heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int allowed_num_offsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2847,8 +2849,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
OffsetNumber unused[MaxHeapTuplesPerPage];
int nunused = 0;
TransactionId visibility_cutoff_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
bool all_frozen;
LVSavedErrInfo saved_err_info;
+ uint8 vmflags = 0;
+ bool set_pd_all_vis = false;
Assert(vacrel->do_index_vacuuming);
@@ -2859,6 +2864,20 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
InvalidOffsetNumber);
+ if (heap_page_is_all_visible_except_lpdead(vacrel->rel, buffer,
+ vacrel->cutoffs.OldestXmin,
+ deadoffsets, num_offsets,
+ &all_frozen, &visibility_cutoff_xid,
+ &vacrel->offnum))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (all_frozen)
+ {
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ }
+ }
+
START_CRIT_SECTION();
for (int i = 0; i < num_offsets; i++)
@@ -2878,6 +2897,18 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
/* Attempt to truncate line pointer array now */
PageTruncateLinePointerArray(page);
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+ {
+ Assert(!PageIsAllVisible(page));
+ set_pd_all_vis = true;
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ PageSetAllVisible(page);
+ visibilitymap_set_vmbyte(vacrel->rel,
+ blkno,
+ vmbuffer, vmflags);
+ conflict_xid = visibility_cutoff_xid;
+ }
+
/*
* Mark buffer dirty before we write WAL.
*/
@@ -2887,7 +2918,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
- InvalidTransactionId,
+ vmbuffer,
+ vmflags,
+ set_pd_all_vis,
+ conflict_xid,
false, /* no cleanup lock required */
PRUNE_VACUUM_CLEANUP,
NULL, 0, /* frozen */
@@ -2896,39 +2930,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
unused, nunused);
}
- /*
- * End critical section, so we safely can do visibility tests (which
- * possibly need to perform IO and allocate memory!). If we crash now the
- * page (including the corresponding vm bit) might not be marked all
- * visible, but that's fine. A later vacuum will fix that.
- */
END_CRIT_SECTION();
- /*
- * Now that we have removed the LP_DEAD items from the page, once again
- * check if the page has become all-visible. The page is already marked
- * dirty, exclusively locked, and, if needed, a full page image has been
- * emitted.
- */
- Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
- &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
+ if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (all_frozen)
- {
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buffer,
- InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
- flags);
-
/* Count the newly set VM page for logging */
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
vacrel->vm_new_visible_pages++;
if (all_frozen)
vacrel->vm_new_visible_frozen_pages++;
@@ -3594,6 +3601,25 @@ dead_items_cleanup(LVRelState *vacrel)
vacrel->pvs = NULL;
}
+/*
+ * Wrapper for heap_page_is_all_visible_except_lpdead() for callers that
+ * expect no LP_DEAD items on the page.
+ */
+bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
+{
+
+ return heap_page_is_all_visible_except_lpdead(rel, buf, OldestXmin,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid,
+ logging_offnum);
+}
+
/*
* Check if every tuple in the given page is visible to all current and future
* transactions.
@@ -3607,23 +3633,35 @@ dead_items_cleanup(LVRelState *vacrel)
* visible tuples. Sets *all_frozen to true if every tuple on this page is
* frozen.
*
- * This is a stripped down version of lazy_scan_prune(). If you change
- * anything here, make sure that everything stays in sync. Note that an
- * assertion calls us to verify that everybody still agrees. Be sure to avoid
- * introducing new side-effects here.
+ * deadoffsets are the offsets of the LP_DEAD items the caller already knows
+ * about and is about to set LP_UNUSED; allowed_num_offsets is how many there
+ * are. As long as the LP_DEAD items we encounter on the page match those
+ * exactly, we can still set the page all-visible in the VM.
+ *
+ * Callers looking to verify that the page is all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This logic is similar to that in heap_prune_record_unchanged_lp_normal(). If
+ * you change anything here, make sure that everything stays in sync. Note
+ * that an assertion calls us to verify that everybody still agrees. Be sure
+ * to avoid introducing new side-effects here.
*/
static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum)
+heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int allowed_num_offsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
OffsetNumber offnum,
maxoff;
bool all_visible = true;
+ OffsetNumber current_dead_offsets[MaxHeapTuplesPerPage];
+ size_t current_num_offsets = 0;
*visibility_cutoff_xid = InvalidTransactionId;
*all_frozen = true;
@@ -3655,9 +3693,8 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
if (ItemIdIsDead(itemid))
{
- all_visible = false;
- *all_frozen = false;
- break;
+ current_dead_offsets[current_num_offsets++] = offnum;
+ continue;
}
Assert(ItemIdIsNormal(itemid));
@@ -3724,7 +3761,23 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
/* Clear the offset information once we have processed the given page. */
*logging_offnum = InvalidOffsetNumber;
- return all_visible;
+ /* If we already know it's not all-visible, return false */
+ if (!all_visible)
+ return false;
+
+ /* If we weren't allowed any dead offsets, we're done */
+ if (allowed_num_offsets == 0)
+ return current_num_offsets == 0;
+
+ /* If the count of LP_DEAD items found doesn't match the expected count, bail */
+ if (current_num_offsets != allowed_num_offsets)
+ return false;
+
+ Assert(deadoffsets);
+
+ /* The LP_DEAD items found must be exactly the expected offsets */
+ return memcmp(current_dead_offsets, deadoffsets,
+ allowed_num_offsets * sizeof(OffsetNumber)) == 0;
}
/*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..d6c86ccac20 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -266,6 +266,7 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
{
char *rec = XLogRecGetData(record);
uint8 info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+ char *maindataptr = rec + SizeOfHeapPrune;
info &= XLOG_HEAP_OPMASK;
if (info == XLOG_HEAP2_PRUNE_ON_ACCESS ||
@@ -278,7 +279,8 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
{
TransactionId conflict_xid;
- memcpy(&conflict_xid, rec + SizeOfHeapPrune, sizeof(TransactionId));
+ memcpy(&conflict_xid, maindataptr, sizeof(TransactionId));
+ maindataptr += sizeof(TransactionId);
appendStringInfo(buf, "snapshotConflictHorizon: %u",
conflict_xid);
@@ -287,6 +289,15 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, ", isCatalogRel: %c",
xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
+ if (xlrec->flags & XLHP_HAS_VMFLAGS)
+ {
+ uint8 vmflags;
+
+ memcpy(&vmflags, maindataptr, sizeof(uint8));
+ maindataptr += sizeof(uint8);
+ appendStringInfo(buf, ", vm_flags: 0x%02X", vmflags);
+ }
+
if (XLogRecHasBlockData(record, 0))
{
Size datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..8b47295efa2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -344,6 +344,12 @@ extern void heap_inplace_update_and_unlock(Relation relation,
Buffer buffer);
extern void heap_inplace_unlock(Relation relation,
HeapTuple oldtup, Buffer buffer);
+
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
@@ -388,6 +394,9 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer,
+ uint8 vmflags,
+ bool vm_modified_heap_page,
TransactionId conflict_xid,
bool cleanup_lock,
PruneReason reason,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 277df6b3cf0..d6a479f6984 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -289,12 +289,17 @@ typedef struct xl_heap_prune
/*
* If XLHP_HAS_CONFLICT_HORIZON is set, the conflict horizon XID follows,
- * unaligned
+ * unaligned.
+ *
+ * Then, if XLHP_HAS_VMFLAGS is set, the VM flags follow, unaligned.
*/
} xl_heap_prune;
#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+/* If the record should update the VM, it contains the new VM flags */
+#define XLHP_HAS_VMFLAGS (1 << 0)
+
/* to handle recovery conflict during logical decoding on standby */
#define XLHP_IS_CATALOG_REL (1 << 1)
--
2.43.0
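A note on the resulting record layout, for reviewers: after this patch the
xl_heap_prune main data is the struct itself, then the optional conflict
horizon, then the optional VM flags, all unaligned. The fragment below is only
an illustration of how a reader would walk that data (the helper name is mine,
and maindataptr is assumed to point just past SizeOfHeapPrune); it is not code
from the patch.

/*
 * Illustration only: walk the xl_heap_prune main data in the order the
 * record now lays it out (struct, then optional conflict horizon, then
 * optional VM flags, all unaligned).  Assumes the usual heapam_xlog.h
 * definitions are in scope.
 */
static void
parse_heap_prune_maindata(xl_heap_prune *xlrec, char *maindataptr,
                          TransactionId *conflict_xid, uint8 *vmflags)
{
    *conflict_xid = InvalidTransactionId;
    *vmflags = 0;

    if (xlrec->flags & XLHP_HAS_CONFLICT_HORIZON)
    {
        memcpy(conflict_xid, maindataptr, sizeof(TransactionId));
        maindataptr += sizeof(TransactionId);
    }

    if (xlrec->flags & XLHP_HAS_VMFLAGS)
        memcpy(vmflags, maindataptr, sizeof(uint8));
}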
v7-0004-Make-heap_page_is_all_visible-independent-of-LVRe.patch (text/x-patch; charset=US-ASCII)
From a8f1b5ef988235f7d9c5fd24d10a139472df2e31 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 15:48:51 -0400
Subject: [PATCH v7 04/22] Make heap_page_is_all_visible independent of
LVRelState
Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need two parameters from the
LVRelState, so just pass those in explicitly.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 45 ++++++++++++++++++----------
1 file changed, 29 insertions(+), 16 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 14036c27e87..8a62a93eee5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,8 +464,11 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
- TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2010,8 +2013,9 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(presult.lpdead_items == 0);
- if (!heap_page_is_all_visible(vacrel, buf,
- &debug_cutoff, &debug_all_frozen))
+ if (!heap_page_is_all_visible(vacrel->rel, buf,
+ vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+ &debug_cutoff, &vacrel->offnum))
Assert(false);
Assert(presult.all_frozen == debug_all_frozen);
@@ -2907,8 +2911,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* emitted.
*/
Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
- &all_frozen))
+ if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
+ &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -3592,9 +3596,16 @@ dead_items_cleanup(LVRelState *vacrel)
/*
* Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples. Set *all_frozen to true if every tuple
- * on this page is frozen.
+ * transactions.
+ *
+ * OldestXmin is used to determine visibility.
+ *
+ * *logging_offnum is set to the OffsetNumber of the tuple currently being
+ * processed, for use by vacuum's error callback system.
+ *
+ * Return the visibility_cutoff_xid which is the highest xmin amongst the
+ * visible tuples. Sets *all_frozen to true if every tuple on this page is
+ * frozen.
*
* This is a stripped down version of lazy_scan_prune(). If you change
* anything here, make sure that everything stays in sync. Note that an
@@ -3602,9 +3613,11 @@ dead_items_cleanup(LVRelState *vacrel)
* introducing new side-effects here.
*/
static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
TransactionId *visibility_cutoff_xid,
- bool *all_frozen)
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3627,7 +3640,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* Set the offset number so that we can display it along with any
* error that occurred while processing this tuple.
*/
- vacrel->offnum = offnum;
+ *logging_offnum = offnum;
itemid = PageGetItemId(page, offnum);
/* Unused or redirect line pointers are of no interest */
@@ -3651,9 +3664,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
tuple.t_len = ItemIdGetLength(itemid);
- tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+ tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
+ switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
buf))
{
case HEAPTUPLE_LIVE:
@@ -3674,7 +3687,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
+ OldestXmin))
{
all_visible = false;
*all_frozen = false;
@@ -3709,7 +3722,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
} /* scan along page */
/* Clear the offset information once we have processed the given page. */
- vacrel->offnum = InvalidOffsetNumber;
+ *logging_offnum = InvalidOffsetNumber;
return all_visible;
}
--
2.43.0
v7-0002-Add-assert-and-log-message-to-visibilitymap_set.patch (text/x-patch; charset=US-ASCII)
From 73abe01c6f7c69feca4f1f641c8c64d76cccc340 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 27 Aug 2025 10:07:29 -0400
Subject: [PATCH v7 02/22] Add assert and log message to visibilitymap_set
Add an assert to visibilitymap_set() that the provided heap buffer is
exclusively locked, which is expected.
Also, enhance the debug logging message to specify which VM flags were
set.
Based on a related suggestion by Kirill Reshke on an in-progress
patchset.
Discussion: https://postgr.es/m/CALdSSPhAU56g1gGVT0%2BwG8RrSWE6qW8TOfNJS1HNAWX6wPgbFA%40mail.gmail.com
---
src/backend/access/heap/visibilitymap.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 8f918e00af7..7440a65c404 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -255,7 +255,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
uint8 status;
#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
+ elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
+ flags, RelationGetRelationName(rel), heapBlk);
#endif
Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
@@ -269,6 +270,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
+ Assert(!BufferIsValid(heapBuf) || BufferIsExclusiveLocked(heapBuf));
+
/* Check that we have the right VM page pinned */
if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
--
2.43.0
v7-0001-Remove-unneeded-VM-pin-from-VM-replay.patch (text/x-patch; charset=US-ASCII)
From 449f67324384d01d0e9601362f49bbe5b25f2676 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 27 Aug 2025 08:50:15 -0400
Subject: [PATCH v7 01/22] Remove unneeded VM pin from VM replay
During replay of an operation setting bits in the visibility map,
XLogReadBufferForRedoExtended() will return a pinned buffer containing
the specified block of the visibility map. It will also be sure to
create the visibility map if it doesn't exist. Previously,
heap_xlog_visible() called visibilitymap_pin() even after getting a
buffer in this way. Since the specified page was already present and pinned,
visibilitymap_pin() simply returned early without acquiring another pin, so
the call can be removed.
Inspired by a related report by Kirill Reshke on an in-progress patch.
Discussion: https://postgr.es/m/CALdSSPhAU56g1gGVT0%2BwG8RrSWE6qW8TOfNJS1HNAWX6wPgbFA%40mail.gmail.com
---
src/backend/access/heap/heapam_xlog.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index eb4bd3d6ae3..e3e021f2bdd 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -295,8 +295,8 @@ heap_xlog_visible(XLogReaderState *record)
LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
reln = CreateFakeRelcacheEntry(rlocator);
- visibilitymap_pin(reln, blkno, &vmbuffer);
+ Assert(visibilitymap_pin_ok(blkno, vmbuffer));
visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
xlrec->snapshotConflictHorizon, vmbits);
--
2.43.0
v7-0003-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch (text/x-patch; charset=US-ASCII)
From 266450693f4df295a257b8316b285b8cfb25761a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v7 03/22] Eliminate xl_heap_visible in COPY FREEZE
Instead of emitting a separate xl_heap_visible WAL record to set the VM
bits, include the changes to the VM block in the xl_heap_multi_insert
record itself.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam.c | 48 ++++++++++--------
src/backend/access/heap/heapam_xlog.c | 39 +++++++++++++-
src/backend/access/heap/visibilitymap.c | 67 ++++++++++++++++++++++++-
src/backend/access/rmgrdesc/heapdesc.c | 5 ++
src/include/access/visibilitymap.h | 2 +
5 files changed, 138 insertions(+), 23 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 7491cc3cb93..4ce0ec61692 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2504,9 +2504,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/*
* If the page is all visible, need to clear that, unless we're only
* going to add further frozen rows to it.
- *
- * If we're only adding already frozen rows to a previously empty
- * page, mark it as all-visible.
*/
if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
@@ -2516,8 +2513,23 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
BufferGetBlockNumber(buffer),
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
+
+ /*
+ * If we're only adding already frozen rows to a previously empty
+ * page, mark it as all-frozen and update the visibility map. We're
+ * already holding a pin on the vmbuffer.
+ */
else if (all_frozen_set)
+ {
+ Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
PageSetAllVisible(page);
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ visibilitymap_set_vmbyte(relation,
+ BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+ }
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2565,6 +2577,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
xlrec->flags = 0;
if (all_visible_cleared)
xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+ /*
+ * We don't have to worry about including a conflict xid in the
+ * WAL record as HEAP_INSERT_FROZEN intentionally violates
+ * visibility rules.
+ */
if (all_frozen_set)
xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
@@ -2627,7 +2645,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
XLogBeginInsert();
XLogRegisterData(xlrec, tupledata - scratch.data);
+
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+ if (all_frozen_set)
+ XLogRegisterBuffer(1, vmbuffer, 0);
XLogRegisterBufData(0, tupledata, totaldatalen);
@@ -2637,29 +2658,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
recptr = XLogInsert(RM_HEAP2_ID, info);
PageSetLSN(page, recptr);
+ if (all_frozen_set)
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
}
END_CRIT_SECTION();
- /*
- * If we've frozen everything on the page, update the visibilitymap.
- * We're already holding pin on the vmbuffer.
- */
if (all_frozen_set)
- {
- Assert(PageIsAllVisible(page));
- Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
-
- /*
- * It's fine to use InvalidTransactionId here - this is only used
- * when HEAP_INSERT_FROZEN is specified, which intentionally
- * violates visibility rules.
- */
- visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
- InvalidXLogRecPtr, vmbuffer,
- InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
- }
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
UnlockReleaseBuffer(buffer);
ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index e3e021f2bdd..cf437b14eeb 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -552,6 +552,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
int i;
bool isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
/*
* Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -572,11 +573,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(rlocator);
- Buffer vmbuffer = InvalidBuffer;
visibilitymap_pin(reln, blkno, &vmbuffer);
visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
ReleaseBuffer(vmbuffer);
+ vmbuffer = InvalidBuffer;
FreeFakeRelcacheEntry(reln);
}
@@ -663,6 +664,42 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (BufferIsValid(buffer))
UnlockReleaseBuffer(buffer);
+ buffer = InvalidBuffer;
+
+ /*
+ * Now read and update the VM block. Even if we skipped updating the heap
+ * page due to the file being dropped or truncated later in recovery, it's
+ * still safe to update the visibility map. Any WAL record that clears
+ * the visibility map bit does so before checking the page LSN, so any
+ * bits that need to be cleared will still be cleared.
+ *
+ * It is only okay to set the VM bits without holding the heap page lock
+ * because we can expect no other writers of this page.
+ */
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+ XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
+
+ Assert(visibilitymap_pin_ok(blkno, vmbuffer));
+ visibilitymap_set_vmbyte(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+
+ /*
+ * It is not possible that the VM was already set for this heap page,
+ * so the vmbuffer must have been modified and marked dirty.
+ */
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
+ FreeFakeRelcacheEntry(reln);
+ }
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
+
/*
* If the page is running low on free space, update the FSM as well.
* Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 7440a65c404..568bc83db9c 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,7 +14,8 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set a bit in a previously pinned page
+ * visibilitymap_set - set a bit in a previously pinned page and log
+ * visibilitymap_set_vmbyte - set a bit in a pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -321,6 +322,70 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
return status;
}
+/*
+ * Set flags in the VM block contained in the passed in vmBuf.
+ *
+ * This function is for callers which include the VM changes in the same WAL
+ * record as the modifications of the heap page which rendered it all-visible.
+ * Callers separately logging the VM changes should invoke visibilitymap_set()
+ * instead.
+ *
+ * Caller must have pinned and exclusively locked the correct block of the VM in
+ * vmBuf. This block should contain the VM bits for the given heapBlk.
+ *
+ * During normal operation (i.e. not recovery), this should be called in a
+ * critical section which also makes any necessary changes to the heap page
+ * and, if relevant, emits WAL.
+ *
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page. This includes
+ * maintaining any invariants such as ensuring the buffer containing heapBlk
+ * is pinned and exclusively locked.
+ */
+uint8
+visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags)
+{
+ BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+ uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+ uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+ Page page;
+ uint8 *map;
+ uint8 status;
+
+#ifdef TRACE_VISIBILITYMAP
+ elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
+ flags, RelationGetRelationName(rel), heapBlk);
+#endif
+
+ /* Call in same critical section where WAL is emitted. */
+ Assert(InRecovery || CritSectionCount > 0);
+
+ /* Flags should be valid. Also never clear bits with this function */
+ Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+ /* Must never set all_frozen bit without also setting all_visible bit */
+ Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+ /* Check that we have the right VM page pinned */
+ if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+ elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
+
+ Assert(BufferIsExclusiveLocked(vmBuf));
+
+ page = BufferGetPage(vmBuf);
+ map = (uint8 *) PageGetContents(page);
+
+ status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+ if (flags != status)
+ {
+ map[mapByte] |= (flags << mapOffset);
+ MarkBufferDirty(vmBuf);
+ }
+
+ return status;
+}
+
/*
* visibilitymap_get_status - get status of bits
*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
#include "access/heapam_xlog.h"
#include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
#include "storage/standbydefs.h"
/*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
xlrec->flags);
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+ appendStringInfo(buf, ", vm_flags: 0x%02X",
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
if (XLogRecHasBlockData(record, 0) && !isinit)
{
appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..977566f6b98 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,8 @@ extern uint8 visibilitymap_set(Relation rel,
Buffer vmBuf,
TransactionId cutoff_xid,
uint8 flags);
+extern uint8 visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
--
2.43.0
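For readers new to the VM internals: visibilitymap_set_vmbyte() relies on the
usual addressing scheme in which each heap block owns two bits (all-visible,
all-frozen) of the map. The standalone sketch below mirrors that arithmetic
with simplified constants (MAPSIZE is roughly BLCKSZ minus the page header;
see the macros in visibilitymap.c for the authoritative definitions).

/* Standalone sketch of the mapBlock/mapByte/mapOffset decomposition. */
#include <stdio.h>
#include <stdint.h>

#define BITS_PER_HEAPBLOCK   2                      /* all-visible + all-frozen */
#define HEAPBLOCKS_PER_BYTE  (8 / BITS_PER_HEAPBLOCK)
#define MAPSIZE              8168                   /* ~ BLCKSZ minus page header */
#define HEAPBLOCKS_PER_PAGE  (MAPSIZE * HEAPBLOCKS_PER_BYTE)

int
main(void)
{
    uint32_t heapBlk = 123456;

    /* which VM block, which byte of its contents, which bit pair in that byte */
    uint32_t mapBlock  = heapBlk / HEAPBLOCKS_PER_PAGE;
    uint32_t mapByte   = (heapBlk % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE;
    uint32_t mapOffset = (heapBlk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;

    printf("heap block %u -> VM block %u, byte %u, bit offset %u\n",
           heapBlk, mapBlock, mapByte, mapOffset);
    return 0;
}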
v7-0006-Use-xl_heap_prune-record-for-setting-empty-pages-.patch (text/x-patch; charset=US-ASCII)
From 562ad6de26a83928e6bfd11bdc1dd9db1da601fe Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:42:19 -0400
Subject: [PATCH v7 06/22] Use xl_heap_prune record for setting empty pages
all-visible
As part of the project to eliminate xl_heap_visible records, stop
emitting them when phase I of vacuum sets empty pages all-visible.
---
src/backend/access/heap/pruneheap.c | 14 +++++--
src/backend/access/heap/vacuumlazy.c | 55 ++++++++++++++++++----------
src/include/access/heapam.h | 1 +
3 files changed, 47 insertions(+), 23 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d9ba0f96e34..97e51f78854 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -836,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
+ false,
InvalidBuffer, 0, false,
conflict_xid,
true, reason,
@@ -2051,6 +2052,9 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* case, vmbuffer should already have been updated and marked dirty and should
* still be pinned and locked.
*
+ * force_heap_fpi indicates that a full-page image of the heap block should
+ * be emitted unconditionally.
+ *
* set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
* the page LSN when checksums/wal_log_hints are enabled even if we did not
* prune or freeze tuples on the page.
@@ -2061,6 +2065,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ bool force_heap_fpi,
Buffer vmbuffer,
uint8 vmflags,
bool set_pd_all_vis,
@@ -2089,13 +2094,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xlrec.flags = 0;
regbuf_flags = REGBUF_STANDARD;
+ if (force_heap_fpi)
+ regbuf_flags |= REGBUF_FORCE_IMAGE;
+
/*
* We can avoid an FPI if the only modification we are making to the heap
* page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
*/
- if (!do_prune &&
- nfrozen == 0 &&
- (!set_pd_all_vis || !XLogHintBitIsNeeded()))
+ else if (!do_prune &&
+ nfrozen == 0 &&
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
regbuf_flags |= REGBUF_NO_IMAGE;
/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 460cdbd8417..d9e195269d2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1878,33 +1878,47 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ uint8 new_vmbits = VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN;
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ PageSetAllVisible(page);
MarkBufferDirty(buf);
- /*
- * It's possible that another backend has extended the heap,
- * initialized the page, and then failed to WAL-log the page due
- * to an ERROR. Since heap extension is not WAL-logged, recovery
- * might try to replay our record setting the page all-visible and
- * find that the page isn't initialized, which will cause a PANIC.
- * To prevent that, check whether the page has been previously
- * WAL-logged, and if not, do that now.
- */
- if (RelationNeedsWAL(vacrel->rel) &&
- PageGetLSN(page) == InvalidXLogRecPtr)
- log_newpage_buffer(buf, true);
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ visibilitymap_set_vmbyte(vacrel->rel, blkno,
+ vmbuffer, new_vmbits);
+
+ if (RelationNeedsWAL(vacrel->rel))
+ {
+ /*
+ * It's possible that another backend has extended the heap,
+ * initialized the page, and then failed to WAL-log the page
+ * due to an ERROR. Since heap extension is not WAL-logged,
+ * recovery might try to replay our record setting the page
+ * all-visible and find that the page isn't initialized, which
+ * will cause a PANIC. To prevent that, if the page hasn't
+ * been previously WAL-logged, force a heap FPI.
+ */
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ PageGetLSN(page) == InvalidXLogRecPtr,
+ vmbuffer,
+ new_vmbits,
+ true,
+ InvalidTransactionId,
+ false, PRUNE_VACUUM_SCAN,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+ }
- PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
END_CRIT_SECTION();
- /* Count the newly all-frozen pages for logging */
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+ /* Count the newly all-frozen pages for logging. */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
}
@@ -2918,6 +2932,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
+ false,
vmbuffer,
vmflags,
set_pd_all_vis,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8b47295efa2..e7129a644a1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,6 +394,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ bool force_heap_fpi,
Buffer vmbuffer,
uint8 vmflags,
bool vm_modified_heap_page,
--
2.43.0
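To summarize the caller protocol these patches converge on: the PD_ALL_VISIBLE
flag, the VM byte, and the WAL record are all handled inside one critical
section, with the LSN bookkeeping done by log_heap_prune_and_freeze() itself.
The sketch below is roughly what lazy_scan_new_or_empty() does after this
patch, collapsed into a hypothetical helper; the function name is mine and the
argument order follows this version of log_heap_prune_and_freeze(), which may
still change.

/*
 * Illustration only: the combined heap page + VM update.  heapbuf is
 * assumed pinned and exclusively locked, vmbuffer pinned, by the caller.
 */
static void
set_all_visible_in_one_record(Relation rel, Buffer heapbuf, Buffer vmbuffer,
                              BlockNumber blkno, uint8 vmflags)
{
    Page        page = BufferGetPage(heapbuf);

    START_CRIT_SECTION();

    /* The VM bit must never be set while PD_ALL_VISIBLE is clear */
    PageSetAllVisible(page);
    MarkBufferDirty(heapbuf);

    /* Lock the already-pinned VM buffer only for the byte update */
    LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
    visibilitymap_set_vmbyte(rel, blkno, vmbuffer, vmflags);

    if (RelationNeedsWAL(rel))
        log_heap_prune_and_freeze(rel, heapbuf,
                                  PageGetLSN(page) == InvalidXLogRecPtr,
                                  vmbuffer, vmflags,
                                  true, /* we set PD_ALL_VISIBLE */
                                  InvalidTransactionId,
                                  false, PRUNE_VACUUM_SCAN,
                                  NULL, 0,  /* frozen */
                                  NULL, 0,  /* redirected */
                                  NULL, 0,  /* dead */
                                  NULL, 0); /* unused */

    END_CRIT_SECTION();

    LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
}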
v7-0007-Combine-lazy_scan_prune-VM-corruption-cases.patch (text/x-patch; charset=US-ASCII)
From f2365f999a652aabfda0a55761eb3fbb853529ae Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:04:03 -0400
Subject: [PATCH v7 07/22] Combine lazy_scan_prune VM corruption cases
lazy_scan_prune() updates the visibility map after phase I of heap
vacuuming. It also checks for and fixes corruption in the VM. The
corruption cases were mixed in with the normal visibility map update
cases.
Careful study of the ordering of the current logic reveals that the
corruption cases can be reordered and extracted into a separate
function. This should result in no additional overhead compared to the
previous code.
This reordering makes it clear which cases are about corruption and
which are normal VM updates. Separating them also makes it possible to
combine the normal cases in a future commit. That will make the logic
easier to understand and pave the way for updating the VM in the same
WAL record as pruning and freezing in phase I.
---
src/backend/access/heap/vacuumlazy.c | 114 +++++++++++++++++----------
1 file changed, 73 insertions(+), 41 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d9e195269d2..04a7b6c4181 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -431,6 +431,12 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
+static bool identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1933,6 +1939,66 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer)
+{
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
+
+
/* qsort comparator for sorting OffsetNumbers */
static int
cmpOffsetNumbers(const void *a, const void *b)
@@ -2079,9 +2145,14 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
+ * all_frozen variables. Start by looking for any VM corruption.
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
+ all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+ {
+ /* Don't update the VM if we just cleared corruption in it */
+ }
+ else if (!all_visible_according_to_vm && presult.all_visible)
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -2133,45 +2204,6 @@ lazy_scan_prune(LVRelState *vacrel,
}
}
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
- {
- elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
- {
- elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno);
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
/*
* If the all-visible page is all-frozen but not marked as such yet, mark
* it as all-frozen. Note that all_frozen is only valid if all_visible is
--
2.43.0
v7-0008-Combine-vacuum-phase-I-VM-update-cases.patch (text/x-patch; charset=US-ASCII)
From 51a5cc0a3334a87a735e0ba5fb20e4bea72aac50 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:35:36 -0400
Subject: [PATCH v7 08/22] Combine vacuum phase I VM update cases
We update the VM after phase I of vacuum -- either setting both the VM
bits when all bits are currently unset or setting just the frozen bit
when the all-visible bit is already set.
Those two cases shared much of the same code -- leading to unnecessary
duplication. This commit combines them, which is simpler and easier to
understand.
The combined case also happens to fix a longstanding bug: when only
marking an already all-visible page all-frozen with checksums or
wal_log_hints enabled, we failed to mark the heap buffer dirty before
visibilitymap_set() set its page LSN.
---
src/backend/access/heap/vacuumlazy.c | 101 +++++++++------------------
1 file changed, 32 insertions(+), 69 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 04a7b6c4181..f6cdd9e6828 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2152,11 +2152,26 @@ lazy_scan_prune(LVRelState *vacrel,
{
/* Don't update the VM if we just cleared corruption in it */
}
- else if (!all_visible_according_to_vm && presult.all_visible)
+
+ /*
+ * If the page isn't yet marked all-visible in the VM, or it is and needs
+ * to be marked all-frozen, update the VM. Note that all_frozen is only
+ * valid if all_visible is true, so we must check both all_visible and
+ * all_frozen.
+ */
+ else if (presult.all_visible &&
+ (!all_visible_according_to_vm ||
+ (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
+ /*
+ * If the page is all-frozen, we can pass InvalidTransactionId as our
+ * cutoff_xid, since a snapshotConflictHorizon sufficient to make
+ * everything safe for REDO was logged when the page's tuples were
+ * frozen.
+ */
if (presult.all_frozen)
{
Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
@@ -2169,21 +2184,29 @@ lazy_scan_prune(LVRelState *vacrel,
* checksums are not enabled). Regardless, set both bits so that we
* get back in sync.
*
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
+ * If the heap page is all-visible but the VM bit is not set, we don't
+ * need to dirty the heap page. However, if checksums are enabled, we
+ * do need to make sure that the heap page is dirtied before passing
+ * it to visibilitymap_set(), because it may be logged.
*/
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
+ if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+ {
+ PageSetAllVisible(page);
+ MarkBufferDirty(buf);
+ }
+
old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
InvalidXLogRecPtr,
vmbuffer, presult.vm_conflict_horizon,
flags);
/*
+ * Even if we are only setting the all-frozen bit, there is a small
+ * chance that the VM was modified sometime between setting
+ * all_visible_according_to_vm and checking the visibility during
+ * pruning. Check the return value of old_vmbits to ensure the
+ * visibility map counters used for logging are accurate.
+ *
* If the page wasn't already set all-visible and/or all-frozen in the
* VM, count it as newly set for logging.
*/
@@ -2204,66 +2227,6 @@ lazy_scan_prune(LVRelState *vacrel,
}
}
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen. Note that all_frozen is only valid if all_visible is
- * true, so we must check both all_visible and all_frozen.
- */
- else if (all_visible_according_to_vm && presult.all_visible &&
- presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
-
return presult.ndeleted;
}
--
2.43.0
v7-0009-Find-and-fix-VM-corruption-in-heap_page_prune_and.patch (text/x-patch; charset=US-ASCII)
From 37585b0dba82b169bfd6992f513a8a6e791bb4c2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:45:59 -0400
Subject: [PATCH v7 09/22] Find and fix VM corruption in
heap_page_prune_and_freeze
Future commits will update the VM in the same critical section and WAL
record as pruning and freezing. For ease of review, this commit makes
one step toward doing this. It moves the VM corruption handling case to
heap_page_prune_and_freeze().
---
src/backend/access/heap/pruneheap.c | 87 +++++++++++++++++++++++++++-
src/backend/access/heap/vacuumlazy.c | 77 +++---------------------
src/include/access/heapam.h | 4 ++
3 files changed, 96 insertions(+), 72 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 97e51f78854..496b70e318f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,7 +21,7 @@
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "commands/vacuum.h"
#include "executor/instrument.h"
#include "miscadmin.h"
@@ -177,6 +177,13 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
static void page_verify_redirects(Page page);
+static bool identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer);
+
/*
* Optionally prune and repair fragmentation in the specified page.
@@ -261,7 +268,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, vistest, 0,
+ heap_page_prune_and_freeze(relation, buffer, false,
+ InvalidBuffer,
+ vistest, 0,
NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
/*
@@ -294,6 +303,64 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
}
}
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer)
+{
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
/*
* Prune and repair fragmentation and potentially freeze tuples on the
@@ -314,6 +381,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
* that also freeze need that information.
*
+ * blk_known_av is the visibility status of the heap block as of the last call
+ * to find_next_unskippable_block(). vmbuffer is the buffer that may already
+ * contain the required block of the visibility map.
+ *
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
* (see heap_prune_satisfies_vacuum).
*
@@ -349,6 +420,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ bool blk_known_av,
+ Buffer vmbuffer,
GlobalVisState *vistest,
int options,
struct VacuumCutoffs *cutoffs,
@@ -897,6 +970,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
+ /*
+ * Clear any VM corruption. This does not need to be done in a critical
+ * section.
+ */
+ presult->vm_corruption = false;
+ if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+ presult->vm_corruption = identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av,
+ prstate.lpdead_items, vmbuffer);
if (prstate.freeze)
{
if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f6cdd9e6828..0c121fdf4e6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -431,12 +431,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
-static bool identify_and_fix_vm_corruption(Relation relation,
- BlockNumber heap_blk,
- Buffer heap_buffer, Page heap_page,
- bool heap_blk_known_av,
- int64 nlpdead_items,
- Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1939,65 +1933,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
-/*
- * When updating the visibility map after phase I heap vacuuming, we take the
- * opportunity to identify and fix any VM corruption.
- *
- * heap_blk_known_av is the visibility status of the heap page collected
- * while finding the next unskippable block in heap_vac_scan_next_block().
- */
-static bool
-identify_and_fix_vm_corruption(Relation relation,
- BlockNumber heap_blk,
- Buffer heap_buffer, Page heap_page,
- bool heap_blk_known_av,
- int64 nlpdead_items,
- Buffer vmbuffer)
-{
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
- visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
- {
- elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk);
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
- {
- elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk);
- PageClearAllVisible(heap_page);
- MarkBufferDirty(heap_buffer);
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- return false;
-}
-
/* qsort comparator for sorting OffsetNumbers */
static int
@@ -2056,11 +1991,14 @@ lazy_scan_prune(LVRelState *vacrel,
* tuples. Pruning will have determined whether or not the page is
* all-visible.
*/
- prune_options = HEAP_PAGE_PRUNE_FREEZE;
+ prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
+ heap_page_prune_and_freeze(rel, buf,
+ all_visible_according_to_vm,
+ vmbuffer,
+ vacrel->vistest, prune_options,
&vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
@@ -2145,10 +2083,9 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables. Start by looking for any VM corruption.
+ * all_frozen variables.
*/
- if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
- all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+ if (presult.vm_corruption)
{
/* Don't update the VM if we just cleared corruption in it */
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e7129a644a1..0c7eb5e46f4 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
struct TupleTableSlot;
@@ -247,6 +248,7 @@ typedef struct PruneFreezeResult
bool all_visible;
bool all_frozen;
TransactionId vm_conflict_horizon;
+ bool vm_corruption;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
@@ -380,6 +382,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ bool blk_known_av,
+ Buffer vmbuffer,
struct GlobalVisState *vistest,
int options,
struct VacuumCutoffs *cutoffs,
--
2.43.0
v7-0011-Update-VM-in-pruneheap.c.patch (text/x-patch; charset=US-ASCII)
From 0c0e63c0d7675559acd1f69203ba7423cd286352 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 2 Jun 2025 11:04:14 -0400
Subject: [PATCH v7 11/22] Update VM in pruneheap.c
As a step toward updating the VM in the same critical section and WAL
record as pruning and freezing (during phase I of vacuuming), first move
the VM update (still in its own critical section and WAL record) into
heap_page_prune_and_freeze(). This makes review easier.
---
src/backend/access/heap/pruneheap.c | 99 +++++++++++++++++++++++-----
src/backend/access/heap/vacuumlazy.c | 99 +++++-----------------------
src/include/access/heapam.h | 15 +++--
3 files changed, 106 insertions(+), 107 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6c3653e776c..05227ce0339 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -360,7 +360,8 @@ identify_and_fix_vm_corruption(Relation relation,
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -436,6 +437,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_freeze;
bool do_prune;
bool do_hint;
+ uint8 vmflags = 0;
+ uint8 old_vmbits = 0;
bool hint_bit_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
@@ -936,7 +939,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*
* Now that freezing has been finalized, unset all_visible if there are
* any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected by our caller.
+ * of the page, as expected for updating the visibility map.
*/
if (prstate.all_visible && prstate.lpdead_items == 0)
{
@@ -952,31 +955,91 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->hastup = prstate.hastup;
/*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
+ * If updating the visibility map, the conflict horizon for that record
+ * must be the newest xmin on the page. However, if the page is
+ * completely frozen, there can be no conflict and the vm_conflict_horizon
+ * should remain InvalidTransactionId. This includes the case that we
+ * just froze all the tuples; the prune-freeze record included the
+ * conflict XID already so the VM update record doesn't need it.
*/
if (presult->all_frozen)
presult->vm_conflict_horizon = InvalidTransactionId;
else
presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
- presult->lpdead_items = prstate.lpdead_items;
- /* the presult->deadoffsets array was already filled in */
-
/*
- * Clear any VM corruption. This does not need to be done in a critical
- * section.
+ * Handle setting visibility map bit based on information from the VM (as
+ * of last heap_vac_scan_next_block() call), and from all_visible and
+ * all_frozen variables.
*/
- presult->vm_corruption = false;
if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
- presult->vm_corruption = identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av,
- prstate.lpdead_items, vmbuffer);
+ {
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av,
+ prstate.lpdead_items, vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /*
+ * If the page isn't yet marked all-visible in the VM or it is and
+ * needs to be marked all-frozen, update the VM. Note that all_frozen
+ * is only valid if all_visible is true, so we must check both
+ * all_visible and all_frozen.
+ */
+ else if (presult->all_visible &&
+ (!blk_known_av ||
+ (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+ {
+ Assert(prstate.lpdead_items == 0);
+ vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+ /*
+ * If the page is all-frozen, we can pass InvalidTransactionId as
+ * our cutoff_xid, since a snapshotConflictHorizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ if (presult->all_frozen)
+ {
+ Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
+ /*
+ * It's possible for the VM bit to be clear and the page-level bit
+ * to be set if checksums are not enabled.
+ *
+ * And even if we are just planning to update the frozen bit in
+ * the VM, we shouldn't rely on all_visible_according_to_vm as a
+ * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
+ * might have become stale.
+ *
+ * If the heap page is all-visible but the VM bit is not set, we
+ * don't need to dirty the heap page. However, if checksums are
+ * enabled, we do need to make sure that the heap page is dirtied
+ * before passing it to visibilitymap_set(), because it may be
+ * logged.
+ */
+ if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+ {
+ PageSetAllVisible(page);
+ MarkBufferDirty(buffer);
+ }
+
+ old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
+ vmbuffer, presult->vm_conflict_horizon,
+ vmflags);
+ }
+ }
+
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
+
+ presult->old_vmbits = old_vmbits;
+ presult->new_vmbits = vmflags;
+
if (prstate.freeze)
{
if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 0c121fdf4e6..c49e81bc5dd 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1933,7 +1933,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
-
/* qsort comparator for sorting OffsetNumbers */
static int
cmpOffsetNumbers(const void *a, const void *b)
@@ -1949,7 +1948,8 @@ cmpOffsetNumbers(const void *a, const void *b)
* vmbuffer is the buffer containing the VM block with visibility information
* for the heap block, blkno. all_visible_according_to_vm is the saved
* visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * won't rely entirely on this status, as it may be out of date. These will be
+ * passed on to heap_page_prune_and_freeze() to use while setting the VM.
*
* *has_lpdead_items is set to true or false depending on whether, upon return
* from this function, any LP_DEAD items are still present on the page.
@@ -1978,6 +1978,7 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
+ * Then, if the page's visibility status has changed, update the VM.
*
* If the relation has no indexes, we can immediately mark would-be dead
* items LP_UNUSED.
@@ -1986,10 +1987,6 @@ lazy_scan_prune(LVRelState *vacrel,
* presult.ndeleted. It should not be confused with presult.lpdead_items;
* presult.lpdead_items's final value can be thought of as the number of
* tuples that were deleted from indexes.
- *
- * We will update the VM after collecting LP_DEAD items and freezing
- * tuples. Pruning will have determined whether or not the page is
- * all-visible.
*/
prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
if (vacrel->nindexes == 0)
@@ -2081,88 +2078,26 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(!presult.all_visible || !(*has_lpdead_items));
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * For the purposes of logging, count whether or not the page was newly
+ * set all-visible and, potentially, all-frozen.
*/
- if (presult.vm_corruption)
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- /* Don't update the VM if we just cleared corruption in it */
- }
-
- /*
- * If the page isn't yet marked all-visible in the VM or it is and needs
- * to me marked all-frozen, update the VM Note that all_frozen is only
- * valid if all_visible is true, so we must check both all_visible and
- * all_frozen.
- */
- else if (presult.all_visible &&
- (!all_visible_according_to_vm ||
- (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
- {
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- /*
- * If the page is all-frozen, we can pass InvalidTransactionId as our
- * cutoff_xid, since a snapshotConflictHorizon sufficient to make
- * everything safe for REDO was logged when the page's tuples were
- * frozen.
- */
- if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * If the heap page is all-visible but the VM bit is not set, we don't
- * need to dirty the heap page. However, if checksums are enabled, we
- * do need to make sure that the heap page is dirtied before passing
- * it to visibilitymap_set(), because it may be logged.
- */
- if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * Even if we are only setting the all-frozen bit, there is a small
- * chance that the VM was modified sometime between setting
- * all_visible_according_to_vm and checking the visibility during
- * pruning. Check the return value of old_vmbits to ensure the
- * visibility map counters used for logging are accurate.
- *
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ vacrel->vm_new_visible_pages++;
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- vacrel->vm_new_frozen_pages++;
+ vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ {
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
+ }
return presult.ndeleted;
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0c7eb5e46f4..b85648456e9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,20 +235,21 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
+ * all_visible and all_frozen indicate the status of the page as reflected
+ * in the visibility map after pruning, freezing, and setting any pages
+ * all-visible in the visibility map.
*
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
+ * vm_conflict_horizon is the newest xmin of live tuples on the page
+ * (older than OldestXmin). It will only be valid if we did not set the
+ * page all-frozen in the VM.
*
* These are only set if the HEAP_PRUNE_FREEZE option is set.
*/
bool all_visible;
bool all_frozen;
TransactionId vm_conflict_horizon;
- bool vm_corruption;
+ uint8 old_vmbits;
+ uint8 new_vmbits;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
--
2.43.0
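The caller-side bookkeeping in lazy_scan_prune() above reduces to a small decision on the (old_vmbits, new_vmbits) pair returned in PruneFreezeResult. As a standalone restatement of that bucketing (not part of the patch set: ALL_VISIBLE/ALL_FROZEN and count_vm_change() are invented stand-ins for the PostgreSQL macros and the inline logic):

/* Toy model of the vm_new_* page counters derived from old/new VM bits. */
#include <stdio.h>
#include <stdint.h>

#define ALL_VISIBLE 0x01    /* stand-in for VISIBILITYMAP_ALL_VISIBLE */
#define ALL_FROZEN  0x02    /* stand-in for VISIBILITYMAP_ALL_FROZEN */

static void
count_vm_change(uint8_t old_vmbits, uint8_t new_vmbits,
                int *new_visible, int *new_visible_frozen, int *new_frozen)
{
    if ((old_vmbits & ALL_VISIBLE) == 0 && (new_vmbits & ALL_VISIBLE) != 0)
    {
        /* page newly set all-visible, and possibly also all-frozen */
        (*new_visible)++;
        if (new_vmbits & ALL_FROZEN)
            (*new_visible_frozen)++;
    }
    else if ((old_vmbits & ALL_FROZEN) == 0 && (new_vmbits & ALL_FROZEN) != 0)
    {
        /* page was already all-visible; only the frozen bit is new */
        (*new_frozen)++;
    }
}

int
main(void)
{
    int     nv = 0, nvf = 0, nf = 0;

    count_vm_change(0, ALL_VISIBLE, &nv, &nvf, &nf);
    count_vm_change(0, ALL_VISIBLE | ALL_FROZEN, &nv, &nvf, &nf);
    count_vm_change(ALL_VISIBLE, ALL_VISIBLE | ALL_FROZEN, &nv, &nvf, &nf);

    /* prints new_visible=2 new_visible_frozen=1 new_frozen=1 */
    printf("new_visible=%d new_visible_frozen=%d new_frozen=%d\n",
           nv, nvf, nf);
    return 0;
}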
v7-0010-Keep-all_frozen-updated-too-in-heap_page_prune_an.patch (text/x-patch; charset=US-ASCII)
From a865234b5efa4a994d5f7887bc8222aa172f4f4d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 23 Jul 2025 16:01:24 -0400
Subject: [PATCH v7 10/22] Keep all_frozen updated too in
heap_page_prune_and_freeze
We previously relied on all_visible and all_frozen only ever being
consulted together, which allowed all_frozen to go stale when
all_visible was cleared. It is better to keep both flags updated so
all_frozen is valid on its own.
---
src/backend/access/heap/pruneheap.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 496b70e318f..6c3653e776c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -143,10 +143,6 @@ typedef struct
* whether to freeze the page or not. The all_visible and all_frozen
* values returned to the caller are adjusted to include LP_DEAD items at
* the end.
- *
- * all_frozen should only be considered valid if all_visible is also set;
- * we don't bother to clear the all_frozen flag every time we clear the
- * all_visible flag.
*/
bool all_visible;
bool all_frozen;
@@ -824,6 +820,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
}
+ Assert(!prstate.all_frozen || prstate.all_visible);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -1468,7 +1465,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
if (!HeapTupleHeaderXminCommitted(htup))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1490,7 +1487,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
Assert(prstate->cutoffs);
if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1503,7 +1500,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
case HEAPTUPLE_RECENTLY_DEAD:
prstate->recently_dead_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple will soon become DEAD. Update the hint field so
@@ -1522,7 +1519,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* assumption is a bit shaky, but it is what acquire_sample_rows()
* does, so be consistent.
*/
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* If we wanted to optimize for aborts, we might consider marking
@@ -1540,7 +1537,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* will commit and update the counters after we report.
*/
prstate->live_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple may soon become DEAD. Update the hint field so that
--
2.43.0
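The point of this change is to remove a trap: all_frozen can now be tested on its own, without first checking all_visible. A toy illustration of the invariant (standalone C with invented names, not PostgreSQL code):

/* Illustrates: clearing all_visible must also clear all_frozen. */
#include <assert.h>
#include <stdbool.h>

struct page_vis
{
    bool    all_visible;
    bool    all_frozen;
};

/* Called for each tuple found not to be visible to everyone. */
static void
record_not_all_visible(struct page_vis *v)
{
    /* clear both flags together, as pruneheap.c now does */
    v->all_visible = v->all_frozen = false;
}

int
main(void)
{
    struct page_vis v = { .all_visible = true, .all_frozen = true };

    record_not_all_visible(&v);

    /* the invariant now asserted before entering the critical section */
    assert(!v.all_frozen || v.all_visible);
    return 0;
}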
v7-0012-Eliminate-xl_heap_visible-from-vacuum-phase-I-pru.patch (text/x-patch; charset=US-ASCII)
From a8977f72c3db92c0585bb906a34cd6e003f8a5e5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:41:00 -0400
Subject: [PATCH v7 12/22] Eliminate xl_heap_visible from vacuum phase I
prune/freeze
Instead of emitting a separate WAL record for every block rendered
all-visible/frozen by vacuum's phase I, include the changes to the VM in
the xl_heap_prune record already emitted.
This is only enabled for vacuum's prune/freeze work, not for on-access
pruning.
---
src/backend/access/heap/pruneheap.c | 454 ++++++++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 30 --
src/include/access/heapam.h | 15 +-
3 files changed, 278 insertions(+), 221 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 05227ce0339..cf9e5215d6b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,13 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool freeze;
+
+ /*
+ * Whether or not to consider updating the VM. There is some bookkeeping
+ * that must be maintained if we would like to update the VM.
+ */
+ bool consider_update_vm;
+
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -108,8 +115,9 @@ typedef struct
*
* These fields are not used by pruning itself for the most part, but are
* used to collect information about what was pruned and what state the
- * page is in after pruning, for the benefit of the caller. They are
- * copied to the caller's PruneFreezeResult at the end.
+ * page is in after pruning to use when updating the visibility map and
+ * for the benefit of the caller. They are copied to the caller's
+ * PruneFreezeResult at the end.
* -------------------------------------------------------
*/
@@ -138,11 +146,10 @@ typedef struct
* bits. It is only valid if we froze some tuples, and all_frozen is
* true.
*
- * NOTE: all_visible and all_frozen don't include LP_DEAD items. That's
- * convenient for heap_page_prune_and_freeze(), to use them to decide
- * whether to freeze the page or not. The all_visible and all_frozen
- * values returned to the caller are adjusted to include LP_DEAD items at
- * the end.
+ * NOTE: all_visible and all_frozen don't include LP_DEAD items until
+ * directly before updating the VM. We ignore LP_DEAD items when deciding
+ * whether or not to opportunistically freeze and when determining the
+ * snapshot conflict horizon required when freezing tuples.
*/
bool all_visible;
bool all_frozen;
@@ -371,12 +378,15 @@ identify_and_fix_vm_corruption(Relation relation,
* If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
* required in order to advance relfrozenxid / relminmxid, or if it's
* considered advantageous for overall system performance to do so now. The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing. When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set. They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
+ * 'cutoffs', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments are required
+ * when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set and the visibility status of the page
+ * has changed, we will update the VM at the same time as pruning and freezing
+ * the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping. Note that new and old_vmbits will be
+ * 0 if HEAP_PAGE_PRUNE_UPDATE_VM is not set.
*
* blk_known_av is the visibility status of the heap block as of the last call
* to find_next_unskippable_block(). vmbuffer is the buffer that may already
@@ -392,6 +402,8 @@ identify_and_fix_vm_corruption(Relation relation,
* FREEZE indicates that we will also freeze tuples, and will return
* 'all_visible', 'all_frozen' flags to the caller.
*
+ * UPDATE_VM indicates that we will set the page's status in the VM.
+ *
* cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
* of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
* cutoffs->OldestXmin is also used to determine if dead tuples are
@@ -436,18 +448,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
HeapTupleData tup;
bool do_freeze;
bool do_prune;
- bool do_hint;
+ bool do_hint_full_or_prunable;
+ bool do_set_vm;
uint8 vmflags = 0;
uint8 old_vmbits = 0;
bool hint_bit_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ bool all_frozen_except_lp_dead = false;
+ bool set_pd_all_visible = false;
/* Copy parameters to prstate */
prstate.vistest = vistest;
prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.consider_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate.cutoffs = cutoffs;
+ Assert(!prstate.consider_update_vm || vmbuffer);
+
/*
* Our strategy is to scan the page and make lists of items to change,
* then apply the changes within a critical section. This keeps as much
@@ -492,50 +510,57 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.deadoffsets = presult->deadoffsets;
/*
- * Caller may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Keep track of whether or not the page will be all-visible and
+ * all-frozen for use in opportunistic freezing and to update the VM if
+ * the caller requests it.
+ *
+ * Currently, only VACUUM attempts freezing and setting the VM bits. But
+ * other callers could do either one. The visibility bookkeeping is
+ * required for opportunistic freezing (in addition to setting the VM
+ * bits) because we only consider opportunistically freezing tuples if the
+ * whole page would become all-frozen or if the whole page will be frozen
+ * except for dead tuples that will be removed by vacuum.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * If only updating the VM, we must initialize all_frozen to false, as
+ * heap_prepare_freeze_tuple() will not be called for each tuple on the
+ * page and we will not end up correctly setting it to false later.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not clear
- * all_visible when we see LP_DEAD items. We fix that at the end of the
- * function, when we return the value to the caller, so that the caller
- * doesn't set the VM bit incorrectly.
+ * Dead tuples which will be removed by the end of vacuuming should not
+ * preclude us from opportunistically freezing, so we do not clear
+ * all_visible when we see LP_DEAD items. We fix that after determining
+ * whether or not to freeze but before deciding whether or not to update
+ * the VM so that we don't set the VM bit incorrectly.
+ *
+ * If not freezing or updating the VM, we otherwise avoid the extra
+ * bookkeeping. Initializing all_visible to false allows skipping the work
+ * to update them in heap_prune_record_unchanged_lp_normal().
*/
if (prstate.freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
}
+ else if (prstate.consider_update_vm)
+ {
+ prstate.all_visible = true;
+ prstate.all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate.all_visible = false;
prstate.all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is unmaintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon.
+ * This is most likely to happen when updating the VM and/or freezing all
+ * live tuples on the page. It is updated before returning to the caller
+ * because vacuum does assert-build only validation on the page using this
+ * field.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -733,10 +758,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* Even if we don't prune anything, if we found a new value for the
- * pd_prune_xid field or the page was marked full, we will update the hint
- * bit.
+ * pd_prune_xid field or the page was marked full, we will update those
+ * hint bits.
*/
- do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ do_hint_full_or_prunable =
+ ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
PageIsFull(page);
/*
@@ -784,7 +810,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (XLogCheckBufferNeedsBackup(buffer))
do_freeze = true;
}
- else if (do_hint)
+ else if (do_hint_full_or_prunable)
{
if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
do_freeze = true;
@@ -823,11 +849,84 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
}
+ /*
+ * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+ * make the choice of whether or not to freeze the page unaffected by the
+ * short-term presence of LP_DEAD items. These LP_DEAD items were
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that freezing has been finalized, unset all_visible if there are
+ * any LP_DEAD items on the page. It needs to reflect the present state of
+ * the page when using it to determine whether or not to update the VM.
+ *
+ * Keep track of whether or not the page was all-frozen except LP_DEAD
+ * items for the purposes of calculating the snapshot conflict horizon,
+ * though.
+ */
+ all_frozen_except_lp_dead = prstate.all_frozen;
+ if (prstate.lpdead_items > 0)
+ {
+ prstate.all_visible = false;
+ prstate.all_frozen = false;
+ }
+
Assert(!prstate.all_frozen || prstate.all_visible);
+
+ /*
+ * Handle setting visibility map bit based on information from the VM (as
+ * of last heap_vac_scan_next_block() call), and from all_visible and
+ * all_frozen variables.
+ */
+ if (prstate.consider_update_vm)
+ {
+ /*
+ * Clear any VM corruption. This does not need to be in a critical
+ * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
+ * we may mark the heap page buffer dirty here and could end up doing
+ * so again later. This is not a correctness issue and is in the path
+ * of VM corruption, so we don't have to worry about the extra
+ * performance overhead.
+ */
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av, prstate.lpdead_items, vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /* Determine if we actually need to set the VM and which bits to set. */
+ else if (prstate.all_visible &&
+ (!blk_known_av ||
+ (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate.all_frozen)
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+ }
+
+ do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
+
+ /*
+ * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+ * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+ * set, we strongly prefer to keep them in sync.
+ *
+ * Prior to Postgres 19, it was possible for the page-level bit to be set
+ * and the VM bit to be clear. This could happen if we crashed after
+ * setting PD_ALL_VISIBLE but before setting bits in the VM.
+ */
+ set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+
+ /* Save these for the caller in case we later zero out vmflags */
+ presult->new_vmbits = vmflags;
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
- if (do_hint)
+ if (do_hint_full_or_prunable)
{
/*
* Update the page's pd_prune_xid field to either zero, or the lowest
@@ -843,15 +942,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
PageClearFull(page);
/*
- * If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * If we are _only_ setting the prune_xid or PD_PAGE_FULL hint, then
+ * this is a non-WAL-logged hint. If we are going to freeze or prune
+ * tuples on the page or set PD_ALL_VISIBLE, we will mark the buffer
+ * dirty and emit WAL below.
*/
- if (!do_freeze && !do_prune)
+ if (!do_prune && !do_freeze && !set_pd_all_visible)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -865,12 +965,47 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
- MarkBufferDirty(buffer);
+ if (set_pd_all_visible)
+ PageSetAllVisible(page);
/*
- * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
+ * We only set PD_ALL_VISIBLE if we also set the VM, and since setting
+ * the VM requires emitting WAL, MarkBufferDirtyHint() isn't
+ * appropriate here.
*/
- if (RelationNeedsWAL(relation))
+ if (do_prune || do_freeze || set_pd_all_visible)
+ MarkBufferDirty(buffer);
+
+ if (do_set_vm)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ old_vmbits = visibilitymap_set_vmbyte(relation, blockno,
+ vmbuffer, vmflags);
+
+ if (old_vmbits == vmflags)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+ do_set_vm = false;
+ /* 0 out vmflags so we don't emit WAL to update the VM */
+ vmflags = 0;
+ }
+ }
+
+ /*
+ * It should never be the case that PD_ALL_VISIBLE is not set and the
+ * VM is set. Or, if it were, we should have caught it earlier when
+ * finding and fixing VM corruption. So, if we found out the VM was
+ * already set above, we should have found PD_ALL_VISIBLE set earlier.
+ */
+ Assert(!set_pd_all_visible || do_set_vm);
+
+ /*
+ * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did. If
+ * we were only updating the VM and it turns out it was already set,
+ * we will have unset do_set_vm earlier. As such, check it again
+ * before emitting the record.
+ */
+ if (RelationNeedsWAL(relation) && (do_set_vm || do_prune || do_freeze))
{
/*
* The snapshotConflictHorizon for the whole record should be the
@@ -882,35 +1017,56 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* on the standby with xids older than the youngest tuple this
* record will freeze will conflict.
*/
- TransactionId frz_conflict_horizon = InvalidTransactionId;
- TransactionId conflict_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
/*
- * We can use the visibility_cutoff_xid as our cutoff for
- * conflicts when the whole page is eligible to become all-frozen
- * in the VM once we're done with it. Otherwise we generate a
- * conservative cutoff by stepping back from OldestXmin.
+ * If we are updating the VM, the conflict horizon is almost
+ * always the visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization,
+ * we can use the visibility_cutoff_xid as the conflict horizon if
+ * the page will be all-frozen. This is true even if there are
+ * LP_DEAD line pointers because we ignored those when maintaining
+ * the visibility_cutoff_xid.
*/
- if (do_freeze)
+ if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
+ conflict_xid = prstate.visibility_cutoff_xid;
+
+ /*
+ * Otherwise, if we are freezing but the page would not be
+ * all-frozen, we have to use the more pessimistic horizon of
+ * OldestXmin, which may be newer than the newest tuple we froze.
+ * We currently don't track the newest tuple we froze.
+ */
+ else if (do_freeze)
{
- if (prstate.all_visible && prstate.all_frozen)
- frz_conflict_horizon = prstate.visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = prstate.cutoffs->OldestXmin;
- TransactionIdRetreat(frz_conflict_horizon);
- }
+ conflict_xid = prstate.cutoffs->OldestXmin;
+ TransactionIdRetreat(conflict_xid);
}
- if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
- conflict_xid = frz_conflict_horizon;
- else
+ /*
+ * If we are removing tuples with a younger xmax than our so far
+ * calculated conflict_xid, we must use this as our horizon.
+ */
+ if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
conflict_xid = prstate.latest_xid_removed;
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning
+ * or freezing any tuples and are setting an already all-visible
+ * page all-frozen in the VM. In this case, all of the tuples on
+ * the page must already be visible to all MVCC snapshots on the
+ * standby.
+ */
+ if (!do_prune && !do_freeze && do_set_vm &&
+ blk_known_av && (vmflags & VISIBILITYMAP_ALL_FROZEN))
+ conflict_xid = InvalidTransactionId;
+
log_heap_prune_and_freeze(relation, buffer,
false,
- InvalidBuffer, 0, false,
+ vmbuffer,
+ vmflags,
+ set_pd_all_visible,
conflict_xid,
true, reason,
prstate.frozen, prstate.nfrozen,
@@ -922,124 +1078,55 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
END_CRIT_SECTION();
- /* Copy information back for caller */
- presult->ndeleted = prstate.ndeleted;
- presult->nnewlpdead = prstate.ndead;
- presult->nfrozen = prstate.nfrozen;
- presult->live_tuples = prstate.live_tuples;
- presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected for updating the visibility map.
- */
- if (prstate.all_visible && prstate.lpdead_items == 0)
- {
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
- }
- else
- {
- presult->all_visible = false;
- presult->all_frozen = false;
- }
-
- presult->hastup = prstate.hastup;
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
- /*
- * If updating the visibility map, the conflict horizon for that record
- * must be the newest xmin on the page. However, if the page is
- * completely frozen, there can be no conflict and the vm_conflict_horizon
- * should remain InvalidTransactionId. This includes the case that we
- * just froze all the tuples; the prune-freeze record included the
- * conflict XID already so the VM update record doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * VACUUM will call heap_page_is_all_visible() during the second pass over
+ * the heap to determine all_visible and all_frozen for the page -- this
+ * is a specialized version of the logic from this function. Now that
+ * we've finished pruning and freezing, make sure that we're in total
+ * agreement with heap_page_is_all_visible() using an assertion. We will
+ * have already set the page in the VM, so this assertion will only let
+ * you know that you've already done something wrong.
*/
- if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
{
- if (identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av,
- prstate.lpdead_items, vmbuffer))
- {
- /* If we fix corruption, don't update the VM further */
- }
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
- /*
- * If the page isn't yet marked all-visible in the VM or it is and
- * needs to be marked all-frozen, update the VM. Note that all_frozen
- * is only valid if all_visible is true, so we must check both
- * all_visible and all_frozen.
- */
- else if (presult->all_visible &&
- (!blk_known_av ||
- (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
- {
- Assert(prstate.lpdead_items == 0);
- vmflags = VISIBILITYMAP_ALL_VISIBLE;
+ Assert(cutoffs);
- /*
- * If the page is all-frozen, we can pass InvalidTransactionId as
- * our cutoff_xid, since a snapshotConflictHorizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- if (presult->all_frozen)
- {
- Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
- vmflags |= VISIBILITYMAP_ALL_FROZEN;
- }
+ Assert(prstate.lpdead_items == 0);
- /*
- * It's possible for the VM bit to be clear and the page-level bit
- * to be set if checksums are not enabled.
- *
- * And even if we are just planning to update the frozen bit in
- * the VM, we shouldn't rely on all_visible_according_to_vm as a
- * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
- * might have become stale.
- *
- * If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be
- * logged.
- */
- if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buffer);
- }
+ if (!heap_page_is_all_visible(relation, buffer,
+ cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc))
+ Assert(false);
- old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
- vmbuffer, presult->vm_conflict_horizon,
- vmflags);
- }
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == prstate.visibility_cutoff_xid);
}
+#endif
+ /* Copy information back for caller */
+ presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+ presult->old_vmbits = old_vmbits;
+ /* new_vmbits was set above */
+ presult->hastup = prstate.hastup;
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
- presult->old_vmbits = old_vmbits;
- presult->new_vmbits = vmflags;
-
if (prstate.freeze)
{
if (presult->nfrozen > 0)
@@ -1621,7 +1708,12 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
break;
}
- /* Consider freezing any normal tuples which will not be removed */
+ /*
+ * Consider freezing any normal tuples which will not be removed.
+ * Regardless of whether or not we want to freeze the tuples, if we want
+ * to update the VM, we have to call heap_prepare_freeze_tuple() on every
+ * tuple to know whether or not the page will be totally frozen.
+ */
if (prstate->freeze)
{
bool totally_frozen;
@@ -2234,6 +2326,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+ Assert(do_prune || nfrozen > 0 || vmflags & VISIBILITYMAP_VALID_BITS);
+
xlrec.flags = 0;
regbuf_flags = REGBUF_STANDARD;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c49e81bc5dd..91e209901b8 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2014,34 +2014,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- /* Note that all_frozen value does not matter when !all_visible */
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- if (!heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum))
- Assert(false);
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -2075,8 +2047,6 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
-
/*
* For the purposes of logging, count whether or not the page was newly
* set all-visible and, potentially, all-frozen.
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b85648456e9..0b9bb1c9b13 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,19 +235,12 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate the status of the page as reflected
- * in the visibility map after pruning, freezing, and setting any pages
- * all-visible in the visibility map.
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
*
- * vm_conflict_horizon is the newest xmin of live tuples on the page
- * (older than OldestXmin). It will only be valid if we did not set the
- * page all-frozen in the VM.
- *
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
*/
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
uint8 old_vmbits;
uint8 new_vmbits;
--
2.43.0
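The snapshot conflict horizon chosen for the combined prune/freeze/VM-set record is the subtlest part of the patch above. Here is a simplified restatement as a standalone function (not the patch's code: choose_conflict_xid() is an invented name, and plain uint32 comparison stands in for TransactionIdFollows(), so XID wraparound is ignored):

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define INVALID_XID 0

static uint32_t
choose_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
                    bool all_frozen_except_lp_dead, bool blk_known_av,
                    bool setting_all_frozen,
                    uint32_t visibility_cutoff_xid,
                    uint32_t oldest_xmin,
                    uint32_t latest_xid_removed)
{
    uint32_t    conflict_xid = INVALID_XID;

    /*
     * Setting the VM, or freezing a page that ends up all-frozen, can use
     * the visibility cutoff xid directly.
     */
    if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
        conflict_xid = visibility_cutoff_xid;
    /* Freezing only some tuples falls back to the pessimistic OldestXmin - 1 */
    else if (do_freeze)
        conflict_xid = oldest_xmin - 1;

    /* Removing tuples with a newer xmax pushes the horizon forward */
    if (latest_xid_removed > conflict_xid)
        conflict_xid = latest_xid_removed;

    /*
     * Marking an already all-visible page all-frozen, with nothing pruned
     * or frozen, needs no conflict at all.
     */
    if (!do_prune && !do_freeze && do_set_vm &&
        blk_known_av && setting_all_frozen)
        conflict_xid = INVALID_XID;

    return conflict_xid;
}

int
main(void)
{
    /* example: already all-visible page being marked all-frozen only */
    uint32_t    xid = choose_conflict_xid(false, false, true,
                                          true, true, true,
                                          1000, 1200, 0);

    printf("conflict xid: %u (0 means no conflict required)\n", xid);
    return 0;
}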
v7-0014-Remove-xl_heap_visible-entirely.patch (text/x-patch; charset=US-ASCII)
From 3acf451bb40394330281ae82c5ef4c5c685438b4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:30:42 -0400
Subject: [PATCH v7 14/22] Remove xl_heap_visible entirely
There are now no users of this, so eliminate it entirely.
---
src/backend/access/common/bufmask.c | 3 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 154 +----------------------
src/backend/access/heap/pruneheap.c | 4 +-
src/backend/access/heap/vacuumlazy.c | 10 +-
src/backend/access/heap/visibilitymap.c | 109 +---------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +-
src/include/access/heapam_xlog.h | 20 ---
src/include/access/visibilitymap.h | 11 +-
src/include/access/visibilitymapdefs.h | 9 --
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 30 insertions(+), 368 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..1fff01383b3 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,7 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4ce0ec61692..060a166e18f 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -40,6 +40,7 @@
#include "access/valid.h"
#include "access/visibilitymap.h"
#include "access/xloginsert.h"
+#include "access/xlogutils.h"
#include "catalog/pg_database.h"
#include "catalog/pg_database_d.h"
#include "commands/vacuum.h"
@@ -2524,11 +2525,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
PageSetAllVisible(page);
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
- visibilitymap_set_vmbyte(relation,
- BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set(relation,
+ BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
}
/*
@@ -8796,49 +8797,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
/*
* Perform XLogInsert for a heap-update operation. Caller must already
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 05dce829eae..539d38194f5 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -83,10 +83,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
memcpy(&vmflags, maindataptr, sizeof(uint8));
maindataptr += sizeof(uint8);
- /*
- * We don't set VISIBILITYMAP_XLOG_CATALOG_REL in the combined record
- * because we already have XLHP_IS_CATALOG_REL.
- */
Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
/* Must never set all_frozen bit without also setting all_visible bit */
Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
@@ -268,7 +264,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Relation reln = CreateFakeRelcacheEntry(rlocator);
visibilitymap_pin(reln, blkno, &vmbuffer);
- old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+ old_vmbits = visibilitymap_set(reln, blkno, vmbuffer, vmflags);
/* Only set VM page LSN if we modified the page */
if (old_vmbits != vmflags)
PageSetLSN(BufferGetPage(vmbuffer), lsn);
@@ -279,143 +275,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
UnlockReleaseBuffer(vmbuffer);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
-
- Assert(visibilitymap_pin_ok(blkno, vmbuffer));
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -792,16 +651,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
Relation reln = CreateFakeRelcacheEntry(rlocator);
Assert(visibilitymap_pin_ok(blkno, vmbuffer));
- visibilitymap_set_vmbyte(reln, blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
/*
* It is not possible that the VM was already set for this heap page,
* so the vmbuffer must have been modified and marked dirty.
*/
Assert(BufferIsDirty(vmbuffer));
+ visibilitymap_set(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
PageSetLSN(BufferGetPage(vmbuffer), lsn);
FreeFakeRelcacheEntry(reln);
}
@@ -1381,9 +1240,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 82127e8728b..ffc12314b41 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -979,8 +979,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_set_vm)
{
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
- old_vmbits = visibilitymap_set_vmbyte(relation, blockno,
- vmbuffer, vmflags);
+ old_vmbits = visibilitymap_set(relation, blockno,
+ vmbuffer, vmflags);
if (old_vmbits == vmflags)
{
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 91e209901b8..6a0fa371a06 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1887,8 +1887,8 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
MarkBufferDirty(buf);
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
- visibilitymap_set_vmbyte(vacrel->rel, blkno,
- vmbuffer, new_vmbits);
+ visibilitymap_set(vacrel->rel, blkno,
+ vmbuffer, new_vmbits);
if (RelationNeedsWAL(vacrel->rel))
{
@@ -2754,9 +2754,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
set_pd_all_vis = true;
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
PageSetAllVisible(page);
- visibilitymap_set_vmbyte(vacrel->rel,
- blkno,
- vmbuffer, vmflags);
+ visibilitymap_set(vacrel->rel,
+ blkno,
+ vmbuffer, vmflags);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 568bc83db9c..8342ec1ff22 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set a bit in a previously pinned page and log
- * visibilitymap_set_vmbyte - set a bit in a pinned page
+ * visibilitymap_set - set a bit in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,108 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
- flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
- Assert(InRecovery || PageIsAllVisible((Page) BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- Assert(!BufferIsValid(heapBuf) || BufferIsExclusiveLocked(heapBuf));
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (XLogRecPtrIsInvalid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
-}
-
/*
* Set flags in the VM block contained in the passed in vmBuf.
*
@@ -343,8 +240,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* is pinned and exclusive locked.
*/
uint8
-visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags)
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index d6c86ccac20..f7880a4ed81 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -351,13 +351,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -462,9 +455,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d6a479f6984..34988d564fd 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -440,20 +439,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -497,11 +482,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 977566f6b98..c5b1e1f7adb 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "utils/relcache.h"
@@ -31,14 +30,8 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags);
+extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a13e8162890..b4c880c083f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4273,7 +4273,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
v7-0015-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisXi.patch
From 03c8a4257e02d217378080d611f4c066ad69c496 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v7 15/22] Rename GlobalVisTestIsRemovableXid() to
GlobalVisXidVisibleToAll()
Currently, we only use GlobalVisTestIsRemovableXid() to check whether a
tuple's xmax is visible to all, meaning the tuple can be removed. But
future commits will also use it to test whether a tuple's xmin is
visible to all, for the purpose of determining whether the page can be
set all-visible in the VM. For that use, the name
GlobalVisXidVisibleToAll() makes more sense.
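For example, a rough sketch of the xmin-side use that a later patch in
this set adopts (simplified; vistest, all_visible and all_frozen stand
in for the caller's page-level bookkeeping):

    /*
     * If the newest inserting transaction on the page is visible to
     * everyone, its insertion no longer prevents marking the page
     * all-visible.
     */
    if (!GlobalVisXidVisibleToAll(vistest, xmin))
        all_visible = all_frozen = false;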
---
src/backend/access/heap/heapam_visibility.c | 6 +++---
src/backend/access/heap/pruneheap.c | 14 +++++++-------
src/backend/access/spgist/spgvacuum.c | 2 +-
src/backend/storage/ipc/procarray.c | 13 ++++++-------
src/include/utils/snapmgr.h | 4 ++--
5 files changed, 19 insertions(+), 20 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
{
Assert(TransactionIdIsValid(dead_after));
- if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
res = HEAPTUPLE_DEAD;
}
else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
return false;
/* Deleter committed, so tuple is dead if the XID is old enough. */
- return GlobalVisTestIsRemovableXid(vistest,
- HeapTupleHeaderGetRawXmax(tuple));
+ return GlobalVisXidVisibleToAll(vistest,
+ HeapTupleHeaderGetRawXmax(tuple));
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ffc12314b41..715dfc16ba7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -231,7 +231,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
vistest = GlobalVisTestFor(relation);
- if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+ if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
return;
/*
@@ -574,9 +574,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* Determining HTSV only once for each tuple is required for correctness,
* to deal with cases where running HTSV twice could result in different
* results. For example, RECENTLY_DEAD can turn to DEAD if another
- * checked item causes GlobalVisTestIsRemovableFullXid() to update the
- * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
- * transaction aborts.
+ * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+ * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+ * aborts.
*
* It's also good for performance. Most commonly tuples within a page are
* stored at decreasing offsets (while the items are stored at increasing
@@ -1172,11 +1172,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* Determine whether or not the tuple is considered dead when compared
* with the provided GlobalVisState. On-access pruning does not provide
* VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
- * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
- * if the GlobalVisState has been updated since the beginning of vacuuming
+ * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+ * the GlobalVisState has been updated since the beginning of vacuuming
* the relation.
*/
- if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
return HEAPTUPLE_DEAD;
return res;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 2678f7ab782..4b8e5747239 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
*/
if (dt->tupstate == SPGIST_REDIRECT &&
(!TransactionIdIsValid(dt->xid) ||
- GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+ GlobalVisXidVisibleToAll(vistest, dt->xid)))
{
dt->tupstate = SPGIST_PLACEHOLDER;
Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index bf987aed8d3..508bb379f87 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
* See comment for GlobalVisState for details.
*/
bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
- FullTransactionId fxid)
+GlobalVisFullXidVisible(GlobalVisState *state, FullTransactionId fxid)
{
/*
* If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4223,7 +4222,7 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
* relfrozenxid).
*/
bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
{
FullTransactionId fxid;
@@ -4237,7 +4236,7 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
*/
fxid = FullXidRelativeTo(state->definitely_needed, xid);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisible(state, fxid);
}
/*
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisible(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableXid(state, xid);
+ return GlobalVisXidVisibleToAll(state, xid);
}
/*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..547c71fcbfe 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
*/
typedef struct GlobalVisState GlobalVisState;
extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisible(GlobalVisState *state, FullTransactionId fxid);
extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
--
2.43.0
v7-0013-Rename-PruneState.freeze-to-attempt_freeze.patch
From 7f96c6f7acd4e08ba0463c5b59394bb79a80005f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 31 Jul 2025 14:07:51 -0400
Subject: [PATCH v7 13/22] Rename PruneState.freeze to attempt_freeze
This makes it clearer that the flag indicates the caller would like
heap_page_prune_and_freeze() to consider freezing tuples -- not that we
will ultimately end up freezing them.
Also rename the local variable hint_bit_fpi to did_tuple_hint_fpi. This
makes it clear that it refers to tuple hints rather than page hints, and
that it records something that happened rather than something that could
happen.
---
src/backend/access/heap/pruneheap.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index cf9e5215d6b..82127e8728b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -43,7 +43,7 @@ typedef struct
/* whether or not dead items can be set LP_UNUSED during pruning */
bool mark_unused_now;
/* whether to attempt freezing tuples */
- bool freeze;
+ bool attempt_freeze;
/*
* Whether or not to consider updating the VM. There is some bookkeeping
@@ -452,7 +452,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_set_vm;
uint8 vmflags = 0;
uint8 old_vmbits = 0;
- bool hint_bit_fpi;
+ bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
bool all_frozen_except_lp_dead = false;
bool set_pd_all_visible = false;
@@ -460,7 +460,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Copy parameters to prstate */
prstate.vistest = vistest;
prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
- prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
prstate.consider_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate.cutoffs = cutoffs;
@@ -485,7 +485,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* initialize page freezing working state */
prstate.pagefrz.freeze_required = false;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
Assert(new_relfrozen_xid && new_relmin_mxid);
prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
@@ -535,7 +535,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* bookkeeping. Initializing all_visible to false allows skipping the work
* to update them in heap_prune_record_unchanged_lp_normal().
*/
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
@@ -653,7 +653,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
* an FPI to be emitted.
*/
- hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+ did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
/*
* Process HOT chains.
@@ -770,7 +770,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* plans we prepared, or not.
*/
do_freeze = false;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
if (prstate.pagefrz.freeze_required)
{
@@ -803,7 +803,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (RelationNeedsWAL(relation))
{
- if (hint_bit_fpi)
+ if (did_tuple_hint_fpi)
do_freeze = true;
else if (do_prune)
{
@@ -1127,7 +1127,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
if (presult->nfrozen > 0)
{
@@ -1714,7 +1714,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* to update the VM, we have to call heap_prepare_freeze_tuple() on every
* tuple to know whether or not the page will be totally frozen.
*/
- if (prstate->freeze)
+ if (prstate->attempt_freeze)
{
bool totally_frozen;
--
2.43.0
v7-0016-Use-GlobalVisState-to-determine-page-level-visibi.patch
From f3c3a44bb4dd5bf311a3a39876e1d26790321c2f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v7 16/22] Use GlobalVisState to determine page level
visibility
During pruning and during vacuum's third phase, we try to determine if
the whole page can be set all-visible in the visibility map. Instead of
using OldestXmin to determine if all the tuples on a page are visible to
everyone, use the GlobalVisState. This allows us to start setting the VM
during on-access pruning in a future commit.
It is possible for the GlobalVisState to change during the course of a
vacuum. In all but extraordinary cases, it moves forward, meaning more
pages could potentially be set in the VM.
Because comparing a transaction ID to the GlobalVisState requires more
operations than comparing it to another single transaction ID, we now
wait until all the tuples on the page have been examined and, if we have
maintained the visibility_cutoff_xid, compare it to the GlobalVisState
just once per page. This works because, if the page is all-visible and
has live, committed tuples on it, visibility_cutoff_xid holds the newest
xmin on the page: if everyone can see that xmin, the page is truly
all-visible.

Doing this may mean we examine more tuples' xmins than before, since
previously we would have set all_visible to false (and stopped checking)
as soon as we encountered a live tuple newer than OldestXmin. However,
these extra comparisons were not found to be significant in a profile.
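Condensed, the per-page logic looks roughly like this (names as in the
patch, not the exact control flow):

    /* while scanning items, only remember the newest normal xmin */
    if (TransactionIdFollows(xmin, prstate.visibility_cutoff_xid) &&
        TransactionIdIsNormal(xmin))
        prstate.visibility_cutoff_xid = xmin;

    /* ... then, once per page, after all items have been processed ... */
    if (prstate.all_visible &&
        TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
        !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
        prstate.all_visible = prstate.all_frozen = false;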
---
src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++
src/backend/access/heap/pruneheap.c | 48 +++++++++------------
src/backend/access/heap/vacuumlazy.c | 17 ++++----
src/include/access/heapam.h | 4 +-
4 files changed, 59 insertions(+), 38 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+ Buffer buffer)
+{
+ TransactionId dead_after = InvalidTransactionId;
+ HTSV_Result res;
+
+ res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+ if (res == HEAPTUPLE_RECENTLY_DEAD)
+ {
+ Assert(TransactionIdIsValid(dead_after));
+
+ if (GlobalVisXidVisibleToAll(vistest, dead_after))
+ res = HEAPTUPLE_DEAD;
+ }
+ else
+ Assert(!TransactionIdIsValid(dead_after));
+
+ return res;
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 715dfc16ba7..ab79d8a3ed9 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -141,10 +141,9 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon, when setting the VM or when
+ * freezing all the live tuples on the page.
*
* NOTE: all_visible and all_frozen don't include LP_DEAD items until
* directly before updating the VM. We ignore LP_DEAD items when deciding
@@ -553,14 +552,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon.
- * This is most likely to happen when updating the VM and/or freezing all
- * live tuples on the page. It is updated before returning to the caller
- * because vacuum does assert-build only validation on the page using this
- * field.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState.
+ *
+ * If we encounter an uncommitted tuple, this field is unmaintained. If
+ * the page is being set all-visible or when freezing all live tuples on
+ * the page, it is used to calculate the snapshot conflict horizon.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -756,6 +753,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them is not visible to everyone, the page cannot be
+ * all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update those
@@ -1098,12 +1105,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(cutoffs);
-
Assert(prstate.lpdead_items == 0);
if (!heap_page_is_all_visible(relation, buffer,
- cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc))
Assert(false);
@@ -1628,19 +1633,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' when freezing is requested. We
- * could use GlobalVisTestIsRemovableXid instead, if a
- * non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6a0fa371a06..777ec30eb82 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -465,7 +465,7 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int allowed_num_offsets,
bool *all_frozen,
@@ -2716,7 +2716,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
InvalidOffsetNumber);
if (heap_page_is_all_visible_except_lpdead(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3459,13 +3459,13 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
- return heap_page_is_all_visible_except_lpdead(rel, buf, OldestXmin,
+ return heap_page_is_all_visible_except_lpdead(rel, buf, vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -3500,7 +3500,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
static bool
heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int allowed_num_offsets,
bool *all_frozen,
@@ -3555,8 +3555,8 @@ heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
tuple.t_len = ItemIdGetLength(itemid);
tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
- buf))
+ switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest,
+ buf))
{
case HEAPTUPLE_LIVE:
{
@@ -3575,8 +3575,7 @@ heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin,
- OldestXmin))
+ if (!GlobalVisXidVisibleToAll(vistest, xmin))
{
all_visible = false;
*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0b9bb1c9b13..4278f351bdf 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -342,7 +342,7 @@ extern void heap_inplace_unlock(Relation relation,
HeapTuple oldtup, Buffer buffer);
extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum);
@@ -415,6 +415,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+ GlobalVisState *vistest, Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
v7-0017-Inline-TransactionIdFollows-Precedes.patch
From 364e7b71bbbe0b2e5956acbc33bea4fe8d1b3979 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 09:57:13 -0400
Subject: [PATCH v7 17/22] Inline TransactionIdFollows/Precedes()
Calling these from on-access pruning code had noticeable overhead in a
profile. There does not seem to be a reason not to inline them.
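For reference, these are tiny pure functions doing a modulo-2^32
comparison; a worked example across the wraparound boundary:

    /*
     * TransactionIdFollows(100, 4294967000): both XIDs are normal, so
     * diff = (int32) (100 - 4294967000) = 396 > 0 and the function
     * returns true -- 100 is logically newer than 4294967000 even
     * though it is numerically smaller.
     */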
---
src/backend/access/transam/transam.c | 64 -------------------------
src/include/access/transam.h | 70 ++++++++++++++++++++++++++--
2 files changed, 66 insertions(+), 68 deletions(-)
diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index 9a39451a29a..553d6756cb3 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -273,70 +273,6 @@ TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
}
-/*
- * TransactionIdPrecedes --- is id1 logically < id2?
- */
-bool
-TransactionIdPrecedes(TransactionId id1, TransactionId id2)
-{
- /*
- * If either ID is a permanent XID then we can just do unsigned
- * comparison. If both are normal, do a modulo-2^32 comparison.
- */
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 < id2);
-
- diff = (int32) (id1 - id2);
- return (diff < 0);
-}
-
-/*
- * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
- */
-bool
-TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 <= id2);
-
- diff = (int32) (id1 - id2);
- return (diff <= 0);
-}
-
-/*
- * TransactionIdFollows --- is id1 logically > id2?
- */
-bool
-TransactionIdFollows(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 > id2);
-
- diff = (int32) (id1 - id2);
- return (diff > 0);
-}
-
-/*
- * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
- */
-bool
-TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 >= id2);
-
- diff = (int32) (id1 - id2);
- return (diff >= 0);
-}
-
/*
* TransactionIdLatest --- get latest XID among a main xact and its children
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 7d82cd2eb56..c9e20418275 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -255,6 +255,72 @@ typedef struct TransamVariablesData
} TransamVariablesData;
+
+/*
+ * TransactionIdPrecedes --- is id1 logically < id2?
+ */
+static inline bool
+TransactionIdPrecedes(TransactionId id1, TransactionId id2)
+{
+ /*
+ * If either ID is a permanent XID then we can just do unsigned
+ * comparison. If both are normal, do a modulo-2^32 comparison.
+ */
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 < id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff < 0);
+}
+
+/*
+ * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
+ */
+static inline bool
+TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 <= id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff <= 0);
+}
+
+/*
+ * TransactionIdFollows --- is id1 logically > id2?
+ */
+static inline bool
+TransactionIdFollows(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 > id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff > 0);
+}
+
+/*
+ * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
+ */
+static inline bool
+TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 >= id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff >= 0);
+}
+
+
/* ----------------
* extern declarations
* ----------------
@@ -274,10 +340,6 @@ extern bool TransactionIdDidAbort(TransactionId transactionId);
extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
-extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
-extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2);
extern TransactionId TransactionIdLatest(TransactionId mainxid,
int nxids, const TransactionId *xids);
extern XLogRecPtr TransactionIdGetCommitLSN(TransactionId xid);
--
2.43.0
v7-0019-Allow-on-access-pruning-to-set-pages-all-visible.patch
From 55494f3135534695953ce03183f56ff331b3e26e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v7 19/22] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum marked pages as all-visible or all-frozen.
Supporting this requires passing information from the executor down to
the scan descriptor about whether the query modifies the relation.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
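In rough outline (names as in the patch; condensed from
heap_prepare_pagescan()), a scan that cannot modify the relation opts in
like this:

    Buffer *vmbuffer = NULL;

    /* read-only scans get SO_ALLOW_VM_SET and pass a VM buffer pointer */
    if (sscan->rs_flags & SO_ALLOW_VM_SET)
        vmbuffer = &scan->rs_vmbuffer;

    /* pruning pins (and, if needed, reads) the right VM page itself */
    heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);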
---
src/backend/access/heap/heapam.c | 15 ++++-
src/backend/access/heap/heapam_handler.c | 15 ++++-
src/backend/access/heap/pruneheap.c | 63 ++++++++++++++-----
src/backend/access/index/indexam.c | 46 ++++++++++++++
src/backend/access/table/tableam.c | 39 ++++++++++--
src/backend/executor/execMain.c | 4 ++
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeBitmapHeapscan.c | 7 ++-
src/backend/executor/nodeIndexscan.c | 18 ++++--
src/backend/executor/nodeSeqscan.c | 24 +++++--
src/include/access/genam.h | 11 ++++
src/include/access/heapam.h | 24 ++++++-
src/include/access/relscan.h | 6 ++
src/include/access/tableam.h | 30 ++++++++-
src/include/nodes/execnodes.h | 6 ++
.../t/035_standby_logical_decoding.pl | 4 +-
16 files changed, 276 insertions(+), 38 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 060a166e18f..8a8b63b79f2 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -556,6 +556,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -570,7 +571,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1247,6 +1250,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1285,6 +1289,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1317,6 +1327,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index cb4bc35c93e..c68283de6f2 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
return &hscan->xs_base;
}
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 80d055e5376..dad341cb265 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -198,9 +198,13 @@ static bool identify_and_fix_vm_corruption(Relation relation,
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -264,6 +268,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
{
OffsetNumber dummy_off_loc;
PruneFreezeResult presult;
+ int options = 0;
+
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ options = HEAP_PAGE_PRUNE_UPDATE_VM;
+ }
/*
* For now, pass mark_unused_now as false regardless of whether or
@@ -271,9 +282,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* that during on-access pruning with the current implementation.
*/
heap_page_prune_and_freeze(relation, buffer, false,
- InvalidBuffer,
- vistest, 0,
- NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+ vmbuffer ? *vmbuffer : InvalidBuffer,
+ vistest, options,
+ NULL, &presult, PRUNE_ON_ACCESS,
+ &dummy_off_loc, NULL, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -513,12 +525,17 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* all-frozen for use in opportunistic freezing and to update the VM if
* the caller requests it.
*
- * Currently, only VACUUM attempts freezing and setting the VM bits. But
- * other callers could do either one. The visibility bookkeeping is
- * required for opportunistic freezing (in addition to setting the VM
- * bits) because we only consider opportunistically freezing tuples if the
- * whole page would become all-frozen or if the whole page will be frozen
- * except for dead tuples that will be removed by vacuum.
+ * Currently, only VACUUM attempts freezing. But other callers could. The
+ * visibility bookkeeping is required for opportunistic freezing (in
+ * addition to setting the VM bits) because we only consider
+ * opportunistically freezing tuples if the whole page would become
+ * all-frozen or if the whole page will be frozen except for dead tuples
+ * that will be removed by vacuum. But if consider_update_vm is false,
+ * we'll not set the VM even if the page is discovered to be all-visible.
+ *
+ * If only HEAP_PAGE_PRUNE_UPDATE_VM is passed and not
+ * HEAP_PAGE_PRUNE_FREEZE, prstate.all_frozen must be initialized to false
+ * because we will not call heap_prepare_freeze_tuple() on each tuple.
*
* If only updating the VM, we must initialize all_frozen to false, as
* heap_prepare_freeze_tuple() will not be called for each tuple on the
@@ -530,7 +547,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* whether or not to freeze but before deciding whether or not to update
* the VM so that we don't set the VM bit incorrectly.
*
- * If not freezing or updating the VM, we otherwise avoid the extra
+ * If not freezing and not updating the VM, we avoid the extra
* bookkeeping. Initializing all_visible to false allows skipping the work
* to update them in heap_prune_record_unchanged_lp_normal().
*/
@@ -879,12 +896,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.all_frozen = false;
}
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate.consider_update_vm &&
+ prstate.all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+ {
+ prstate.consider_update_vm = false;
+ prstate.all_visible = prstate.all_frozen = false;
+ }
+
Assert(!prstate.all_frozen || prstate.all_visible);
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * Handle setting visibility map bit based on information from the VM (if
+ * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
+ * call), and from all_visible and all_frozen variables.
*/
if (prstate.consider_update_vm)
{
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 86d11f4ec79..4603ece09bd 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -289,6 +289,32 @@ index_beginscan(Relation heapRelation,
return scan;
}
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan(heapRelation,
+ indexRelation,
+ snapshot,
+ instrument,
+ nkeys, norderbys);
+
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+ return scan;
+}
+
/*
* index_beginscan_bitmap - start a scan of an index with amgetbitmap
*
@@ -620,6 +646,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
return scan;
}
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan_parallel(heaprel, indexrel,
+ instrument,
+ nkeys, norderbys,
+ pscan);
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+ return scan;
+}
+
/* ----------------
* index_getnext_tid - get the next TID from a scan
*
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index a56c5eceb14..67dbf99f5b5 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
char *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
bool synchronize_seqscans = true;
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags);
+
/* ----------------------------------------------------------------------------
* Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
}
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
- SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
pscan, flags);
}
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+ bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
/* ----------------------------------------------------------------------------
* Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index b8b9d2a85f7..a862701edbe 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation is modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ modifies_rel);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..c2ffbd3b08e 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+
+ bool modifies_base_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
*/
- scandesc = index_beginscan(node->ss.ss_currentRelation,
- node->iss_RelationDesc,
- estate->es_snapshot,
- &node->iss_Instrument,
- node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+ node->iss_RelationDesc,
+ estate->es_snapshot,
+ &node->iss_Instrument,
+ node->iss_NumScanKeys,
+ node->iss_NumOrderByKeys,
+ modifies_base_rel);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..fd69275c181 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
- scandesc = table_beginscan(node->ss.ss_currentRelation,
- estate->es_snapshot,
- 0, NULL);
+ scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+ estate->es_snapshot,
+ 0, NULL, modifies_rel);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -366,6 +371,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
ParallelContext *pcxt)
{
EState *estate = node->ss.ps.state;
+ bool modifies_rel;
ParallelTableScanDesc pscan;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -373,8 +379,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+ modifies_rel);
}
/* ----------------------------------------------------------------
@@ -404,8 +413,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+ pscan,
+ modifies_rel);
}
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 5b2ab181b5f..bf272c2c37f 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_heap_rel);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
@@ -206,6 +211,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys,
ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_rel);
extern ItemPointer index_getnext_tid(IndexScanDesc scan,
ScanDirection direction);
struct TupleTableSlot;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4278f351bdf..16f7904a21e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read in the current heap page's
+ * corresponding VM block to this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -116,8 +123,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read in the current heap page's corresponding VM block to
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
@@ -374,7 +391,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool blk_known_av,
Buffer vmbuffer,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
typedef struct IndexFetchTableData
{
Relation rel;
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchTableData;
struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 1c9e802a6b1..0e986d8ef72 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -62,6 +62,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* whether or not scan should attempt to set the VM */
+ SO_ALLOW_VM_SET = 1 << 10,
} ScanOptions;
/*
@@ -876,6 +878,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
}
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
/*
* Like table_beginscan(), but for scanning catalog. It'll automatically use a
* snapshot appropriate for scanning catalog relations.
@@ -913,10 +934,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, struct ScanKeyData *key)
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
{
uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
}
@@ -1125,6 +1149,10 @@ extern void table_parallelscan_initialize(Relation rel,
extern TableScanDesc table_beginscan_parallel(Relation relation,
ParallelTableScanDesc pscan);
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+ ParallelTableScanDesc pscan,
+ bool modifies_rel);
+
/*
* Restart a parallel scan. Call this in the leader process. Caller is
* responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index de782014b2d..839c1be1d7c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -678,6 +678,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query either through
+ * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index c9c182892cf..870f03bdd79 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -10,6 +10,7 @@ use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Time::HiRes qw(usleep);
use Test::More;
+use Time::HiRes qw(usleep);
if ($ENV{enable_injection_points} ne 'yes')
{
@@ -296,6 +297,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -745,7 +747,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
--
2.43.0
v7-0018-Unset-all-visible-sooner-if-not-freezing.patchtext/x-patch; charset=US-ASCII; name=v7-0018-Unset-all-visible-sooner-if-not-freezing.patchDownload
From e8962ba850206bc7de6f04ba2655c336e8108023 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:35:13 -0400
Subject: [PATCH v7 18/22] Unset all-visible sooner if not freezing
In prune/freeze code, we delay unsetting all-visible/all-frozen in the
presence of dead items to allow opportunistically freezing tuples if the
whole page would be frozen except for those dead items -- which are
removed later in vacuum's third phase.
Future commits will allow on-access pruning to set the VM, which means
all-visible will be initialized to true instead of false and we will do
extra bookkeeping in heap_prune_unchanged_lp_normal() to keep track of
whether or not the page is all-visible.
Because on-access pruning will not freeze tuples, it makes sense to
unset all-visible as soon as we encounter an LP_DEAD item and
avoid continued bookkeeping since we know the page is not all-visible
and we won't be able to remove those dead items.
---
src/backend/access/heap/pruneheap.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ab79d8a3ed9..80d055e5376 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1493,8 +1493,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible until later during pruning.
- * Removable dead tuples shouldn't preclude freezing the page.
+ * Removable dead tuples shouldn't preclude freezing the page. If we won't
+ * attempt freezing, just unset all-visible now, though.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1752,8 +1755,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible until later, at the end of
* heap_page_prune_and_freeze(). This will allow us to attempt to freeze
* the page after pruning. As long as we unset it before updating the
- * visibility map, this will be correct.
+ * visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all-visible now.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
--
2.43.0
v7-0021-Reorder-heap_page_prune_and_freeze-parameters.patchtext/x-patch; charset=US-ASCII; name=v7-0021-Reorder-heap_page_prune_and_freeze-parameters.patchDownload
From d48a5ea23fe37659673658378d8c47bece8ca282 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 31 Jul 2025 12:08:18 -0400
Subject: [PATCH v7 21/22] Reorder heap_page_prune_and_freeze parameters
Reorder parameters so that all of the output parameters are together at
the end of the parameter list.
---
src/backend/access/heap/pruneheap.c | 38 ++++++++++++++--------------
src/backend/access/heap/vacuumlazy.c | 6 ++---
src/include/access/heapam.h | 4 +--
3 files changed, 24 insertions(+), 24 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5d943b0c64f..20f4a62fb16 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -297,10 +297,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, false,
+ heap_page_prune_and_freeze(relation, buffer, options, false,
vmbuffer ? *vmbuffer : InvalidBuffer,
- vistest, options,
- NULL, &presult, PRUNE_ON_ACCESS,
+ vistest,
+ NULL, PRUNE_ON_ACCESS, &presult,
&dummy_off_loc, NULL, NULL);
/*
@@ -645,6 +645,15 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* also need to account for a reduction in the length of the line pointer
* array following array truncation by us.
*
+ * options:
+ * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+ * pruning.
+ *
+ * FREEZE indicates that we will also freeze tuples, and will return
+ * 'all_visible', 'all_frozen' flags to the caller.
+ *
+ * UPDATE_VM indicates that we will set the page's status in the VM.
+ *
* If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
* required in order to advance relfrozenxid / relminmxid, or if it's
* considered advantageous for overall system performance to do so now. The
@@ -663,30 +672,21 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* contain the required block of the visibility map.
*
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
- *
- * options:
- * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- * pruning.
- *
- * FREEZE indicates that we will also freeze tuples, and will return
- * 'all_visible', 'all_frozen' flags to the caller.
- *
- * UPDATE_VM indicates that we will set the page's status in the VM.
+ * (see heap_prune_satisfies_vacuum). It is an input parameter.
*
* cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
* of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
* cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD. It is an input parameter.
+ *
+ * reason indicates why the pruning is performed. It is included in the WAL
+ * record for debugging and analysis purposes, but otherwise has no effect.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
* heap_page_prune_and_freeze() is responsible for initializing it. Required
* by all callers.
*
- * reason indicates why the pruning is performed. It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
* off_loc is the offset location required by the caller to use in error
* callback.
*
@@ -699,13 +699,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ int options,
bool blk_known_av,
Buffer vmbuffer,
GlobalVisState *vistest,
- int options,
struct VacuumCutoffs *cutoffs,
- PruneFreezeResult *presult,
PruneReason reason,
+ PruneFreezeResult *presult,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 777ec30eb82..120782fd8ec 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1992,11 +1992,11 @@ lazy_scan_prune(LVRelState *vacrel,
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf,
+ heap_page_prune_and_freeze(rel, buf, prune_options,
all_visible_according_to_vm,
vmbuffer,
- vacrel->vistest, prune_options,
- &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+ vacrel->vistest,
+ &vacrel->cutoffs, PRUNE_VACUUM_SCAN, &presult,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 16f7904a21e..0c4e5607627 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,13 +394,13 @@ struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer,
Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ int options,
bool blk_known_av,
Buffer vmbuffer,
struct GlobalVisState *vistest,
- int options,
struct VacuumCutoffs *cutoffs,
- PruneFreezeResult *presult,
PruneReason reason,
+ PruneFreezeResult *presult,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid);
--
2.43.0
v7-0022-Set-pd_prune_xid-on-insert.patchtext/x-patch; charset=US-ASCII; name=v7-0022-Set-pd_prune_xid-on-insert.patchDownload
From e46711ca4bb2e484693efa1d2dc8a8f444bfd094 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v7 22/22] Set pd_prune_xid on insert
Now that we can set the VM during read-only queries, it makes sense to
start setting the page prunable hint on insert. This will allow
heap_page_prune_and_freeze() to be called when the page is full or
mostly full.
For years there has been a note in heap_insert() and heap_multi_insert()
pointing out that setting pd_prune_xid would help clean up aborted
inserted tuples that would otherwise not be cleaned up until vacuum.
So, that's another benefit of setting it.
Setting pd_prune_xid on insert causes a page to be pruned and then
written out which then affects the reported number of hits in the
index-killtuples isolation test. This is a quirk of how hits are tracked
which sometimes leads them to be double counted. This should probably be
fixed or changed independently.
---
src/backend/access/heap/heapam.c | 25 +++++++++++++------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++-
.../isolation/expected/index-killtuples.out | 6 ++---
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 8a8b63b79f2..b44176e3c70 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2105,6 +2105,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2164,15 +2165,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2182,7 +2187,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2546,8 +2550,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 539d38194f5..5ef49e19c7b 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -474,6 +474,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -623,9 +629,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
--
2.43.0
v7-0020-Add-helper-functions-to-heap_page_prune_and_freez.patchtext/x-patch; charset=US-ASCII; name=v7-0020-Add-helper-functions-to-heap_page_prune_and_freez.patchDownload
From 630aea6e2fc3013c20bc938a0f7a115123da230d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 30 Jul 2025 18:51:43 -0400
Subject: [PATCH v7 20/22] Add helper functions to heap_page_prune_and_freeze
heap_page_prune_and_freeze() has gotten rather long. It has several
stages:
1) setup - where the PruneState is set up
2) tuple examination -- where tuples and line pointers are examined to
determine what needs to be pruned and what could be frozen
3) evaluation -- where we determine based on caller provided options,
heuristics, and state gathered during stage 2 whether or not to
freeze tuples and set the page in the VM
4) execution - where the page changes are actually made and logged
This commit refactors the evaluation stage into helpers which return
whether or not to freeze and set the VM.
For the purposes of committing, this likely shouldn't be a separate
commit. But I'm not sure yet whether it makes more sense to do this
refactoring earlier in the set for clarity for the reviewer.
---
src/backend/access/heap/pruneheap.c | 471 +++++++++++++++++-----------
1 file changed, 295 insertions(+), 176 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index dad341cb265..5d943b0c64f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -179,6 +179,22 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
static void page_verify_redirects(Page page);
+static bool heap_page_will_freeze(Relation relation, Buffer buffer,
+ bool do_prune,
+ bool do_hint_full_or_prunable,
+ bool did_tuple_hint_fpi,
+ PruneState *prstate,
+ bool *all_frozen_except_lp_dead);
+
+static bool heap_page_will_update_vm(Relation relation,
+ Buffer buffer, BlockNumber blockno, Page page,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ bool blk_known_av,
+ PruneState *prstate,
+ Buffer *vmbuffer, uint8 *vmflags,
+ bool *set_pd_all_visible);
+
static bool identify_and_fix_vm_corruption(Relation relation,
BlockNumber heap_blk,
Buffer heap_buffer, Page heap_page,
@@ -376,6 +392,249 @@ identify_and_fix_vm_corruption(Relation relation,
return false;
}
+
+/*
+ * Determine whether to set the visibility map bits based on information from
+ * the PruneState and blk_known_av, which some callers will provide after
+ * previously examining this heap page's VM bits (e.g. vacuum from the last
+ * heap_vac_scan_next_block() call).
+ *
+ * We pass in blockno and page even though those can be derived from buffer to avoid
+ * extra BufferGetBlock() and BufferGetBlockNumber() calls.
+ *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
+ * prstate and vmbuffer are input/output fields. vmflags and
+ * set_pd_all_visible are output fields.
+ *
+ * Returns true if the caller should set one or both of the VM bits and false
+ * otherwise.
+ */
+static bool
+heap_page_will_update_vm(Relation relation,
+ Buffer buffer, BlockNumber blockno, Page page,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ bool blk_known_av,
+ PruneState *prstate,
+ Buffer *vmbuffer, uint8 *vmflags,
+ bool *set_pd_all_visible)
+{
+ bool do_set_vm = false;
+
+ /*
+ * If the caller specified not to update the VM, validate everything is in
+ * the right state and exit.
+ */
+ if (!prstate->consider_update_vm)
+ {
+ Assert(!prstate->all_visible && !prstate->all_frozen);
+ /* We don't set only the page level visibility hint */
+ Assert(!(*set_pd_all_visible));
+ Assert(*vmflags == 0);
+ return false;
+ }
+
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->consider_update_vm &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+ {
+ prstate->consider_update_vm = false;
+ prstate->all_visible = prstate->all_frozen = false;
+ }
+
+ Assert(!prstate->all_frozen || prstate->all_visible);
+
+ /*
+ * Clear any VM corruption. This does not need to be in a critical
+ * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set, we
+ * may mark the heap page buffer dirty here and could end up doing so
+ * again later. This is not a correctness issue and is in the path of VM
+ * corruption, so we don't have to worry about the extra performance
+ * overhead.
+ */
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av, prstate->lpdead_items,
+ *vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /* Determine if we actually need to set the VM and which bits to set. */
+ else if (prstate->all_visible &&
+ (!blk_known_av ||
+ (prstate->all_frozen && !VM_ALL_FROZEN(relation, blockno, vmbuffer))))
+ {
+ *vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate->all_frozen)
+ *vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
+ do_set_vm = *vmflags & VISIBILITYMAP_VALID_BITS;
+
+ /*
+ * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+ * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+ * set, we strongly prefer to keep them in sync.
+ *
+ * Prior to Postgres 19, it was possible for the page-level bit to be set
+ * and the VM bit to be clear. This could happen if we crashed after
+ * setting PD_ALL_VISIBLE but before setting bits in the VM.
+ */
+ *set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+ return do_set_vm;
+}
+
+/*
+ * Decide if we want to go ahead with freezing according to the freeze plans we
+ * prepared for the given buffer or not. If the caller specified we should not
+ * freeze tuples, it exits early.
+ *
+ * do_prune, do_hint_full_or_prunable, and did_tuple_hint_fpi must all have
+ * been decided before calling this function.
+ *
+ * prstate is an input/output parameter. all_frozen_except_lp_dead is set and
+ * used later to determine the snapshot conflict horizon for the record.
+ *
+ * Returns true if we should use our freeze plans and freeze tuples on the page
+ * and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+ bool do_prune,
+ bool do_hint_full_or_prunable,
+ bool did_tuple_hint_fpi,
+ PruneState *prstate,
+ bool *all_frozen_except_lp_dead)
+{
+ bool do_freeze = false;
+
+ /*
+ * If the caller specified we should not attempt to freeze any tuples,
+ * validate that everything is in the right state and exit.
+ */
+ if (!prstate->attempt_freeze)
+ {
+ Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+ Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+ Assert(!(*all_frozen_except_lp_dead));
+ return false;
+ }
+
+ if (prstate->pagefrz.freeze_required)
+ {
+ /*
+ * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+ * before FreezeLimit/MultiXactCutoff is present. Must freeze to
+ * advance relfrozenxid/relminmxid.
+ */
+ do_freeze = true;
+ }
+ else
+ {
+ /*
+ * Opportunistically freeze the page if we are generating an FPI
+ * anyway and if doing so means that we can set the page all-frozen
+ * afterwards (might not happen until VACUUM's final heap pass).
+ *
+ * XXX: Previously, we knew if pruning emitted an FPI by checking
+ * pgWalUsage.wal_fpi before and after pruning. Once the freeze and
+ * prune records were combined, this heuristic couldn't be used
+ * anymore. The opportunistic freeze heuristic must be improved;
+ * however, for now, try to approximate the old logic.
+ */
+ if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+ {
+ /*
+ * Freezing would make the page all-frozen. Have already emitted
+ * an FPI or will do so anyway?
+ */
+ if (RelationNeedsWAL(relation))
+ {
+ if (did_tuple_hint_fpi)
+ do_freeze = true;
+ else if (do_prune)
+ {
+ if (XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ else if (do_hint_full_or_prunable)
+ {
+ if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ }
+ }
+ }
+
+ if (do_freeze)
+ {
+ /*
+ * Validate the tuples we will be freezing before entering the
+ * critical section.
+ */
+ heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+ }
+ else if (prstate->nfrozen > 0)
+ {
+ /*
+ * The page contained some tuples that were not already frozen, and we
+ * chose not to freeze them now. The page won't be all-frozen then.
+ */
+ Assert(!prstate->pagefrz.freeze_required);
+
+ prstate->all_frozen = false;
+ prstate->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+ else
+ {
+ /*
+ * We have no freeze plans to execute. The page might already be
+ * all-frozen (perhaps only following pruning), though. Such pages
+ * can be marked all-frozen in the VM by our caller, even though none
+ * of its tuples were newly frozen here.
+ */
+ }
+
+ /*
+ * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+ * make the choice of whether or not to freeze the page unaffected by the
+ * short-term presence of LP_DEAD items. These LP_DEAD items were
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that freezing has been finalized, unset all_visible if there are
+ * any LP_DEAD items on the page. It needs to reflect the present state of
+ * the page when using it to determine whether or not to update the VM.
+ *
+ * Keep track of whether or not the page was all-frozen except LP_DEAD
+ * items for the purposes of calculating the snapshot conflict horizon,
+ * though.
+ */
+ *all_frozen_except_lp_dead = prstate->all_frozen;
+ if (prstate->lpdead_items > 0)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
+
+ return do_freeze;
+}
+
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
* specified page. If the page's visibility status has changed, update it in
@@ -766,20 +1025,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Clear the offset information once we have processed the given page. */
*off_loc = InvalidOffsetNumber;
- do_prune = prstate.nredirected > 0 ||
- prstate.ndead > 0 ||
- prstate.nunused > 0;
-
/*
* After processing all the live tuples on the page, if the newest xmin
* amongst them is not visible to everyone, the page cannot be
- * all-visible.
+ * all-visible. This must be done before we decide whether or not to
+ * opportunistically freeze below because we do not want to
+ * opportunistically freeze the page if there are live tuples not visible
+ * to everyone, which would prevent setting the page frozen in the VM.
*/
if (prstate.all_visible &&
TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
prstate.all_visible = prstate.all_frozen = false;
+ /*
+ * Now decide based on information collected while examining every tuple
+ * which actions to take. If there are any prunable tuples, we'll prune
+ * them. However, we will decide based on options specified by the caller
+ * and various heuristics whether or not to freeze any tuples and whether
+ * or not the page should be set all-visible/all-frozen in the VM.
+ */
+ do_prune = prstate.nredirected > 0 ||
+ prstate.ndead > 0 ||
+ prstate.nunused > 0;
+
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update those
@@ -790,182 +1059,32 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
PageIsFull(page);
/*
- * Decide if we want to go ahead with freezing according to the freeze
- * plans we prepared, or not.
- */
- do_freeze = false;
- if (prstate.attempt_freeze)
- {
- if (prstate.pagefrz.freeze_required)
- {
- /*
- * heap_prepare_freeze_tuple indicated that at least one XID/MXID
- * from before FreezeLimit/MultiXactCutoff is present. Must
- * freeze to advance relfrozenxid/relminmxid.
- */
- do_freeze = true;
- }
- else
- {
- /*
- * Opportunistically freeze the page if we are generating an FPI
- * anyway and if doing so means that we can set the page
- * all-frozen afterwards (might not happen until VACUUM's final
- * heap pass).
- *
- * XXX: Previously, we knew if pruning emitted an FPI by checking
- * pgWalUsage.wal_fpi before and after pruning. Once the freeze
- * and prune records were combined, this heuristic couldn't be
- * used anymore. The opportunistic freeze heuristic must be
- * improved; however, for now, try to approximate the old logic.
- */
- if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
- {
- /*
- * Freezing would make the page all-frozen. Have already
- * emitted an FPI or will do so anyway?
- */
- if (RelationNeedsWAL(relation))
- {
- if (did_tuple_hint_fpi)
- do_freeze = true;
- else if (do_prune)
- {
- if (XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- else if (do_hint_full_or_prunable)
- {
- if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- }
- }
- }
- }
-
- if (do_freeze)
- {
- /*
- * Validate the tuples we will be freezing before entering the
- * critical section.
- */
- heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
- }
- else if (prstate.nfrozen > 0)
- {
- /*
- * The page contained some tuples that were not already frozen, and we
- * chose not to freeze them now. The page won't be all-frozen then.
- */
- Assert(!prstate.pagefrz.freeze_required);
-
- prstate.all_frozen = false;
- prstate.nfrozen = 0; /* avoid miscounts in instrumentation */
- }
- else
- {
- /*
- * We have no freeze plans to execute. The page might already be
- * all-frozen (perhaps only following pruning), though. Such pages
- * can be marked all-frozen in the VM by our caller, even though none
- * of its tuples were newly frozen here.
- */
- }
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state of
- * the page when using it to determine whether or not to update the VM.
- *
- * Keep track of whether or not the page was all-frozen except LP_DEAD
- * items for the purposes of calculating the snapshot conflict horizon,
- * though.
+ * We must decide whether or not to freeze before deciding if and what to
+ * set in the VM.
*/
- all_frozen_except_lp_dead = prstate.all_frozen;
- if (prstate.lpdead_items > 0)
- {
- prstate.all_visible = false;
- prstate.all_frozen = false;
- }
-
- /*
- * If this is an on-access call and we're not actually pruning, avoid
- * setting the visibility map if it would newly dirty the heap page or, if
- * the page is already dirty, if doing so would require including a
- * full-page image (FPI) of the heap page in the WAL. This situation
- * should be rare, as on-access pruning is only attempted when
- * pd_prune_xid is valid.
- */
- if (reason == PRUNE_ON_ACCESS &&
- prstate.consider_update_vm &&
- prstate.all_visible &&
- !do_prune && !do_freeze &&
- (!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
- {
- prstate.consider_update_vm = false;
- prstate.all_visible = prstate.all_frozen = false;
- }
-
- Assert(!prstate.all_frozen || prstate.all_visible);
-
- /*
- * Handle setting visibility map bit based on information from the VM (if
- * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
- * call), and from all_visible and all_frozen variables.
- */
- if (prstate.consider_update_vm)
- {
- /*
- * Clear any VM corruption. This does not need to be in a critical
- * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
- * we may mark the heap page buffer dirty here and could end up doing
- * so again later. This is not a correctness issue and is in the path
- * of VM corruption, so we don't have to worry about the extra
- * performance overhead.
- */
- if (identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av, prstate.lpdead_items, vmbuffer))
- {
- /* If we fix corruption, don't update the VM further */
- }
-
- /* Determine if we actually need to set the VM and which bits to set. */
- else if (prstate.all_visible &&
- (!blk_known_av ||
- (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
- {
- vmflags |= VISIBILITYMAP_ALL_VISIBLE;
- if (prstate.all_frozen)
- vmflags |= VISIBILITYMAP_ALL_FROZEN;
- }
- }
-
- do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
- * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
- * set, we strongly prefer to keep them in sync.
- *
- * Prior to Postgres 19, it was possible for the page-level bit to be set
- * and the VM bit to be clear. This could happen if we crashed after
- * setting PD_ALL_VISIBLE but before setting bits in the VM.
- */
- set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+ do_freeze = heap_page_will_freeze(relation, buffer,
+ do_prune,
+ do_hint_full_or_prunable,
+ did_tuple_hint_fpi,
+ &prstate,
+ &all_frozen_except_lp_dead);
+
+ do_set_vm = heap_page_will_update_vm(relation,
+ buffer, blockno, page,
+ reason,
+ do_prune, do_freeze,
+ blk_known_av,
+ &prstate,
+ &vmbuffer,
+ &vmflags, &set_pd_all_visible);
/* Save these for the caller in case we later zero out vmflags */
presult->new_vmbits = vmflags;
- /* Any error while applying the changes is critical */
+ /*
+ * Time to actually make the changes to the page and log them. Any error
+ * while applying the changes is critical.
+ */
START_CRIT_SECTION();
if (do_hint_full_or_prunable)
--
2.43.0
On Tue, Aug 26, 2025 at 4:01 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
Few comments on 0003.
1) This patch introduces XLHP_HAS_VMFLAGS. However it lacks some
helpful comments about this new status bit.
I added the ones you suggested in my v7 posted here [1].
2) Should we move conflict_xid = visibility_cutoff_xid; assignment
just after heap_page_is_all_visible_except_lpdead call in
lazy_vacuum_heap_page?
Why would we want to do that? We only want to set it if the page is
all visible, so we would have to guard it similarly.
3) Looking at this diff, there is one thing I do not comprehend: how are we
protected from passing an all-visible page to lazy_vacuum_heap_page? I
did not manage to reproduce such behaviour, though.
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+ {
+ Assert(!PageIsAllVisible(page));
+ set_pd_all_vis = true;
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ PageSetAllVisible(page);
+ visibilitymap_set_vmbyte(vacrel->rel,
+ blkno,
So, for one, there is an assert just above this code in
lazy_vacuum_heap_page() that nunused > 0 -- so we know that the page
couldn't have been all-visible already because it had unused line
pointers.
Otherwise, if it was possible for an already all-visible page to get
here, the same thing would happen that happens on master --
heap_page_is_all_visible[_except_lpdead()] would return true and we
would try to set the VM which would end up being a no-op.
- Melanie
[1]: /messages/by-id/CAAKRu_YD0ecXeAh+DmJpzQOJwcRzmMyGdcc5W_0pEF78rYSJkQ@mail.gmail.com
On Thu, 28 Aug 2025 at 00:02, Melanie Plageman
<melanieplageman@gmail.com> wrote:
Do we need to pin vmbuffer here? Looks like
XLogReadBufferForRedoExtended already pins vmbuffer. I verified this
with CheckBufferIsPinnedOnce(vmbuffer) just before visibilitymap_pin
and COPY ... WITH (FREEZE true) test.
I thought the reason visibilitymap_set() did it was that it was
possible for the block of the VM corresponding to the block of the
heap to be different during recovery than it was when emitting the
record, and thus we needed the part of visibilitymap_pin() that
released the old vmbuffer and got the new one corresponding to the
heap block.
I can't quite think of how this could happen though.
Assuming it can't happen, then we can get rid of visibilitymap_pin()
(and add visibilitymap_pin_ok()) in both visibilitymap_set_vmbyte() and
visibilitymap_set(). I've done this to visibilitymap_set() in a
separate patch 0001. I would like other opinions/confirmation that the
block of the VM corresponding to the heap block cannot differ during
recovery from that what it was when the record was emitted during
normal operation, though.
I did micro git-blame research here. I spotted only one related change
[0]. Looks like before this change pin was indeed needed.
But not after this change, so this visibilitymap_pin is just an oversight?
Related thread is [1]. I quickly checked the discussion in this
thread, and it looks like no one was bothered about these lines or VM
logging changes (in this exact pin buffer aspect). The discussion was
of other aspects of this commit.
[0]: https://github.com/postgres/postgres/commit/2c03216d8311
[1]: /messages/by-id/533D6CBF.6080203@vmware.com
--
Best regards,
Kirill Reshke
On Thu, Aug 28, 2025 at 5:12 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
I did micro git-blame research here. I spotted only one related change
[0]. Looks like before this change pin was indeed needed.
But not after this change, so this visibilitymap_pin is just an oversight?
Related thread is [1]. I quickly checked the discussion in this
thread, and it looks like no one was bothered about these lines or VM
logging changes (in this exact pin buffer aspect). The discussion was
of other aspects of this commit.
Wow, thanks so much for doing that research. Looking at it myself, it
does indeed seem like just an oversight. It isn't harmful since it
won't take another pin, but it is confusing, so I think we should at
least remove it in master. I'm not as sure about back branches.
I would like someone to confirm that there is no way we could end up
with a different block of the VM containing the vm bits for a heap
block during recovery than during normal operation.
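For reference, the heap-block-to-VM-block mapping in visibilitymap.c is pure
arithmetic over compile-time constants, so for a given heap block number the
VM block works out the same during recovery as during normal operation. Below
is a minimal standalone sketch of that mapping (not part of the patch set;
the 8kB block size and 24-byte page header are assumed defaults standing in
for the values the real macros derive from BLCKSZ and SizeOfPageHeaderData):

#include <stdint.h>
#include <stdio.h>

/*
 * Rough approximation of the visibilitymap.c mapping: 2 bits per heap
 * block (all-visible and all-frozen) packed into the usable bytes of a
 * VM page.  The constants below are assumed defaults; the real code
 * derives them at compile time.
 */
#define MAPSIZE              (8192 - 24)    /* usable bytes per VM page */
#define BITS_PER_HEAPBLOCK   2
#define HEAPBLOCKS_PER_BYTE  (8 / BITS_PER_HEAPBLOCK)
#define HEAPBLOCKS_PER_PAGE  (MAPSIZE * HEAPBLOCKS_PER_BYTE)
#define HEAPBLK_TO_MAPBLOCK(x) ((x) / HEAPBLOCKS_PER_PAGE)

int
main(void)
{
	uint32_t	heapBlk = 123456;

	/* The result depends only on heapBlk and compile-time constants. */
	printf("heap block %u -> VM block %u\n",
		   heapBlk, (uint32_t) HEAPBLK_TO_MAPBLOCK(heapBlk));
	return 0;
}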
- Melanie
On Tue, Sep 2, 2025 at 5:52 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
On Thu, Aug 28, 2025 at 5:12 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
I did micro git-blame research here. I spotted only one related change
[0]. Looks like before this change pin was indeed needed.
But not after this change, so this visibilitymap_pin is just an oversight?
Related thread is [1]. I quickly checked the discussion in this
thread, and it looks like no one was bothered about these lines or VM
logging changes (in this exact pin buffer aspect). The discussion was
of other aspects of this commit.
Wow, thanks so much for doing that research. Looking at it myself, it
does indeed seem like just an oversight. It isn't harmful since it
won't take another pin, but it is confusing, so I think we should at
least remove it in master. I'm not as sure about back branches.
I've updated the commit message in the patch set to reflect the
research you did in attached v8.
- Melanie
Attachments:
v8-0002-Add-assert-and-log-message-to-visibilitymap_set.patchtext/x-patch; charset=US-ASCII; name=v8-0002-Add-assert-and-log-message-to-visibilitymap_set.patchDownload
From 7c5cb3edf89735eaa8bee9ca46111bd6c554720b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 27 Aug 2025 10:07:29 -0400
Subject: [PATCH v8 02/22] Add assert and log message to visibilitymap_set
Add an assert to visibilitymap_set() that the provided heap buffer is
exclusively locked, which is expected.
Also, enhance the debug logging message to specify which VM flags were
set.
Based on a related suggestion by Kirill Reshke on an in-progress
patchset.
Discussion: https://postgr.es/m/CALdSSPhAU56g1gGVT0%2BwG8RrSWE6qW8TOfNJS1HNAWX6wPgbFA%40mail.gmail.com
---
src/backend/access/heap/visibilitymap.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 953ad4a4843..7306c16f05c 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -255,7 +255,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
uint8 status;
#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
+ elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
+ flags, RelationGetRelationName(rel), heapBlk);
#endif
Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
@@ -269,6 +270,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
+ Assert(!BufferIsValid(heapBuf) || BufferIsExclusiveLocked(heapBuf));
+
/* Check that we have the right VM page pinned */
if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
--
2.43.0
v8-0001-Remove-unneeded-VM-pin-from-VM-replay.patchtext/x-patch; charset=US-ASCII; name=v8-0001-Remove-unneeded-VM-pin-from-VM-replay.patchDownload
From dd98177294011ee93cac122405516abd89f4e393 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 27 Aug 2025 08:50:15 -0400
Subject: [PATCH v8 01/22] Remove unneeded VM pin from VM replay
Previously, heap_xlog_visible() called visibilitymap_pin() even after
getting a buffer from XLogReadBufferForRedoExtended() -- which returns a
pinned buffer containing the specified block of the visibility map.
This would just have resulted in visibilitymap_pin() returning early
since the specified page was already present and pinned, but it was
confusing extraneous code, so remove it.
It appears to be an oversight in 2c03216.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reported-by: Melanie Plageman <melanieplageman@gmail.com>
Reported-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/CALdSSPhu7WZd%2BEfQDha1nz%3DDC93OtY1%3DUFEdWwSZsASka_2eRQ%40mail.gmail.com
---
src/backend/access/heap/heapam_xlog.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 5d48f071f53..69e2003a76f 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -295,8 +295,8 @@ heap_xlog_visible(XLogReaderState *record)
LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
reln = CreateFakeRelcacheEntry(rlocator);
- visibilitymap_pin(reln, blkno, &vmbuffer);
+ Assert(visibilitymap_pin_ok(blkno, vmbuffer));
visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
xlrec->snapshotConflictHorizon, vmbits);
--
2.43.0
v8-0005-Eliminate-xl_heap_visible-from-vacuum-phase-III.patchtext/x-patch; charset=US-ASCII; name=v8-0005-Eliminate-xl_heap_visible-from-vacuum-phase-III.patchDownload
From dc318358572f61efbd0e05aae2b9a077b422bcf5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:42:13 -0400
Subject: [PATCH v8 05/22] Eliminate xl_heap_visible from vacuum phase III
Instead of emitting a separate xl_heap_visible record for each page that
is rendered all-visible by vacuum's third phase, include the updates to
the VM in the already emitted xl_heap_prune record.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam_xlog.c | 143 +++++++++++++++++++++---
src/backend/access/heap/pruneheap.c | 48 +++++++-
src/backend/access/heap/vacuumlazy.c | 149 +++++++++++++++++--------
src/backend/access/rmgrdesc/heapdesc.c | 13 ++-
src/include/access/heapam.h | 9 ++
src/include/access/heapam_xlog.h | 7 +-
6 files changed, 300 insertions(+), 69 deletions(-)
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 0c902c87682..e68e61feade 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Buffer buffer;
RelFileLocator rlocator;
BlockNumber blkno;
- XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
+ uint8 vmflags = 0;
XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -51,10 +52,15 @@ heap_xlog_prune_freeze(XLogReaderState *record)
(xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
/*
- * We are about to remove and/or freeze tuples. In Hot Standby mode,
- * ensure that there are no queries running for which the removed tuples
- * are still visible or which still consider the frozen xids as running.
- * The conflict horizon XID comes after xl_heap_prune.
+ * After xl_heap_prune is the optional snapshot conflict horizon.
+ *
+ * In Hot Standby mode, we must ensure that there are no running queries
+ * which would conflict with the changes in this record. If pruning, that
+ * means we cannot remove tuples still visible to transactions on the
+ * standby. If freezing, that means we cannot freeze tuples with xids that
+ * are still considered running on the standby. And for setting the VM, we
+ * cannot do so if the page isn't all-visible to all transactions on the
+ * standby.
*/
if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
{
@@ -70,13 +76,29 @@ heap_xlog_prune_freeze(XLogReaderState *record)
rlocator);
}
+ /* Next are the optionally included vmflags. Copy them out for later use. */
+ if ((xlrec.flags & XLHP_HAS_VMFLAGS) != 0)
+ {
+ /* memcpy because vmflags is stored unaligned */
+ memcpy(&vmflags, maindataptr, sizeof(uint8));
+ maindataptr += sizeof(uint8);
+
+ /*
+ * We don't set VISIBILITYMAP_XLOG_CATALOG_REL in the combined record
+ * because we already have XLHP_IS_CATALOG_REL.
+ */
+ Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+ /* Must never set all_frozen bit without also setting all_visible bit */
+ Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
+ }
+
/*
- * If we have a full-page image, restore it and we're done.
+ * If we have a full-page image of the heap block, restore it and we're
+ * done with the heap block.
*/
- action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
- (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
- &buffer);
- if (action == BLK_NEEDS_REDO)
+ if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+ (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+ &buffer) == BLK_NEEDS_REDO)
{
Page page = BufferGetPage(buffer);
OffsetNumber *redirected;
@@ -89,6 +111,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Size datalen;
xlhp_freeze_plan *plans;
OffsetNumber *frz_offsets;
+ bool do_prune;
+ bool mark_buffer_dirty;
+ bool set_heap_lsn;
char *dataptr = XLogRecGetBlockData(record, 0, &datalen);
heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
@@ -97,11 +122,18 @@ heap_xlog_prune_freeze(XLogReaderState *record)
&ndead, &nowdead,
&nunused, &nowunused);
+ do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+ set_heap_lsn = mark_buffer_dirty = do_prune || nplans > 0;
+
+ /* Ensure the record does something */
+ Assert(do_prune || nplans > 0 ||
+ vmflags & VISIBILITYMAP_VALID_BITS);
+
/*
* Update all line pointers per the record, and repair fragmentation
* if needed.
*/
- if (nredirected > 0 || ndead > 0 || nunused > 0)
+ if (do_prune)
heap_page_prune_execute(buffer,
(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
redirected, nredirected,
@@ -138,26 +170,72 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* There should be no more data */
Assert((char *) frz_offsets == dataptr + datalen);
+ Assert(BufferIsValid(buffer) &&
+ BufferGetBlockNumber(buffer) == blkno);
+
+ /*
+ * Now set PD_ALL_VISIBLE, if required. We'll only do this if we are
+ * also going to set bits in the VM later.
+ *
+ * We must never end up with the VM bit set and the page-level
+ * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+ * modification would fail to clear the VM bit.
+ */
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+ {
+ PageSetAllVisible(page);
+
+ /*
+ * If the only change to the heap page is setting PD_ALL_VISIBLE,
+ * we can avoid setting the page LSN unless checksums or
+ * wal_log_hints are enabled.
+ */
+ set_heap_lsn = XLogHintBitIsNeeded() ? true : set_heap_lsn;
+ mark_buffer_dirty = true;
+ }
+
/*
* Note: we don't worry about updating the page's prunability hints.
* At worst this will cause an extra prune cycle to occur soon.
*/
- PageSetLSN(page, lsn);
- MarkBufferDirty(buffer);
+ if (mark_buffer_dirty)
+ MarkBufferDirty(buffer);
+ if (set_heap_lsn)
+ PageSetLSN(page, lsn);
}
/*
- * If we released any space or line pointers, update the free space map.
+ * If we released any space or line pointers or will be setting a page in
+ * the visibility map, update the free space map.
+ *
+ * Even if we are just updating the VM (and thus not freeing up any
+ * space), we'll still update the FSM for this page. Since FSM is not
+ * WAL-logged and only updated heuristically, it easily becomes stale in
+ * standbys. If the standby is later promoted and runs VACUUM, it will
+ * skip updating individual free space figures for pages that became
+ * all-visible (or all-frozen, depending on the vacuum mode), which is
+ * troublesome when FreeSpaceMapVacuum propagates too optimistic free
+ * space values to upper FSM layers; later inserters try to use such pages
+ * only to find out that they are unusable. This can cause long stalls
+ * when there are many such pages.
+ *
+ * Forestall those problems by updating FSM's idea about a page that is
+ * becoming all-visible or all-frozen.
*
* Do this regardless of a full-page image being applied, since the FSM
* data is not in the page anyway.
+ *
+ * We want to avoid holding an exclusive lock on the heap buffer while
+ * doing IO (either of the FSM or the VM), so we'll release the lock on
+ * the heap buffer before doing either.
*/
if (BufferIsValid(buffer))
{
- if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
- XLHP_HAS_DEAD_ITEMS |
- XLHP_HAS_NOW_UNUSED_ITEMS))
+ if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
+ XLHP_HAS_DEAD_ITEMS |
+ XLHP_HAS_NOW_UNUSED_ITEMS)) ||
+ vmflags & VISIBILITYMAP_VALID_BITS)
{
Size freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
@@ -168,6 +246,37 @@ heap_xlog_prune_freeze(XLogReaderState *record)
else
UnlockReleaseBuffer(buffer);
}
+
+ /*
+ * Read and update the VM block. Even if we skipped updating the heap page
+ * due to the file being dropped or truncated later in recovery, it's
+ * still safe to update the visibility map. Any WAL record that clears
+ * the visibility map bit does so before checking the page LSN, so any
+ * bits that need to be cleared will still be cleared.
+ *
+ * Note that it is *only* okay that we do not hold a lock on the heap page
+ * because we are in recovery and can expect no other writers to clear
+ * PD_ALL_VISIBLE before we are able to update the VM.
+ */
+ if (vmflags & VISIBILITYMAP_VALID_BITS &&
+ XLogReadBufferForRedoExtended(record, 1,
+ RBM_ZERO_ON_ERROR,
+ false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ uint8 old_vmbits = 0;
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
+
+ visibilitymap_pin(reln, blkno, &vmbuffer);
+ old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+ /* Only set VM page LSN if we modified the page */
+ if (old_vmbits != vmflags)
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
+ FreeFakeRelcacheEntry(reln);
+ }
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7ebd22f00a3..f1a8f938e9e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,6 +21,7 @@
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
+#include "access/visibilitymapdefs.h"
#include "commands/vacuum.h"
#include "executor/instrument.h"
#include "miscadmin.h"
@@ -835,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
+ InvalidBuffer, 0, false,
conflict_xid,
true, reason,
prstate.frozen, prstate.nfrozen,
@@ -2045,12 +2047,23 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* replaying 'unused' items depends on whether they were all previously marked
* as dead.
*
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
* Note: This function scribbles on the 'frozen' array.
*
* Note: This is called in a critical section, so careful what you do here.
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer,
+ uint8 vmflags,
+ bool set_pd_all_vis,
TransactionId conflict_xid,
bool cleanup_lock,
PruneReason reason,
@@ -2062,6 +2075,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xl_heap_prune xlrec;
XLogRecPtr recptr;
uint8 info;
+ uint8 regbuf_flags;
/* The following local variables hold data registered in the WAL record: */
xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2070,8 +2084,19 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xlhp_prune_items dead_items;
xlhp_prune_items unused_items;
OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+ bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
xlrec.flags = 0;
+ regbuf_flags = REGBUF_STANDARD;
+
+ /*
+ * We can avoid an FPI if the only modification we are making to the heap
+ * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+ */
+ if (!do_prune &&
+ nfrozen == 0 &&
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
+ regbuf_flags |= REGBUF_NO_IMAGE;
/*
* Prepare data for the buffer. The arrays are not actually in the
@@ -2079,7 +2104,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* page image, the arrays can be omitted.
*/
XLogBeginInsert();
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+ XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ XLogRegisterBuffer(1, vmbuffer, 0);
+
if (nfrozen > 0)
{
int nplans;
@@ -2136,6 +2165,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* Prepare the main xl_heap_prune record. We already set the XLHP_HAS_*
* flag above.
*/
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ xlrec.flags |= XLHP_HAS_VMFLAGS;
if (RelationIsAccessibleInLogicalDecoding(relation))
xlrec.flags |= XLHP_IS_CATALOG_REL;
if (TransactionIdIsValid(conflict_xid))
@@ -2150,6 +2181,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
XLogRegisterData(&xlrec, SizeOfHeapPrune);
if (TransactionIdIsValid(conflict_xid))
XLogRegisterData(&conflict_xid, sizeof(TransactionId));
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ XLogRegisterData(&vmflags, sizeof(uint8));
switch (reason)
{
@@ -2168,5 +2201,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
}
recptr = XLogInsert(RM_HEAP2_ID, info);
- PageSetLSN(BufferGetPage(buffer), recptr);
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
+
+ /*
+ * If we pruned or froze tuples, or set the page all-visible while
+ * checksums or wal_log_hints are enabled, we must bump the LSN. Torn
+ * pages are possible if we update PD_ALL_VISIBLE without bumping the LSN,
+ * but this is deemed okay for page hint updates.
+ */
+ if (do_prune || nfrozen > 0 ||
+ (set_pd_all_vis && XLogHintBitIsNeeded()))
+ PageSetLSN(BufferGetPage(buffer), recptr);
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f4e29aecf46..1d3feab4ded 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,11 +463,13 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
+static bool heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int allowed_num_offsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2846,8 +2848,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
OffsetNumber unused[MaxHeapTuplesPerPage];
int nunused = 0;
TransactionId visibility_cutoff_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
bool all_frozen;
LVSavedErrInfo saved_err_info;
+ uint8 vmflags = 0;
+ bool set_pd_all_vis = false;
Assert(vacrel->do_index_vacuuming);
@@ -2858,6 +2863,20 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
InvalidOffsetNumber);
+ if (heap_page_is_all_visible_except_lpdead(vacrel->rel, buffer,
+ vacrel->cutoffs.OldestXmin,
+ deadoffsets, num_offsets,
+ &all_frozen, &visibility_cutoff_xid,
+ &vacrel->offnum))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (all_frozen)
+ {
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ }
+ }
+
START_CRIT_SECTION();
for (int i = 0; i < num_offsets; i++)
@@ -2877,6 +2896,18 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
/* Attempt to truncate line pointer array now */
PageTruncateLinePointerArray(page);
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+ {
+ Assert(!PageIsAllVisible(page));
+ set_pd_all_vis = true;
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ PageSetAllVisible(page);
+ visibilitymap_set_vmbyte(vacrel->rel,
+ blkno,
+ vmbuffer, vmflags);
+ conflict_xid = visibility_cutoff_xid;
+ }
+
/*
* Mark buffer dirty before we write WAL.
*/
@@ -2886,7 +2917,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
- InvalidTransactionId,
+ vmbuffer,
+ vmflags,
+ set_pd_all_vis,
+ conflict_xid,
false, /* no cleanup lock required */
PRUNE_VACUUM_CLEANUP,
NULL, 0, /* frozen */
@@ -2895,39 +2929,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
unused, nunused);
}
- /*
- * End critical section, so we safely can do visibility tests (which
- * possibly need to perform IO and allocate memory!). If we crash now the
- * page (including the corresponding vm bit) might not be marked all
- * visible, but that's fine. A later vacuum will fix that.
- */
END_CRIT_SECTION();
- /*
- * Now that we have removed the LP_DEAD items from the page, once again
- * check if the page has become all-visible. The page is already marked
- * dirty, exclusively locked, and, if needed, a full page image has been
- * emitted.
- */
- Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
- &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
+ if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (all_frozen)
- {
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buffer,
- InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
- flags);
-
/* Count the newly set VM page for logging */
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
vacrel->vm_new_visible_pages++;
if (all_frozen)
vacrel->vm_new_visible_frozen_pages++;
@@ -3593,6 +3600,25 @@ dead_items_cleanup(LVRelState *vacrel)
vacrel->pvs = NULL;
}
+/*
+ * Wrapper for heap_page_is_all_visible_except_lpdead() which can be used for
+ * callers that expect no LP_DEAD items on the page.
+ */
+bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
+{
+
+ return heap_page_is_all_visible_except_lpdead(rel, buf, OldestXmin,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid,
+ logging_offnum);
+}
+
/*
* Check if every tuple in the given page is visible to all current and future
* transactions.
@@ -3606,23 +3632,35 @@ dead_items_cleanup(LVRelState *vacrel)
* visible tuples. Sets *all_frozen to true if every tuple on this page is
* frozen.
*
- * This is a stripped down version of lazy_scan_prune(). If you change
- * anything here, make sure that everything stays in sync. Note that an
- * assertion calls us to verify that everybody still agrees. Be sure to avoid
- * introducing new side-effects here.
+ * deadoffsets are the offsets we know about and are about to set LP_UNUSED.
+ * allowed_num_offsets is the number of those. As long as the LP_DEAD items we
+ * encounter on the page match those exactly, we can set the page all-visible
+ * in the VM.
+ *
+ * Callers looking to verify that the page is all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This logic is similar to that in heap_prune_record_unchanged_lp_normal(). If
+ * you change anything here, make sure that everything stays in sync. Note
+ * that an assertion calls us to verify that everybody still agrees. Be sure
+ * to avoid introducing new side-effects here.
*/
static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum)
+heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int allowed_num_offsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
OffsetNumber offnum,
maxoff;
bool all_visible = true;
+ OffsetNumber current_dead_offsets[MaxHeapTuplesPerPage];
+ size_t current_num_offsets = 0;
*visibility_cutoff_xid = InvalidTransactionId;
*all_frozen = true;
@@ -3654,9 +3692,8 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
if (ItemIdIsDead(itemid))
{
- all_visible = false;
- *all_frozen = false;
- break;
+ current_dead_offsets[current_num_offsets++] = offnum;
+ continue;
}
Assert(ItemIdIsNormal(itemid));
@@ -3723,7 +3760,23 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
/* Clear the offset information once we have processed the given page. */
*logging_offnum = InvalidOffsetNumber;
- return all_visible;
+ /* If we already know it's not all-visible, return false */
+ if (!all_visible)
+ return false;
+
+ /* If we weren't allowed any dead offsets, we're done */
+ if (allowed_num_offsets == 0)
+ return current_num_offsets == 0;
+
+ /* If the number of dead offsets doesn't match, the page isn't all-visible */
+ if (current_num_offsets != allowed_num_offsets)
+ return false;
+
+ Assert(deadoffsets);
+
+ /* The dead offsets found must exactly match the expected ones */
+ return memcmp(current_dead_offsets, deadoffsets,
+ allowed_num_offsets * sizeof(OffsetNumber)) == 0;
}
/*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..d6c86ccac20 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -266,6 +266,7 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
{
char *rec = XLogRecGetData(record);
uint8 info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+ char *maindataptr = rec + SizeOfHeapPrune;
info &= XLOG_HEAP_OPMASK;
if (info == XLOG_HEAP2_PRUNE_ON_ACCESS ||
@@ -278,7 +279,8 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
{
TransactionId conflict_xid;
- memcpy(&conflict_xid, rec + SizeOfHeapPrune, sizeof(TransactionId));
+ memcpy(&conflict_xid, maindataptr, sizeof(TransactionId));
+ maindataptr += sizeof(TransactionId);
appendStringInfo(buf, "snapshotConflictHorizon: %u",
conflict_xid);
@@ -287,6 +289,15 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, ", isCatalogRel: %c",
xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
+ if (xlrec->flags & XLHP_HAS_VMFLAGS)
+ {
+ uint8 vmflags;
+
+ memcpy(&vmflags, maindataptr, sizeof(uint8));
+ maindataptr += sizeof(uint8);
+ appendStringInfo(buf, ", vm_flags: 0x%02X", vmflags);
+ }
+
if (XLogRecHasBlockData(record, 0))
{
Size datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..8b47295efa2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -344,6 +344,12 @@ extern void heap_inplace_update_and_unlock(Relation relation,
Buffer buffer);
extern void heap_inplace_unlock(Relation relation,
HeapTuple oldtup, Buffer buffer);
+
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
@@ -388,6 +394,9 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer,
+ uint8 vmflags,
+ bool vm_modified_heap_page,
TransactionId conflict_xid,
bool cleanup_lock,
PruneReason reason,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 277df6b3cf0..d6a479f6984 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -289,12 +289,17 @@ typedef struct xl_heap_prune
/*
* If XLHP_HAS_CONFLICT_HORIZON is set, the conflict horizon XID follows,
- * unaligned
+ * unaligned.
+ *
+ * Then, if XLHP_HAS_VMFLAGS is set, the VM flags follow, unaligned.
*/
} xl_heap_prune;
#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+/* If the record should update the VM, it includes the new VM flags */
+#define XLHP_HAS_VMFLAGS (1 << 0)
+
/* to handle recovery conflict during logical decoding on standby */
#define XLHP_IS_CATALOG_REL (1 << 1)
--
2.43.0
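To summarize the WAL-format change in the patch above: the main data of the
combined xl_heap_prune record is now the flags byte, then an optional
unaligned conflict-horizon XID, then an optional unaligned byte of VM flags.
Roughly, a reader such as heap2_desc() walks it like this (a sketch only --
the flag values and the parse_prune_main_data() helper are illustrative
stand-ins, not the real definitions):

#include <stdint.h>
#include <string.h>

typedef uint32_t TransactionId;

/* Illustrative stand-ins; the real bits are defined in heapam_xlog.h. */
#define XLHP_HAS_VMFLAGS          (1 << 0)
#define XLHP_HAS_CONFLICT_HORIZON (1 << 3)

/*
 * Walk the unaligned main data that follows the fixed xl_heap_prune header:
 * an optional conflict-horizon XID, then an optional byte of VM flags.
 * Returns a pointer just past the data that was consumed.
 */
const char *
parse_prune_main_data(const char *p, uint8_t flags,
                      TransactionId *conflict_xid, uint8_t *vmflags)
{
    if (flags & XLHP_HAS_CONFLICT_HORIZON)
    {
        memcpy(conflict_xid, p, sizeof(TransactionId)); /* unaligned, so memcpy */
        p += sizeof(TransactionId);
    }
    if (flags & XLHP_HAS_VMFLAGS)
    {
        memcpy(vmflags, p, sizeof(uint8_t));            /* unaligned, so memcpy */
        p += sizeof(uint8_t);
    }
    return p;
}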
v8-0004-Make-heap_page_is_all_visible-independent-of-LVRe.patch (text/x-patch)
From 0a31bc0bc1012de3ba3ce1194d5ce578f375025c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 15:48:51 -0400
Subject: [PATCH v8 04/22] Make heap_page_is_all_visible independent of
LVRelState
Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need two parameters from the
LVRelState, so just pass those in explicitly.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 45 ++++++++++++++++++----------
1 file changed, 29 insertions(+), 16 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 932701d8420..f4e29aecf46 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,8 +463,11 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
- TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2009,8 +2012,9 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(presult.lpdead_items == 0);
- if (!heap_page_is_all_visible(vacrel, buf,
- &debug_cutoff, &debug_all_frozen))
+ if (!heap_page_is_all_visible(vacrel->rel, buf,
+ vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+ &debug_cutoff, &vacrel->offnum))
Assert(false);
Assert(presult.all_frozen == debug_all_frozen);
@@ -2906,8 +2910,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* emitted.
*/
Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
- &all_frozen))
+ if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
+ &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -3591,9 +3595,16 @@ dead_items_cleanup(LVRelState *vacrel)
/*
* Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples. Set *all_frozen to true if every tuple
- * on this page is frozen.
+ * transactions.
+ *
+ * OldestXmin is used to determine visibility.
+ *
+ * *logging_offnum is set to the OffsetNumber of the tuple currently being
+ * processed, for use by vacuum's error callback system.
+ *
+ * Return the visibility_cutoff_xid which is the highest xmin amongst the
+ * visible tuples. Sets *all_frozen to true if every tuple on this page is
+ * frozen.
*
* This is a stripped down version of lazy_scan_prune(). If you change
* anything here, make sure that everything stays in sync. Note that an
@@ -3601,9 +3612,11 @@ dead_items_cleanup(LVRelState *vacrel)
* introducing new side-effects here.
*/
static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
TransactionId *visibility_cutoff_xid,
- bool *all_frozen)
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3626,7 +3639,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* Set the offset number so that we can display it along with any
* error that occurred while processing this tuple.
*/
- vacrel->offnum = offnum;
+ *logging_offnum = offnum;
itemid = PageGetItemId(page, offnum);
/* Unused or redirect line pointers are of no interest */
@@ -3650,9 +3663,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
tuple.t_len = ItemIdGetLength(itemid);
- tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+ tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
+ switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
buf))
{
case HEAPTUPLE_LIVE:
@@ -3673,7 +3686,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
+ OldestXmin))
{
all_visible = false;
*all_frozen = false;
@@ -3708,7 +3721,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
} /* scan along page */
/* Clear the offset information once we have processed the given page. */
- vacrel->offnum = InvalidOffsetNumber;
+ *logging_offnum = InvalidOffsetNumber;
return all_visible;
}
--
2.43.0
v8-0008-Combine-vacuum-phase-I-VM-update-cases.patch (text/x-patch)
From e5d0f1c76b805de9de81d31e29c706fd5c8905e9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:35:36 -0400
Subject: [PATCH v8 08/22] Combine vacuum phase I VM update cases
We update the VM after phase I of vacuum -- either setting both the VM
bits when all bits are currently unset or setting just the frozen bit
when the all-visible bit is already set.
Those two cases shared much of the same code -- leading to unnecessary
duplication. This commit combines them, which is simpler and easier to
understand.
The combined case also happens to fix a longstanding bug: if we were only
setting an already all-visible page all-frozen while checksums/wal_log_hints
are enabled, we would fail to mark the buffer dirty before setting the
page LSN in visibilitymap_set().
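To make the fix concrete, the rule the combined case now follows for dirtying
the heap page before visibilitymap_set() can WAL-log it boils down to the
following (a minimal sketch with stand-in booleans, not the real buffer/page
API):

#include <stdbool.h>

/*
 * Sketch: the heap page must be marked dirty (and PD_ALL_VISIBLE set) when
 * the page-level bit still needs to be set, or when checksums/wal_log_hints
 * are enabled, because visibilitymap_set() may then log the heap page.
 */
bool
must_dirty_heap_page(bool pd_all_visible_already_set,
                     bool checksums_or_wal_log_hints)
{
    /* The previously buggy case is (true, true): the page is already
     * all-visible but may be logged, so it must still be dirtied first. */
    return !pd_all_visible_already_set || checksums_or_wal_log_hints;
}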
---
src/backend/access/heap/vacuumlazy.c | 101 +++++++++------------------
1 file changed, 32 insertions(+), 69 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 406c30e6ecd..4d47a6b394a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2151,11 +2151,26 @@ lazy_scan_prune(LVRelState *vacrel,
{
/* Don't update the VM if we just cleared corruption in it */
}
- else if (!all_visible_according_to_vm && presult.all_visible)
+
+ /*
+ * If the page isn't yet marked all-visible in the VM, or it is and needs
+ * to be marked all-frozen, update the VM. Note that all_frozen is only
+ * valid if all_visible is true, so we must check both all_visible and
+ * all_frozen.
+ */
+ else if (presult.all_visible &&
+ (!all_visible_according_to_vm ||
+ (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
+ /*
+ * If the page is all-frozen, we can pass InvalidTransactionId as our
+ * cutoff_xid, since a snapshotConflictHorizon sufficient to make
+ * everything safe for REDO was logged when the page's tuples were
+ * frozen.
+ */
if (presult.all_frozen)
{
Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
@@ -2168,21 +2183,29 @@ lazy_scan_prune(LVRelState *vacrel,
* checksums are not enabled). Regardless, set both bits so that we
* get back in sync.
*
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
+ * If the heap page is all-visible but the VM bit is not set, we don't
+ * need to dirty the heap page. However, if checksums are enabled, we
+ * do need to make sure that the heap page is dirtied before passing
+ * it to visibilitymap_set(), because it may be logged.
*/
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
+ if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+ {
+ PageSetAllVisible(page);
+ MarkBufferDirty(buf);
+ }
+
old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
InvalidXLogRecPtr,
vmbuffer, presult.vm_conflict_horizon,
flags);
/*
+ * Even if we are only setting the all-frozen bit, there is a small
+ * chance that the VM was modified sometime between setting
+ * all_visible_according_to_vm and checking the visibility during
+ * pruning. Check the return value of old_vmbits to ensure the
+ * visibility map counters used for logging are accurate.
+ *
* If the page wasn't already set all-visible and/or all-frozen in the
* VM, count it as newly set for logging.
*/
@@ -2203,66 +2226,6 @@ lazy_scan_prune(LVRelState *vacrel,
}
}
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen. Note that all_frozen is only valid if all_visible is
- * true, so we must check both all_visible and all_frozen.
- */
- else if (all_visible_according_to_vm && presult.all_visible &&
- presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
-
return presult.ndeleted;
}
--
2.43.0
v8-0006-Use-xl_heap_prune-record-for-setting-empty-pages-.patch (text/x-patch)
From 0d8d3a6b124f244933dbdc50fca90340715bffd5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:42:19 -0400
Subject: [PATCH v8 06/22] Use xl_heap_prune record for setting empty pages
all-visible
As part of a project to eliminate xl_heap_visible records, eliminate
their usage in phase I vacuum of empty pages.
---
src/backend/access/heap/pruneheap.c | 14 +++++--
src/backend/access/heap/vacuumlazy.c | 55 ++++++++++++++++++----------
src/include/access/heapam.h | 1 +
3 files changed, 47 insertions(+), 23 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f1a8f938e9e..956caeb69dc 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -836,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
+ false,
InvalidBuffer, 0, false,
conflict_xid,
true, reason,
@@ -2051,6 +2052,9 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* case, vmbuffer should already have been updated and marked dirty and should
* still be pinned and locked.
*
+ * force_heap_fpi indicates that a full page image of the heap block should be
+ * forced.
+ *
* set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
* the page LSN when checksums/wal_log_hints are enabled even if we did not
* prune or freeze tuples on the page.
@@ -2061,6 +2065,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ bool force_heap_fpi,
Buffer vmbuffer,
uint8 vmflags,
bool set_pd_all_vis,
@@ -2089,13 +2094,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xlrec.flags = 0;
regbuf_flags = REGBUF_STANDARD;
+ if (force_heap_fpi)
+ regbuf_flags |= REGBUF_FORCE_IMAGE;
+
/*
* We can avoid an FPI if the only modification we are making to the heap
* page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
*/
- if (!do_prune &&
- nfrozen == 0 &&
- (!set_pd_all_vis || !XLogHintBitIsNeeded()))
+ else if (!do_prune &&
+ nfrozen == 0 &&
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
regbuf_flags |= REGBUF_NO_IMAGE;
/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1d3feab4ded..49c46d35486 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1877,33 +1877,47 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ uint8 new_vmbits = VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN;
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ PageSetAllVisible(page);
MarkBufferDirty(buf);
- /*
- * It's possible that another backend has extended the heap,
- * initialized the page, and then failed to WAL-log the page due
- * to an ERROR. Since heap extension is not WAL-logged, recovery
- * might try to replay our record setting the page all-visible and
- * find that the page isn't initialized, which will cause a PANIC.
- * To prevent that, check whether the page has been previously
- * WAL-logged, and if not, do that now.
- */
- if (RelationNeedsWAL(vacrel->rel) &&
- PageGetLSN(page) == InvalidXLogRecPtr)
- log_newpage_buffer(buf, true);
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ visibilitymap_set_vmbyte(vacrel->rel, blkno,
+ vmbuffer, new_vmbits);
+
+ if (RelationNeedsWAL(vacrel->rel))
+ {
+ /*
+ * It's possible that another backend has extended the heap,
+ * initialized the page, and then failed to WAL-log the page
+ * due to an ERROR. Since heap extension is not WAL-logged,
+ * recovery might try to replay our record setting the page
+ * all-visible and find that the page isn't initialized, which
+ * will cause a PANIC. To prevent that, if the page hasn't
+ * been previously WAL-logged, force a heap FPI.
+ */
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ PageGetLSN(page) == InvalidXLogRecPtr,
+ vmbuffer,
+ new_vmbits,
+ true,
+ InvalidTransactionId,
+ false, PRUNE_VACUUM_SCAN,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+ }
- PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
END_CRIT_SECTION();
- /* Count the newly all-frozen pages for logging */
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+ /* Count the newly all-frozen pages for logging. */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
}
@@ -2917,6 +2931,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
+ false,
vmbuffer,
vmflags,
set_pd_all_vis,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8b47295efa2..e7129a644a1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,6 +394,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ bool force_heap_fpi,
Buffer vmbuffer,
uint8 vmflags,
bool vm_modified_heap_page,
--
2.43.0
v8-0009-Find-and-fix-VM-corruption-in-heap_page_prune_and.patch (text/x-patch)
From 76ef56d01483308c635915f8b43e67741876225c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:45:59 -0400
Subject: [PATCH v8 09/22] Find and fix VM corruption in
heap_page_prune_and_freeze
Future commits will update the VM in the same critical section and WAL
record as pruning and freezing. For ease of review, this commit makes
one step toward doing this. It moves the VM corruption handling case to
heap_page_prune_and_freeze().
---
src/backend/access/heap/pruneheap.c | 87 +++++++++++++++++++++++++++-
src/backend/access/heap/vacuumlazy.c | 77 +++---------------------
src/include/access/heapam.h | 4 ++
3 files changed, 96 insertions(+), 72 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 956caeb69dc..72216126945 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,7 +21,7 @@
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "commands/vacuum.h"
#include "executor/instrument.h"
#include "miscadmin.h"
@@ -177,6 +177,13 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
static void page_verify_redirects(Page page);
+static bool identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer);
+
/*
* Optionally prune and repair fragmentation in the specified page.
@@ -261,7 +268,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, vistest, 0,
+ heap_page_prune_and_freeze(relation, buffer, false,
+ InvalidBuffer,
+ vistest, 0,
NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
/*
@@ -294,6 +303,64 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
}
}
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer)
+{
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
/*
* Prune and repair fragmentation and potentially freeze tuples on the
@@ -314,6 +381,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
* that also freeze need that information.
*
+ * blk_known_av is the visibility status of the heap block as of the last call
+ * to find_next_unskippable_block(). vmbuffer is the buffer that may already
+ * contain the required block of the visibility map.
+ *
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
* (see heap_prune_satisfies_vacuum).
*
@@ -349,6 +420,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ bool blk_known_av,
+ Buffer vmbuffer,
GlobalVisState *vistest,
int options,
struct VacuumCutoffs *cutoffs,
@@ -897,6 +970,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
+ /*
+ * Clear any VM corruption. This does not need to be done in a critical
+ * section.
+ */
+ presult->vm_corruption = false;
+ if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+ presult->vm_corruption = identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av,
+ prstate.lpdead_items, vmbuffer);
if (prstate.freeze)
{
if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4d47a6b394a..64ae63dcb12 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,12 +430,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
-static bool identify_and_fix_vm_corruption(Relation relation,
- BlockNumber heap_blk,
- Buffer heap_buffer, Page heap_page,
- bool heap_blk_known_av,
- int64 nlpdead_items,
- Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1938,65 +1932,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
-/*
- * When updating the visibility map after phase I heap vacuuming, we take the
- * opportunity to identify and fix any VM corruption.
- *
- * heap_blk_known_av is the visibility status of the heap page collected
- * while finding the next unskippable block in heap_vac_scan_next_block().
- */
-static bool
-identify_and_fix_vm_corruption(Relation relation,
- BlockNumber heap_blk,
- Buffer heap_buffer, Page heap_page,
- bool heap_blk_known_av,
- int64 nlpdead_items,
- Buffer vmbuffer)
-{
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
- visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
- {
- elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk);
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
- {
- elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk);
- PageClearAllVisible(heap_page);
- MarkBufferDirty(heap_buffer);
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- return false;
-}
-
/* qsort comparator for sorting OffsetNumbers */
static int
@@ -2055,11 +1990,14 @@ lazy_scan_prune(LVRelState *vacrel,
* tuples. Pruning will have determined whether or not the page is
* all-visible.
*/
- prune_options = HEAP_PAGE_PRUNE_FREEZE;
+ prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
+ heap_page_prune_and_freeze(rel, buf,
+ all_visible_according_to_vm,
+ vmbuffer,
+ vacrel->vistest, prune_options,
&vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
@@ -2144,10 +2082,9 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables. Start by looking for any VM corruption.
+ * all_frozen variables.
*/
- if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
- all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+ if (presult.vm_corruption)
{
/* Don't update the VM if we just cleared corruption in it */
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e7129a644a1..0c7eb5e46f4 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
struct TupleTableSlot;
@@ -247,6 +248,7 @@ typedef struct PruneFreezeResult
bool all_visible;
bool all_frozen;
TransactionId vm_conflict_horizon;
+ bool vm_corruption;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
@@ -380,6 +382,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ bool blk_known_av,
+ Buffer vmbuffer,
struct GlobalVisState *vistest,
int options,
struct VacuumCutoffs *cutoffs,
--
2.43.0
v8-0007-Combine-lazy_scan_prune-VM-corruption-cases.patch (text/x-patch)
From 393bce514362c05bed2eba71f1bfad649507d058 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:04:03 -0400
Subject: [PATCH v8 07/22] Combine lazy_scan_prune VM corruption cases
lazy_scan_prune() updates the visibility map after phase I of heap
vacuuming. It also checks and fixes corruption in the VM. The corruption
cases were mixed in with the normal visibility map update cases.
Careful study of the ordering of the current logic reveals that the
corruption cases can be reordered and extracted into a separate
function. This should add no overhead compared to the previous code.
This reordering makes it clear which cases are about corruption and
which cases are normal VM updates. Separating them also makes it
possible to combine the normal cases in a future commit. This will make
the logic easier to understand and pave the way for updating the VM in
the same WAL record as pruning and freezing during phase I.
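For reference, the two corruption cases the helper distinguishes can be
modeled as follows (a simplified sketch with stand-in types; the real
identify_and_fix_vm_corruption() also clears the VM bits and, in the second
case, clears PD_ALL_VISIBLE and dirties the heap buffer):

#include <stdbool.h>
#include <stdint.h>

typedef enum
{
    VM_OK,                               /* nothing suspicious */
    VM_BIT_SET_BUT_PAGE_NOT_ALL_VISIBLE, /* VM bit set while PD_ALL_VISIBLE is clear */
    VM_ALL_VISIBLE_PAGE_WITH_LP_DEAD     /* LP_DEAD items on a PD_ALL_VISIBLE page */
} VmCorruption;

VmCorruption
classify_vm_corruption(bool vm_bit_set, bool pd_all_visible, int64_t nlpdead_items)
{
    if (vm_bit_set && !pd_all_visible)
        return VM_BIT_SET_BUT_PAGE_NOT_ALL_VISIBLE;
    if (nlpdead_items > 0 && pd_all_visible)
        return VM_ALL_VISIBLE_PAGE_WITH_LP_DEAD;
    return VM_OK;
}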
---
src/backend/access/heap/vacuumlazy.c | 114 +++++++++++++++++----------
1 file changed, 73 insertions(+), 41 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 49c46d35486..406c30e6ecd 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,6 +430,12 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
+static bool identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1932,6 +1938,66 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer)
+{
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
+
+
/* qsort comparator for sorting OffsetNumbers */
static int
cmpOffsetNumbers(const void *a, const void *b)
@@ -2078,9 +2144,14 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
+ * all_frozen variables. Start by looking for any VM corruption.
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
+ all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+ {
+ /* Don't update the VM if we just cleared corruption in it */
+ }
+ else if (!all_visible_according_to_vm && presult.all_visible)
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -2132,45 +2203,6 @@ lazy_scan_prune(LVRelState *vacrel,
}
}
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
- {
- elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
- {
- elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno);
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
/*
* If the all-visible page is all-frozen but not marked as such yet, mark
* it as all-frozen. Note that all_frozen is only valid if all_visible is
--
2.43.0
v8-0003-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch (text/x-patch)
From 07f31099754636ec9dabf6cca06c33c4b19c230c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v8 03/22] Eliminate xl_heap_visible in COPY FREEZE
Instead of emitting a separate xl_heap_visible WAL record to set the VM
bits, include the changes to make to the VM block in the
xl_heap_multi_insert record.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
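Roughly, the redo-side effect of this change can be modeled as below
(illustrative constants only; the real flag values live in heapam_xlog.h and
visibilitymapdefs.h). If the multi-insert record carries
XLH_INSERT_ALL_FROZEN_SET, replay sets both VM bits for the block, even when
the heap-block redo itself was skipped because the relation was dropped or
truncated later in the WAL stream:

#include <stdint.h>

/* Illustrative values; see heapam_xlog.h and visibilitymapdefs.h. */
#define XLH_INSERT_ALL_FROZEN_SET   (1 << 5)
#define VISIBILITYMAP_ALL_VISIBLE   0x01
#define VISIBILITYMAP_ALL_FROZEN    0x02

/* Model of how replay of XLH_INSERT_ALL_FROZEN_SET updates the block's VM bits. */
uint8_t
multi_insert_redo_vmbits(uint8_t xlrec_flags, uint8_t old_vmbits)
{
    if (xlrec_flags & XLH_INSERT_ALL_FROZEN_SET)
        return VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN;
    return old_vmbits;
}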
---
src/backend/access/heap/heapam.c | 48 ++++++++++--------
src/backend/access/heap/heapam_xlog.c | 39 +++++++++++++-
src/backend/access/heap/visibilitymap.c | 67 ++++++++++++++++++++++++-
src/backend/access/rmgrdesc/heapdesc.c | 5 ++
src/include/access/visibilitymap.h | 2 +
5 files changed, 138 insertions(+), 23 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e3e7307ef5f..035280dc30a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2504,9 +2504,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/*
* If the page is all visible, need to clear that, unless we're only
* going to add further frozen rows to it.
- *
- * If we're only adding already frozen rows to a previously empty
- * page, mark it as all-visible.
*/
if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
@@ -2516,8 +2513,23 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
BufferGetBlockNumber(buffer),
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
+
+ /*
+ * If we're only adding already frozen rows to a previously empty
+ * page, mark it as all-frozen and update the visibility map. We're
+ * already holding a pin on the vmbuffer.
+ */
else if (all_frozen_set)
+ {
+ Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
PageSetAllVisible(page);
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ visibilitymap_set_vmbyte(relation,
+ BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+ }
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2565,6 +2577,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
xlrec->flags = 0;
if (all_visible_cleared)
xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+ /*
+ * We don't have to worry about including a conflict xid in the
+ * WAL record as HEAP_INSERT_FROZEN intentionally violates
+ * visibility rules.
+ */
if (all_frozen_set)
xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
@@ -2627,7 +2645,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
XLogBeginInsert();
XLogRegisterData(xlrec, tupledata - scratch.data);
+
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+ if (all_frozen_set)
+ XLogRegisterBuffer(1, vmbuffer, 0);
XLogRegisterBufData(0, tupledata, totaldatalen);
@@ -2637,29 +2658,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
recptr = XLogInsert(RM_HEAP2_ID, info);
PageSetLSN(page, recptr);
+ if (all_frozen_set)
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
}
END_CRIT_SECTION();
- /*
- * If we've frozen everything on the page, update the visibilitymap.
- * We're already holding pin on the vmbuffer.
- */
if (all_frozen_set)
- {
- Assert(PageIsAllVisible(page));
- Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
-
- /*
- * It's fine to use InvalidTransactionId here - this is only used
- * when HEAP_INSERT_FROZEN is specified, which intentionally
- * violates visibility rules.
- */
- visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
- InvalidXLogRecPtr, vmbuffer,
- InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
- }
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
UnlockReleaseBuffer(buffer);
ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 69e2003a76f..0c902c87682 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -552,6 +552,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
int i;
bool isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
/*
* Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -572,11 +573,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(rlocator);
- Buffer vmbuffer = InvalidBuffer;
visibilitymap_pin(reln, blkno, &vmbuffer);
visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
ReleaseBuffer(vmbuffer);
+ vmbuffer = InvalidBuffer;
FreeFakeRelcacheEntry(reln);
}
@@ -663,6 +664,42 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (BufferIsValid(buffer))
UnlockReleaseBuffer(buffer);
+ buffer = InvalidBuffer;
+
+ /*
+ * Now read and update the VM block. Even if we skipped updating the heap
+ * page due to the file being dropped or truncated later in recovery, it's
+ * still safe to update the visibility map. Any WAL record that clears
+ * the visibility map bit does so before checking the page LSN, so any
+ * bits that need to be cleared will still be cleared.
+ *
+ * It is only okay to set the VM bits without holding the heap page lock
+ * because we can expect no other writers of this page.
+ */
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+ XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
+
+ Assert(visibilitymap_pin_ok(blkno, vmbuffer));
+ visibilitymap_set_vmbyte(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+
+ /*
+ * It is not possible that the VM was already set for this heap page,
+ * so the vmbuffer must have been modified and marked dirty.
+ */
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
+ FreeFakeRelcacheEntry(reln);
+ }
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
+
/*
* If the page is running low on free space, update the FSM as well.
* Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 7306c16f05c..64dff7a0026 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,7 +14,8 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set a bit in a previously pinned page
+ * visibilitymap_set - set a bit in a previously pinned page and log
+ * visibilitymap_set_vmbyte - set a bit in a pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -321,6 +322,70 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
return status;
}
+/*
+ * Set flags in the VM block contained in the passed in vmBuf.
+ *
+ * This function is for callers which include the VM changes in the same WAL
+ * record as the modifications of the heap page which rendered it all-visible.
+ * Callers separately logging the VM changes should invoke visibilitymap_set()
+ * instead.
+ *
+ * Caller must have pinned and exclusive locked the correct block of the VM in
+ * vmBuf. This block should contain the VM bits for the given heapBlk.
+ *
+ * During normal operation (i.e. not recovery), this should be called in a
+ * critical section which also makes any necessary changes to the heap page
+ * and, if relevant, emits WAL.
+ *
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page. This includes
+ * maintaining any invariants such as ensuring the buffer containing heapBlk
+ * is pinned and exclusive locked.
+ */
+uint8
+visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags)
+{
+ BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+ uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+ uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+ Page page;
+ uint8 *map;
+ uint8 status;
+
+#ifdef TRACE_VISIBILITYMAP
+ elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
+ flags, RelationGetRelationName(rel), heapBlk);
+#endif
+
+ /* Call in same critical section where WAL is emitted. */
+ Assert(InRecovery || CritSectionCount > 0);
+
+ /* Flags should be valid. Also never clear bits with this function */
+ Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+ /* Must never set all_frozen bit without also setting all_visible bit */
+ Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+ /* Check that we have the right VM page pinned */
+ if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+ elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
+
+ Assert(BufferIsExclusiveLocked(vmBuf));
+
+ page = BufferGetPage(vmBuf);
+ map = (uint8 *) PageGetContents(page);
+
+ status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+ if (flags != status)
+ {
+ map[mapByte] |= (flags << mapOffset);
+ MarkBufferDirty(vmBuf);
+ }
+
+ return status;
+}
+
/*
* visibilitymap_get_status - get status of bits
*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
#include "access/heapam_xlog.h"
#include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
#include "storage/standbydefs.h"
/*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
xlrec->flags);
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+ appendStringInfo(buf, ", vm_flags: 0x%02X",
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
if (XLogRecHasBlockData(record, 0) && !isinit)
{
appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..977566f6b98 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,8 @@ extern uint8 visibilitymap_set(Relation rel,
Buffer vmBuf,
TransactionId cutoff_xid,
uint8 flags);
+extern uint8 visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
--
2.43.0
v8-0014-Remove-xl_heap_visible-entirely.patch (text/x-patch; charset=US-ASCII)
From 91d6e524a46c4d19dfe82c368ee98a950753cfb4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:30:42 -0400
Subject: [PATCH v8 14/22] Remove xl_heap_visible entirely
There are now no users of this, so eliminate it entirely.
---
src/backend/access/common/bufmask.c | 3 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 154 +----------------------
src/backend/access/heap/pruneheap.c | 4 +-
src/backend/access/heap/vacuumlazy.c | 10 +-
src/backend/access/heap/visibilitymap.c | 109 +---------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +-
src/include/access/heapam_xlog.h | 20 ---
src/include/access/visibilitymap.h | 11 +-
src/include/access/visibilitymapdefs.h | 9 --
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 30 insertions(+), 368 deletions(-)
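Once this patch is applied, the only remaining VM-setting interface is the
byte-setting helper, now carrying the old name. A condensed view of the new
signature and of a typical caller, taken from the visibilitymap.h and
pruneheap.c hunks below (not a compilable excerpt):

	/*
	 * No heap buffer, no LSN, no cutoff xid: the caller's own WAL record
	 * now covers the VM change.
	 */
	extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk,
								   Buffer vmBuf, uint8 flags);

	/* e.g. in heap_page_prune_and_freeze(): */
	if (do_set_vm)
	{
		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
		old_vmbits = visibilitymap_set(relation, blockno,
									   vmbuffer, vmflags);

		/*
		 * vmflags travels in the xl_heap_prune record, and the VM page's
		 * LSN is set to that record's LSN once it has been inserted.
		 */
	}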
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..1fff01383b3 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,7 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 035280dc30a..88f880cfd15 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -40,6 +40,7 @@
#include "access/valid.h"
#include "access/visibilitymap.h"
#include "access/xloginsert.h"
+#include "access/xlogutils.h"
#include "catalog/pg_database.h"
#include "catalog/pg_database_d.h"
#include "commands/vacuum.h"
@@ -2524,11 +2525,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
PageSetAllVisible(page);
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
- visibilitymap_set_vmbyte(relation,
- BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set(relation,
+ BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
}
/*
@@ -8796,49 +8797,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
/*
* Perform XLogInsert for a heap-update operation. Caller must already
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index e68e61feade..83a5f3dbc34 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -83,10 +83,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
memcpy(&vmflags, maindataptr, sizeof(uint8));
maindataptr += sizeof(uint8);
- /*
- * We don't set VISIBILITYMAP_XLOG_CATALOG_REL in the combined record
- * because we already have XLHP_IS_CATALOG_REL.
- */
Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
/* Must never set all_frozen bit without also setting all_visible bit */
Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
@@ -268,7 +264,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Relation reln = CreateFakeRelcacheEntry(rlocator);
visibilitymap_pin(reln, blkno, &vmbuffer);
- old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+ old_vmbits = visibilitymap_set(reln, blkno, vmbuffer, vmflags);
/* Only set VM page LSN if we modified the page */
if (old_vmbits != vmflags)
PageSetLSN(BufferGetPage(vmbuffer), lsn);
@@ -279,143 +275,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
UnlockReleaseBuffer(vmbuffer);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
-
- Assert(visibilitymap_pin_ok(blkno, vmbuffer));
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -792,16 +651,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
Relation reln = CreateFakeRelcacheEntry(rlocator);
Assert(visibilitymap_pin_ok(blkno, vmbuffer));
- visibilitymap_set_vmbyte(reln, blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
/*
* It is not possible that the VM was already set for this heap page,
* so the vmbuffer must have been modified and marked dirty.
*/
Assert(BufferIsDirty(vmbuffer));
+ visibilitymap_set(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
PageSetLSN(BufferGetPage(vmbuffer), lsn);
FreeFakeRelcacheEntry(reln);
}
@@ -1381,9 +1240,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ef9bb0c273a..de656087941 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -979,8 +979,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_set_vm)
{
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
- old_vmbits = visibilitymap_set_vmbyte(relation, blockno,
- vmbuffer, vmflags);
+ old_vmbits = visibilitymap_set(relation, blockno,
+ vmbuffer, vmflags);
if (old_vmbits == vmflags)
{
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5129c13fee9..66ce30ddf03 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1886,8 +1886,8 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
MarkBufferDirty(buf);
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
- visibilitymap_set_vmbyte(vacrel->rel, blkno,
- vmbuffer, new_vmbits);
+ visibilitymap_set(vacrel->rel, blkno,
+ vmbuffer, new_vmbits);
if (RelationNeedsWAL(vacrel->rel))
{
@@ -2753,9 +2753,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
set_pd_all_vis = true;
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
PageSetAllVisible(page);
- visibilitymap_set_vmbyte(vacrel->rel,
- blkno,
- vmbuffer, vmflags);
+ visibilitymap_set(vacrel->rel,
+ blkno,
+ vmbuffer, vmflags);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 64dff7a0026..8342ec1ff22 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set a bit in a previously pinned page and log
- * visibilitymap_set_vmbyte - set a bit in a pinned page
+ * visibilitymap_set - set a bit in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,108 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
- flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
- Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- Assert(!BufferIsValid(heapBuf) || BufferIsExclusiveLocked(heapBuf));
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (XLogRecPtrIsInvalid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
-}
-
/*
* Set flags in the VM block contained in the passed in vmBuf.
*
@@ -343,8 +240,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* is pinned and exclusive locked.
*/
uint8
-visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags)
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index d6c86ccac20..f7880a4ed81 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -351,13 +351,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -462,9 +455,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d6a479f6984..34988d564fd 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -440,20 +439,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -497,11 +482,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 977566f6b98..c5b1e1f7adb 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "utils/relcache.h"
@@ -31,14 +30,8 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags);
+extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a13e8162890..b4c880c083f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4273,7 +4273,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
v8-0011-Update-VM-in-pruneheap.c.patch (text/x-patch; charset=US-ASCII)
From 24e738f55987f2690acb8090f9aa78b7d7507d98 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 2 Jun 2025 11:04:14 -0400
Subject: [PATCH v8 11/22] Update VM in pruneheap.c
As a step toward updating the VM in the same critical section and WAL
record as pruning and freezing (during phase I of vacuuming), first move
the VM update (still in its own critical section and WAL record) into
heap_page_prune_and_freeze(). This makes review easier.
---
src/backend/access/heap/pruneheap.c | 99 +++++++++++++++++++++++-----
src/backend/access/heap/vacuumlazy.c | 99 +++++-----------------------
src/include/access/heapam.h | 15 +++--
3 files changed, 106 insertions(+), 107 deletions(-)
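The caller-side effect is that lazy_scan_prune() stops touching the VM
directly and only consumes the before/after bits reported back in the
PruneFreezeResult. A condensed sketch of the vacuumlazy.c side after this
patch, with names from the hunks below (not a compilable excerpt):

	/* Ask pruning to also handle the VM for this block. */
	prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
	if (vacrel->nindexes == 0)
		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;

	/*
	 * heap_page_prune_and_freeze() prunes, freezes, and updates the VM;
	 * afterwards the caller only needs presult.old_vmbits/new_vmbits to
	 * keep its counters accurate.
	 */
	if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
		(presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
	{
		vacrel->vm_new_visible_pages++;
		if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
		{
			vacrel->vm_new_visible_frozen_pages++;
			*vm_page_frozen = true;
		}
	}
	else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
			 (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
	{
		vacrel->vm_new_frozen_pages++;
		*vm_page_frozen = true;
	}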
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a562573763a..fcf054d04a8 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -360,7 +360,8 @@ identify_and_fix_vm_corruption(Relation relation,
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -436,6 +437,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_freeze;
bool do_prune;
bool do_hint;
+ uint8 vmflags = 0;
+ uint8 old_vmbits = 0;
bool hint_bit_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
@@ -936,7 +939,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*
* Now that freezing has been finalized, unset all_visible if there are
* any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected by our caller.
+ * of the page, as expected for updating the visibility map.
*/
if (prstate.all_visible && prstate.lpdead_items == 0)
{
@@ -952,31 +955,91 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->hastup = prstate.hastup;
/*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
+ * If updating the visibility map, the conflict horizon for that record
+ * must be the newest xmin on the page. However, if the page is
+ * completely frozen, there can be no conflict and the vm_conflict_horizon
+ * should remain InvalidTransactionId. This includes the case that we
+ * just froze all the tuples; the prune-freeze record included the
+ * conflict XID already so the VM update record doesn't need it.
*/
if (presult->all_frozen)
presult->vm_conflict_horizon = InvalidTransactionId;
else
presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
- presult->lpdead_items = prstate.lpdead_items;
- /* the presult->deadoffsets array was already filled in */
-
/*
- * Clear any VM corruption. This does not need to be done in a critical
- * section.
+ * Handle setting visibility map bit based on information from the VM (as
+ * of last heap_vac_scan_next_block() call), and from all_visible and
+ * all_frozen variables.
*/
- presult->vm_corruption = false;
if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
- presult->vm_corruption = identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av,
- prstate.lpdead_items, vmbuffer);
+ {
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av,
+ prstate.lpdead_items, vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /*
+ * If the page isn't yet marked all-visible in the VM, or it is and
+ * needs to be marked all-frozen, update the VM. Note that all_frozen
+ * is only valid if all_visible is true, so we must check both
+ * all_visible and all_frozen.
+ */
+ else if (presult->all_visible &&
+ (!blk_known_av ||
+ (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+ {
+ Assert(prstate.lpdead_items == 0);
+ vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+ /*
+ * If the page is all-frozen, we can pass InvalidTransactionId as
+ * our cutoff_xid, since a snapshotConflictHorizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ if (presult->all_frozen)
+ {
+ Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
+ /*
+ * It's possible for the VM bit to be clear and the page-level bit
+ * to be set if checksums are not enabled.
+ *
+ * And even if we are just planning to update the frozen bit in
+ * the VM, we shouldn't rely on blk_known_av as a proxy for the
+ * page-level PD_ALL_VISIBLE bit being set, since it might have
+ * become stale.
+ *
+ * If the heap page is all-visible but the VM bit is not set, we
+ * don't need to dirty the heap page. However, if checksums are
+ * enabled, we do need to make sure that the heap page is dirtied
+ * before passing it to visibilitymap_set(), because it may be
+ * logged.
+ */
+ if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+ {
+ PageSetAllVisible(page);
+ MarkBufferDirty(buffer);
+ }
+
+ old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
+ vmbuffer, presult->vm_conflict_horizon,
+ vmflags);
+ }
+ }
+
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
+
+ presult->old_vmbits = old_vmbits;
+ presult->new_vmbits = vmflags;
+
if (prstate.freeze)
{
if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 64ae63dcb12..892081033cc 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1932,7 +1932,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
-
/* qsort comparator for sorting OffsetNumbers */
static int
cmpOffsetNumbers(const void *a, const void *b)
@@ -1948,7 +1947,8 @@ cmpOffsetNumbers(const void *a, const void *b)
* vmbuffer is the buffer containing the VM block with visibility information
* for the heap block, blkno. all_visible_according_to_vm is the saved
* visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * won't rely entirely on this status, as it may be out of date. These will be
+ * passed on to heap_page_prune_and_freeze() to use while setting the VM.
*
* *has_lpdead_items is set to true or false depending on whether, upon return
* from this function, any LP_DEAD items are still present on the page.
@@ -1977,6 +1977,7 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
+ * Then, if the page's visibility status has changed, update the VM.
*
* If the relation has no indexes, we can immediately mark would-be dead
* items LP_UNUSED.
@@ -1985,10 +1986,6 @@ lazy_scan_prune(LVRelState *vacrel,
* presult.ndeleted. It should not be confused with presult.lpdead_items;
* presult.lpdead_items's final value can be thought of as the number of
* tuples that were deleted from indexes.
- *
- * We will update the VM after collecting LP_DEAD items and freezing
- * tuples. Pruning will have determined whether or not the page is
- * all-visible.
*/
prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
if (vacrel->nindexes == 0)
@@ -2080,88 +2077,26 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(!presult.all_visible || !(*has_lpdead_items));
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * For the purposes of logging, count whether or not the page was newly
+ * set all-visible and, potentially, all-frozen.
*/
- if (presult.vm_corruption)
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- /* Don't update the VM if we just cleared corruption in it */
- }
-
- /*
- * If the page isn't yet marked all-visible in the VM or it is and needs
- * to me marked all-frozen, update the VM Note that all_frozen is only
- * valid if all_visible is true, so we must check both all_visible and
- * all_frozen.
- */
- else if (presult.all_visible &&
- (!all_visible_according_to_vm ||
- (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
- {
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- /*
- * If the page is all-frozen, we can pass InvalidTransactionId as our
- * cutoff_xid, since a snapshotConflictHorizon sufficient to make
- * everything safe for REDO was logged when the page's tuples were
- * frozen.
- */
- if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * If the heap page is all-visible but the VM bit is not set, we don't
- * need to dirty the heap page. However, if checksums are enabled, we
- * do need to make sure that the heap page is dirtied before passing
- * it to visibilitymap_set(), because it may be logged.
- */
- if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * Even if we are only setting the all-frozen bit, there is a small
- * chance that the VM was modified sometime between setting
- * all_visible_according_to_vm and checking the visibility during
- * pruning. Check the return value of old_vmbits to ensure the
- * visibility map counters used for logging are accurate.
- *
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ vacrel->vm_new_visible_pages++;
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- vacrel->vm_new_frozen_pages++;
+ vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ {
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
+ }
return presult.ndeleted;
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0c7eb5e46f4..b85648456e9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,20 +235,21 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
+ * all_visible and all_frozen indicate the status of the page as reflected
+ * in the visibility map after pruning, freezing, and any update made to
+ * the VM for this page.
*
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
+ * vm_conflict_horizon is the newest xmin of live tuples on the page
+ * (older than OldestXmin). It will only be valid if we did not set the
+ * page all-frozen in the VM.
*
* These are only set if the HEAP_PRUNE_FREEZE option is set.
*/
bool all_visible;
bool all_frozen;
TransactionId vm_conflict_horizon;
- bool vm_corruption;
+ uint8 old_vmbits;
+ uint8 new_vmbits;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
--
2.43.0
v8-0010-Keep-all_frozen-updated-too-in-heap_page_prune_an.patch (text/x-patch; charset=US-ASCII)
From cd33da95773743e046219d8bc94d9c929cd5be7f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 23 Jul 2025 16:01:24 -0400
Subject: [PATCH v8 10/22] Keep all_frozen updated too in
heap_page_prune_and_freeze
We previously relied on all_frozen only ever being consulted together with
all_visible, but it's better to keep both flags updated.
---
src/backend/access/heap/pruneheap.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
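The invariant this establishes is simply that all_frozen can never be true
while all_visible is false: every place that gives up on all-visible now
gives up on all-frozen as well, which lets the prune code assert the
relationship before entering its critical section. In sketch form (the
pattern used throughout the hunk below):

	/* Any tuple that disqualifies all-visible also disqualifies all-frozen. */
	prstate->all_visible = prstate->all_frozen = false;

	/* ... later, in heap_page_prune_and_freeze(): ... */
	Assert(!prstate.all_frozen || prstate.all_visible);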
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 72216126945..a562573763a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -143,10 +143,6 @@ typedef struct
* whether to freeze the page or not. The all_visible and all_frozen
* values returned to the caller are adjusted to include LP_DEAD items at
* the end.
- *
- * all_frozen should only be considered valid if all_visible is also set;
- * we don't bother to clear the all_frozen flag every time we clear the
- * all_visible flag.
*/
bool all_visible;
bool all_frozen;
@@ -824,6 +820,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
}
+ Assert(!prstate.all_frozen || prstate.all_visible);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -1468,7 +1465,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
if (!HeapTupleHeaderXminCommitted(htup))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1490,7 +1487,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
Assert(prstate->cutoffs);
if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1503,7 +1500,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
case HEAPTUPLE_RECENTLY_DEAD:
prstate->recently_dead_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple will soon become DEAD. Update the hint field so
@@ -1522,7 +1519,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* assumption is a bit shaky, but it is what acquire_sample_rows()
* does, so be consistent.
*/
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* If we wanted to optimize for aborts, we might consider marking
@@ -1540,7 +1537,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* will commit and update the counters after we report.
*/
prstate->live_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple may soon become DEAD. Update the hint field so that
--
2.43.0
v8-0012-Eliminate-xl_heap_visible-from-vacuum-phase-I-pru.patch (text/x-patch; charset=US-ASCII)
From 41eb35a7bbeff71763c2be79ae4ecae7f29e4d6a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:41:00 -0400
Subject: [PATCH v8 12/22] Eliminate xl_heap_visible from vacuum phase I
prune/freeze
Instead of emitting a separate WAL record for every block rendered
all-visible/frozen by vacuum's phase I, include the changes to the VM in
the xl_heap_prune record already emitted.
This is only enabled for vacuum's prune/freeze work, not for on-access
pruning.
---
src/backend/access/heap/pruneheap.c | 454 ++++++++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 30 --
src/include/access/heapam.h | 15 +-
3 files changed, 278 insertions(+), 221 deletions(-)
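The heart of the patch is the decision block that now runs just before the
prune/freeze critical section: derive vmflags from the page state and from
the VM status saved at the last skip check, and only then decide whether
PD_ALL_VISIBLE and the VM byte get set under the same xl_heap_prune record.
A condensed sketch of that flow, with names from the hunks below (the
corruption-repair branch and the WAL assembly are elided):

	/*
	 * Once freezing decisions are final, remaining LP_DEAD items make the
	 * page not-all-visible for VM purposes.
	 */
	if (prstate.lpdead_items > 0)
		prstate.all_visible = prstate.all_frozen = false;

	if (prstate.consider_update_vm &&
		prstate.all_visible &&
		(!blk_known_av ||
		 (prstate.all_frozen &&
		  !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
	{
		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
		if (prstate.all_frozen)
			vmflags |= VISIBILITYMAP_ALL_FROZEN;
	}

	do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;

	/* PD_ALL_VISIBLE is only set when the VM byte is also being set. */
	set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);

	START_CRIT_SECTION();
	/*
	 * Pruning, freezing, PageSetAllVisible(), the VM byte update, and a
	 * single xl_heap_prune record carrying vmflags all happen in here.
	 */
	END_CRIT_SECTION();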
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index fcf054d04a8..7cef05be5d0 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,13 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool freeze;
+
+ /*
+ * Whether or not to consider updating the VM. There is some bookkeeping
+ * that must be maintained if we would like to update the VM.
+ */
+ bool consider_update_vm;
+
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -108,8 +115,9 @@ typedef struct
*
* These fields are not used by pruning itself for the most part, but are
* used to collect information about what was pruned and what state the
- * page is in after pruning, for the benefit of the caller. They are
- * copied to the caller's PruneFreezeResult at the end.
+ * page is in after pruning, both for updating the visibility map and
+ * for the benefit of the caller. They are copied to the caller's
+ * PruneFreezeResult at the end.
* -------------------------------------------------------
*/
@@ -138,11 +146,10 @@ typedef struct
* bits. It is only valid if we froze some tuples, and all_frozen is
* true.
*
- * NOTE: all_visible and all_frozen don't include LP_DEAD items. That's
- * convenient for heap_page_prune_and_freeze(), to use them to decide
- * whether to freeze the page or not. The all_visible and all_frozen
- * values returned to the caller are adjusted to include LP_DEAD items at
- * the end.
+ * NOTE: all_visible and all_frozen don't include LP_DEAD items until
+ * directly before updating the VM. We ignore LP_DEAD items when deciding
+ * whether or not to opportunistically freeze and when determining the
+ * snapshot conflict horizon required when freezing tuples.
*/
bool all_visible;
bool all_frozen;
@@ -371,12 +378,15 @@ identify_and_fix_vm_corruption(Relation relation,
* If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
* required in order to advance relfrozenxid / relminmxid, or if it's
* considered advantageous for overall system performance to do so now. The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing. When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set. They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
+ * 'cutoffs', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments are required
+ * when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set and the visibility status of the page
+ * has changed, we will update the VM at the same time as pruning and freezing
+ * the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping. Note that new and old_vmbits will be
+ * 0 if HEAP_PAGE_PRUNE_UPDATE_VM is not set.
*
* blk_known_av is the visibility status of the heap block as of the last call
* to find_next_unskippable_block(). vmbuffer is the buffer that may already
@@ -392,6 +402,8 @@ identify_and_fix_vm_corruption(Relation relation,
* FREEZE indicates that we will also freeze tuples, and will return
* 'all_visible', 'all_frozen' flags to the caller.
*
+ * UPDATE_VM indicates that we will set the page's status in the VM.
+ *
* cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
* of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
* cutoffs->OldestXmin is also used to determine if dead tuples are
@@ -436,18 +448,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
HeapTupleData tup;
bool do_freeze;
bool do_prune;
- bool do_hint;
+ bool do_hint_full_or_prunable;
+ bool do_set_vm;
uint8 vmflags = 0;
uint8 old_vmbits = 0;
bool hint_bit_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ bool all_frozen_except_lp_dead = false;
+ bool set_pd_all_visible = false;
/* Copy parameters to prstate */
prstate.vistest = vistest;
prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.consider_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate.cutoffs = cutoffs;
+ Assert(!prstate.consider_update_vm || vmbuffer);
+
/*
* Our strategy is to scan the page and make lists of items to change,
* then apply the changes within a critical section. This keeps as much
@@ -492,50 +510,57 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.deadoffsets = presult->deadoffsets;
/*
- * Caller may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Keep track of whether or not the page will be all-visible and
+ * all-frozen for use in opportunistic freezing and to update the VM if
+ * the caller requests it.
+ *
+ * Currently, only VACUUM attempts freezing and setting the VM bits. But
+ * other callers could do either one. The visibility bookkeeping is
+ * required for opportunistic freezing (in addition to setting the VM
+ * bits) because we only consider opportunistically freezing tuples if the
+ * whole page would become all-frozen or if the whole page will be frozen
+ * except for dead tuples that will be removed by vacuum.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * If only updating the VM, we must initialize all_frozen to false, as
+ * heap_prepare_freeze_tuple() will not be called for each tuple on the
+ * page and we will not end up correctly setting it to false later.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not clear
- * all_visible when we see LP_DEAD items. We fix that at the end of the
- * function, when we return the value to the caller, so that the caller
- * doesn't set the VM bit incorrectly.
+ * Dead tuples which will be removed by the end of vacuuming should not
+ * preclude us from opportunistically freezing, so we do not clear
+ * all_visible when we see LP_DEAD items. We fix that after determining
+ * whether or not to freeze but before deciding whether or not to update
+ * the VM so that we don't set the VM bit incorrectly.
+ *
+ * If not freezing or updating the VM, we otherwise avoid the extra
+ * bookkeeping. Initializing all_visible to false allows skipping the work
+ * to update them in heap_prune_record_unchanged_lp_normal().
*/
if (prstate.freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
}
+ else if (prstate.consider_update_vm)
+ {
+ prstate.all_visible = true;
+ prstate.all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate.all_visible = false;
prstate.all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is unmaintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon.
+ * This is most likely to happen when updating the VM and/or freezing all
+ * live tuples on the page. It is updated before returning to the caller
+ * because vacuum does assert-build only validation on the page using this
+ * field.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -733,10 +758,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* Even if we don't prune anything, if we found a new value for the
- * pd_prune_xid field or the page was marked full, we will update the hint
- * bit.
+ * pd_prune_xid field or the page was marked full, we will update those
+ * hint bits.
*/
- do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ do_hint_full_or_prunable =
+ ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
PageIsFull(page);
/*
@@ -784,7 +810,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (XLogCheckBufferNeedsBackup(buffer))
do_freeze = true;
}
- else if (do_hint)
+ else if (do_hint_full_or_prunable)
{
if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
do_freeze = true;
@@ -823,11 +849,84 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
}
+ /*
+ * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+ * make the choice of whether or not to freeze the page unaffected by the
+ * short-term presence of LP_DEAD items. These LP_DEAD items were
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that freezing has been finalized, unset all_visible if there are
+ * any LP_DEAD items on the page. It needs to reflect the present state of
+ * the page when using it to determine whether or not to update the VM.
+ *
+ * Keep track of whether or not the page was all-frozen except LP_DEAD
+ * items for the purposes of calculating the snapshot conflict horizon,
+ * though.
+ */
+ all_frozen_except_lp_dead = prstate.all_frozen;
+ if (prstate.lpdead_items > 0)
+ {
+ prstate.all_visible = false;
+ prstate.all_frozen = false;
+ }
+
Assert(!prstate.all_frozen || prstate.all_visible);
+
+ /*
+ * Handle setting visibility map bit based on information from the VM (as
+ * of last heap_vac_scan_next_block() call), and from all_visible and
+ * all_frozen variables.
+ */
+ if (prstate.consider_update_vm)
+ {
+ /*
+ * Clear any VM corruption. This does not need to be in a critical
+ * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
+ * we may mark the heap page buffer dirty here and could end up doing
+ * so again later. This is not a correctness issue, and it only happens
+ * when the VM is corrupted, so the extra performance overhead is not a
+ * concern.
+ */
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av, prstate.lpdead_items, vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /* Determine if we actually need to set the VM and which bits to set. */
+ else if (prstate.all_visible &&
+ (!blk_known_av ||
+ (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate.all_frozen)
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+ }
+
+ do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
+
+ /*
+ * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+ * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+ * set, we strongly prefer to keep them in sync.
+ *
+ * Prior to Postgres 19, it was possible for the page-level bit to be set
+ * and the VM bit to be clear. This could happen if we crashed after
+ * setting PD_ALL_VISIBLE but before setting bits in the VM.
+ */
+ set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+
+ /* Save these for the caller in case we later zero out vmflags */
+ presult->new_vmbits = vmflags;
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
- if (do_hint)
+ if (do_hint_full_or_prunable)
{
/*
* Update the page's pd_prune_xid field to either zero, or the lowest
@@ -843,15 +942,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
PageClearFull(page);
/*
- * If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * If we are _only_ setting the prune_xid or PD_PAGE_FULL hint, then
+ * this is a non-WAL-logged hint. If we are going to freeze or prune
+ * tuples on the page or set PD_ALL_VISIBLE, we will mark the buffer
+ * dirty and emit WAL below.
*/
- if (!do_freeze && !do_prune)
+ if (!do_prune && !do_freeze && !set_pd_all_visible)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -865,12 +965,47 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
- MarkBufferDirty(buffer);
+ if (set_pd_all_visible)
+ PageSetAllVisible(page);
/*
- * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
+ * We only set PD_ALL_VISIBLE if we also set the VM, and since setting
+ * the VM requires emitting WAL, MarkBufferDirtyHint() isn't
+ * appropriate here.
*/
- if (RelationNeedsWAL(relation))
+ if (do_prune || do_freeze || set_pd_all_visible)
+ MarkBufferDirty(buffer);
+
+ if (do_set_vm)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ old_vmbits = visibilitymap_set_vmbyte(relation, blockno,
+ vmbuffer, vmflags);
+
+ if (old_vmbits == vmflags)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+ do_set_vm = false;
+ /* 0 out vmflags so we don't emit WAL to update the VM */
+ vmflags = 0;
+ }
+ }
+
+ /*
+ * It should never be the case that PD_ALL_VISIBLE is not set and the
+ * VM is set. Or, if it were, we should have caught it earlier when
+ * finding and fixing VM corruption. So, if we found out the VM was
+ * already set above, we should have found PD_ALL_VISIBLE set earlier.
+ */
+ Assert(!set_pd_all_visible || do_set_vm);
+
+ /*
+ * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did. If
+ * we were only updating the VM and it turns out it was already set,
+ * we will have unset do_set_vm earlier. As such, check it again
+ * before emitting the record.
+ */
+ if (RelationNeedsWAL(relation) && (do_set_vm || do_prune || do_freeze))
{
/*
* The snapshotConflictHorizon for the whole record should be the
@@ -882,35 +1017,56 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* on the standby with xids older than the youngest tuple this
* record will freeze will conflict.
*/
- TransactionId frz_conflict_horizon = InvalidTransactionId;
- TransactionId conflict_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
/*
- * We can use the visibility_cutoff_xid as our cutoff for
- * conflicts when the whole page is eligible to become all-frozen
- * in the VM once we're done with it. Otherwise we generate a
- * conservative cutoff by stepping back from OldestXmin.
+ * If we are updating the VM, the conflict horizon is almost
+ * always the visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization,
+ * we can use the visibility_cutoff_xid as the conflict horizon if
+ * the page will be all-frozen. This is true even if there are
+ * LP_DEAD line pointers because we ignored those when maintaining
+ * the visibility_cutoff_xid.
*/
- if (do_freeze)
+ if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
+ conflict_xid = prstate.visibility_cutoff_xid;
+
+ /*
+ * Otherwise, if we are freezing but the page would not be
+ * all-frozen, we have to use the more pessimistic horizon of
+ * OldestXmin, which may be newer than the newest tuple we froze.
+ * We currently don't track the newest tuple we froze.
+ */
+ else if (do_freeze)
{
- if (prstate.all_visible && prstate.all_frozen)
- frz_conflict_horizon = prstate.visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = prstate.cutoffs->OldestXmin;
- TransactionIdRetreat(frz_conflict_horizon);
- }
+ conflict_xid = prstate.cutoffs->OldestXmin;
+ TransactionIdRetreat(conflict_xid);
}
- if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
- conflict_xid = frz_conflict_horizon;
- else
+ /*
+ * If we are removing tuples whose xmax is newer than the conflict_xid
+ * calculated so far, we must use that as our horizon.
+ */
+ if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
conflict_xid = prstate.latest_xid_removed;
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning
+ * or freezing any tuples and are setting an already all-visible
+ * page all-frozen in the VM. In this case, all of the tuples on
+ * the page must already be visible to all MVCC snapshots on the
+ * standby.
+ */
+ if (!do_prune && !do_freeze && do_set_vm &&
+ blk_known_av && (vmflags & VISIBILITYMAP_ALL_FROZEN))
+ conflict_xid = InvalidTransactionId;
+
log_heap_prune_and_freeze(relation, buffer,
false,
- InvalidBuffer, 0, false,
+ vmbuffer,
+ vmflags,
+ set_pd_all_visible,
conflict_xid,
true, reason,
prstate.frozen, prstate.nfrozen,
@@ -922,124 +1078,55 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
END_CRIT_SECTION();
- /* Copy information back for caller */
- presult->ndeleted = prstate.ndeleted;
- presult->nnewlpdead = prstate.ndead;
- presult->nfrozen = prstate.nfrozen;
- presult->live_tuples = prstate.live_tuples;
- presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected for updating the visibility map.
- */
- if (prstate.all_visible && prstate.lpdead_items == 0)
- {
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
- }
- else
- {
- presult->all_visible = false;
- presult->all_frozen = false;
- }
-
- presult->hastup = prstate.hastup;
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
- /*
- * If updating the visibility map, the conflict horizon for that record
- * must be the newest xmin on the page. However, if the page is
- * completely frozen, there can be no conflict and the vm_conflict_horizon
- * should remain InvalidTransactionId. This includes the case that we
- * just froze all the tuples; the prune-freeze record included the
- * conflict XID already so the VM update record doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * VACUUM will call heap_page_is_all_visible() during the second pass over
+ * the heap to determine all_visible and all_frozen for the page -- this
+ * is a specialized version of the logic from this function. Now that
+ * we've finished pruning and freezing, make sure that we're in total
+ * agreement with heap_page_is_all_visible() using an assertion. We will
+ * have already set the page in the VM, so this assertion will only let
+ * you know that you've already done something wrong.
*/
- if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
{
- if (identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av,
- prstate.lpdead_items, vmbuffer))
- {
- /* If we fix corruption, don't update the VM further */
- }
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
- /*
- * If the page isn't yet marked all-visible in the VM or it is and
- * needs to me marked all-frozen, update the VM Note that all_frozen
- * is only valid if all_visible is true, so we must check both
- * all_visible and all_frozen.
- */
- else if (presult->all_visible &&
- (!blk_known_av ||
- (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
- {
- Assert(prstate.lpdead_items == 0);
- vmflags = VISIBILITYMAP_ALL_VISIBLE;
+ Assert(cutoffs);
- /*
- * If the page is all-frozen, we can pass InvalidTransactionId as
- * our cutoff_xid, since a snapshotConflictHorizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- if (presult->all_frozen)
- {
- Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
- vmflags |= VISIBILITYMAP_ALL_FROZEN;
- }
+ Assert(prstate.lpdead_items == 0);
- /*
- * It's possible for the VM bit to be clear and the page-level bit
- * to be set if checksums are not enabled.
- *
- * And even if we are just planning to update the frozen bit in
- * the VM, we shouldn't rely on all_visible_according_to_vm as a
- * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
- * might have become stale.
- *
- * If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be
- * logged.
- */
- if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buffer);
- }
+ if (!heap_page_is_all_visible(relation, buffer,
+ cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc))
+ Assert(false);
- old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
- vmbuffer, presult->vm_conflict_horizon,
- vmflags);
- }
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == prstate.visibility_cutoff_xid);
}
+#endif
+ /* Copy information back for caller */
+ presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+ presult->old_vmbits = old_vmbits;
+ /* new_vmbits was set above */
+ presult->hastup = prstate.hastup;
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
- presult->old_vmbits = old_vmbits;
- presult->new_vmbits = vmflags;
-
if (prstate.freeze)
{
if (presult->nfrozen > 0)
@@ -1621,7 +1708,12 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
break;
}
- /* Consider freezing any normal tuples which will not be removed */
+ /*
+ * Consider freezing any normal tuples which will not be removed.
+ * Regardless of whether or not we want to freeze the tuples, if we want
+ * to update the VM, we have to call heap_prepare_freeze_tuple() on every
+ * tuple to know whether or not the page will be totally frozen.
+ */
if (prstate->freeze)
{
bool totally_frozen;
@@ -2234,6 +2326,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+ Assert(do_prune || nfrozen > 0 || vmflags & VISIBILITYMAP_VALID_BITS);
+
xlrec.flags = 0;
regbuf_flags = REGBUF_STANDARD;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 892081033cc..5129c13fee9 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2013,34 +2013,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- /* Note that all_frozen value does not matter when !all_visible */
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- if (!heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum))
- Assert(false);
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -2074,8 +2046,6 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
-
/*
* For the purposes of logging, count whether or not the page was newly
* set all-visible and, potentially, all-frozen.
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b85648456e9..0b9bb1c9b13 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,19 +235,12 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate the status of the page as reflected
- * in the visibility map after pruning, freezing, and setting any pages
- * all-visible in the visibility map.
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
*
- * vm_conflict_horizon is the newest xmin of live tuples on the page
- * (older than OldestXmin). It will only be valid if we did not set the
- * page all-frozen in the VM.
- *
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
*/
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
uint8 old_vmbits;
uint8 new_vmbits;
--
2.43.0
Attachment: v8-0013-Rename-PruneState.freeze-to-attempt_freeze.patch (text/x-patch)
From 3202f56fe96c30c79c03fa4e6090ae67012840aa Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 31 Jul 2025 14:07:51 -0400
Subject: [PATCH v8 13/22] Rename PruneState.freeze to attempt_freeze
This makes it clearer that the flag indicates the caller would like
heap_page_prune_and_freeze() to consider freezing tuples -- not that we
will ultimately end up freezing them.
Also rename the local variable hint_bit_fpi to did_tuple_hint_fpi. This
makes it clear that it refers to tuple hints rather than page hints, and
that it records something that happened rather than something that could
happen.
---
src/backend/access/heap/pruneheap.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7cef05be5d0..ef9bb0c273a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -43,7 +43,7 @@ typedef struct
/* whether or not dead items can be set LP_UNUSED during pruning */
bool mark_unused_now;
/* whether to attempt freezing tuples */
- bool freeze;
+ bool attempt_freeze;
/*
* Whether or not to consider updating the VM. There is some bookkeeping
@@ -452,7 +452,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_set_vm;
uint8 vmflags = 0;
uint8 old_vmbits = 0;
- bool hint_bit_fpi;
+ bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
bool all_frozen_except_lp_dead = false;
bool set_pd_all_visible = false;
@@ -460,7 +460,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Copy parameters to prstate */
prstate.vistest = vistest;
prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
- prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
prstate.consider_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate.cutoffs = cutoffs;
@@ -485,7 +485,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* initialize page freezing working state */
prstate.pagefrz.freeze_required = false;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
Assert(new_relfrozen_xid && new_relmin_mxid);
prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
@@ -535,7 +535,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* bookkeeping. Initializing all_visible to false allows skipping the work
* to update them in heap_prune_record_unchanged_lp_normal().
*/
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
@@ -653,7 +653,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
* an FPI to be emitted.
*/
- hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+ did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
/*
* Process HOT chains.
@@ -770,7 +770,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* plans we prepared, or not.
*/
do_freeze = false;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
if (prstate.pagefrz.freeze_required)
{
@@ -803,7 +803,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (RelationNeedsWAL(relation))
{
- if (hint_bit_fpi)
+ if (did_tuple_hint_fpi)
do_freeze = true;
else if (do_prune)
{
@@ -1127,7 +1127,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
if (presult->nfrozen > 0)
{
@@ -1714,7 +1714,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* to update the VM, we have to call heap_prepare_freeze_tuple() on every
* tuple to know whether or not the page will be totally frozen.
*/
- if (prstate->freeze)
+ if (prstate->attempt_freeze)
{
bool totally_frozen;
--
2.43.0
Attachment: v8-0015-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisXi.patch (text/x-patch)
From 2189e119a3b666cb073821f8cf61ea00b9317863 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v8 15/22] Rename GlobalVisTestIsRemovableXid() to
GlobalVisXidVisibleToAll()
Currently, we only use GlobalVisTestIsRemovableXid() to check whether a
tuple's xmax is visible to all, meaning we can remove the tuple. But
future commits will also use it to test whether a tuple's xmin is visible
to all, for the purpose of determining whether the page can be set
all-visible in the VM. For that use, the name GlobalVisXidVisibleToAll()
makes more sense.
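As a minimal illustration of the xmin-oriented use the new name is meant
to read naturally for (the rel and tuple variables here are assumed for
the sketch; this is not code from the patch set):

    GlobalVisState *vistest = GlobalVisTestFor(rel);
    TransactionId   xmin = HeapTupleHeaderGetXmin(tuple->t_data);

    if (GlobalVisXidVisibleToAll(vistest, xmin))
    {
        /* everyone sees this xmin as committed; the tuple does not
         * prevent marking the page all-visible */
    }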
---
src/backend/access/heap/heapam_visibility.c | 6 +++---
src/backend/access/heap/pruneheap.c | 14 +++++++-------
src/backend/access/spgist/spgvacuum.c | 2 +-
src/backend/storage/ipc/procarray.c | 13 ++++++-------
src/include/utils/snapmgr.h | 4 ++--
5 files changed, 19 insertions(+), 20 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
{
Assert(TransactionIdIsValid(dead_after));
- if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
res = HEAPTUPLE_DEAD;
}
else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
return false;
/* Deleter committed, so tuple is dead if the XID is old enough. */
- return GlobalVisTestIsRemovableXid(vistest,
- HeapTupleHeaderGetRawXmax(tuple));
+ return GlobalVisXidVisibleToAll(vistest,
+ HeapTupleHeaderGetRawXmax(tuple));
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index de656087941..1300d0e89f3 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -231,7 +231,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
vistest = GlobalVisTestFor(relation);
- if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+ if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
return;
/*
@@ -574,9 +574,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* Determining HTSV only once for each tuple is required for correctness,
* to deal with cases where running HTSV twice could result in different
* results. For example, RECENTLY_DEAD can turn to DEAD if another
- * checked item causes GlobalVisTestIsRemovableFullXid() to update the
- * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
- * transaction aborts.
+ * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+ * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+ * aborts.
*
* It's also good for performance. Most commonly tuples within a page are
* stored at decreasing offsets (while the items are stored at increasing
@@ -1172,11 +1172,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* Determine whether or not the tuple is considered dead when compared
* with the provided GlobalVisState. On-access pruning does not provide
* VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
- * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
- * if the GlobalVisState has been updated since the beginning of vacuuming
+ * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+ * the GlobalVisState has been updated since the beginning of vacuuming
* the relation.
*/
- if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
return HEAPTUPLE_DEAD;
return res;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 8f8a1ad7796..496cca69410 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
*/
if (dt->tupstate == SPGIST_REDIRECT &&
(!TransactionIdIsValid(dt->xid) ||
- GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+ GlobalVisXidVisibleToAll(vistest, dt->xid)))
{
dt->tupstate = SPGIST_PLACEHOLDER;
Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..5c121cd72f5 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
* See comment for GlobalVisState for details.
*/
bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
- FullTransactionId fxid)
+GlobalVisFullXidVisible(GlobalVisState *state, FullTransactionId fxid)
{
/*
* If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4223,7 +4222,7 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
* relfrozenxid).
*/
bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
{
FullTransactionId fxid;
@@ -4237,7 +4236,7 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
*/
fxid = FullXidRelativeTo(state->definitely_needed, xid);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisible(state, fxid);
}
/*
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisible(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableXid(state, xid);
+ return GlobalVisXidVisibleToAll(state, xid);
}
/*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..547c71fcbfe 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
*/
typedef struct GlobalVisState GlobalVisState;
extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisible(GlobalVisState *state, FullTransactionId fxid);
extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
--
2.43.0
Attachment: v8-0016-Use-GlobalVisState-to-determine-page-level-visibi.patch (text/x-patch)
From e37454b4baa95e070c3a5d39affcc2d4ae733ad3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v8 16/22] Use GlobalVisState to determine page level
visibility
During pruning and during vacuum's third phase, we try to determine if
the whole page can be set all-visible in the visibility map. Instead of
using OldestXmin to determine if all the tuples on a page are visible to
everyone, use the GlobalVisState. This allows us to start setting the VM
during on-access pruning in a future commit.
It is possible for the GlobalVisState to change during the course of a
vacuum. In all but extraordinary cases, it moves forward, meaning more
pages could potentially be set in the VM.
Because comparing a transaction ID against the GlobalVisState takes more
operations than comparing it against a single transaction ID, we now wait
until after examining all the tuples on the page and, if we have
maintained the visibility_cutoff_xid, compare it to the GlobalVisState
just once per page. This works because if the page is all-visible and has
live, committed tuples on it, the visibility_cutoff_xid contains the
newest xmin on the page. If everyone can see that xmin, the page is truly
all-visible.
Doing this may mean we examine more tuples' xmins than before, since
previously we would have set all_visible to false as soon as we
encountered a live tuple whose xmin was not older than OldestXmin.
However, these extra comparisons were not found to be significant in a
profile.
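A minimal sketch of that once-per-page check (assuming the per-tuple loop
has already recorded the newest xmin of live, committed tuples in
visibility_cutoff_xid; this is illustrative, not the patch's code):

    static bool
    page_visible_to_all(GlobalVisState *vistest,
                        TransactionId visibility_cutoff_xid,
                        bool all_visible_so_far)
    {
        /* Some tuple already ruled the page out. */
        if (!all_visible_so_far)
            return false;

        /* No live, committed tuple recorded a cutoff (e.g. empty page). */
        if (!TransactionIdIsNormal(visibility_cutoff_xid))
            return true;

        /* One GlobalVisState comparison per page instead of per tuple. */
        return GlobalVisXidVisibleToAll(vistest, visibility_cutoff_xid);
    }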
---
src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++
src/backend/access/heap/pruneheap.c | 48 +++++++++------------
src/backend/access/heap/vacuumlazy.c | 17 ++++----
src/include/access/heapam.h | 4 +-
4 files changed, 59 insertions(+), 38 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+ Buffer buffer)
+{
+ TransactionId dead_after = InvalidTransactionId;
+ HTSV_Result res;
+
+ res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+ if (res == HEAPTUPLE_RECENTLY_DEAD)
+ {
+ Assert(TransactionIdIsValid(dead_after));
+
+ if (GlobalVisXidVisibleToAll(vistest, dead_after))
+ res = HEAPTUPLE_DEAD;
+ }
+ else
+ Assert(!TransactionIdIsValid(dead_after));
+
+ return res;
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 1300d0e89f3..f083189fccc 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -141,10 +141,9 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon, when setting the VM or when
+ * freezing all the live tuples on the page.
*
* NOTE: all_visible and all_frozen don't include LP_DEAD items until
* directly before updating the VM. We ignore LP_DEAD items when deciding
@@ -553,14 +552,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon.
- * This is most likely to happen when updating the VM and/or freezing all
- * live tuples on the page. It is updated before returning to the caller
- * because vacuum does assert-build only validation on the page using this
- * field.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState.
+ *
+ * If we encounter an uncommitted tuple, this field is unmaintained. If
+ * the page is being set all-visible or when freezing all live tuples on
+ * the page, it is used to calculate the snapshot conflict horizon.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -756,6 +753,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them is not visible to everyone, the page cannot be
+ * all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update those
@@ -1098,12 +1105,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(cutoffs);
-
Assert(prstate.lpdead_items == 0);
if (!heap_page_is_all_visible(relation, buffer,
- cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc))
Assert(false);
@@ -1628,19 +1633,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' when freezing is requested. We
- * could use GlobalVisTestIsRemovableXid instead, if a
- * non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 66ce30ddf03..61c6b3d21ac 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,7 +464,7 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int allowed_num_offsets,
bool *all_frozen,
@@ -2715,7 +2715,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
InvalidOffsetNumber);
if (heap_page_is_all_visible_except_lpdead(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3458,13 +3458,13 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
- return heap_page_is_all_visible_except_lpdead(rel, buf, OldestXmin,
+ return heap_page_is_all_visible_except_lpdead(rel, buf, vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -3499,7 +3499,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
static bool
heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int allowed_num_offsets,
bool *all_frozen,
@@ -3554,8 +3554,8 @@ heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
tuple.t_len = ItemIdGetLength(itemid);
tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
- buf))
+ switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest,
+ buf))
{
case HEAPTUPLE_LIVE:
{
@@ -3574,8 +3574,7 @@ heap_page_is_all_visible_except_lpdead(Relation rel, Buffer buf,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin,
- OldestXmin))
+ if (!GlobalVisXidVisibleToAll(vistest, xmin))
{
all_visible = false;
*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0b9bb1c9b13..4278f351bdf 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -342,7 +342,7 @@ extern void heap_inplace_unlock(Relation relation,
HeapTuple oldtup, Buffer buffer);
extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum);
@@ -415,6 +415,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+ GlobalVisState *vistest, Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
Attachment: v8-0017-Inline-TransactionIdFollows-Precedes.patch (text/x-patch)
From 79e75d5fe9e40964ce2d479f8207c9e56749f41f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 09:57:13 -0400
Subject: [PATCH v8 17/22] Inline TransactionIdFollows/Precedes()
Calling these from on-access pruning code had noticeable overhead in a
profile. There does not seem to be a reason not to inline them.
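For reference, the modulo-2^32 comparison these helpers implement works
like this (example values are made up for illustration):

    TransactionId id1 = 100;
    TransactionId id2 = 4294967000U;

    /* (int32) (100 - 4294967000) wraps to 396, which is > 0, so 100
     * logically follows 4294967000 despite being numerically smaller. */
    Assert(TransactionIdFollows(id1, id2));
    Assert(TransactionIdPrecedes(id2, id1));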
---
src/backend/access/transam/transam.c | 64 -------------------------
src/include/access/transam.h | 70 ++++++++++++++++++++++++++--
2 files changed, 66 insertions(+), 68 deletions(-)
diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index 9a39451a29a..553d6756cb3 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -273,70 +273,6 @@ TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
}
-/*
- * TransactionIdPrecedes --- is id1 logically < id2?
- */
-bool
-TransactionIdPrecedes(TransactionId id1, TransactionId id2)
-{
- /*
- * If either ID is a permanent XID then we can just do unsigned
- * comparison. If both are normal, do a modulo-2^32 comparison.
- */
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 < id2);
-
- diff = (int32) (id1 - id2);
- return (diff < 0);
-}
-
-/*
- * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
- */
-bool
-TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 <= id2);
-
- diff = (int32) (id1 - id2);
- return (diff <= 0);
-}
-
-/*
- * TransactionIdFollows --- is id1 logically > id2?
- */
-bool
-TransactionIdFollows(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 > id2);
-
- diff = (int32) (id1 - id2);
- return (diff > 0);
-}
-
-/*
- * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
- */
-bool
-TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 >= id2);
-
- diff = (int32) (id1 - id2);
- return (diff >= 0);
-}
-
/*
* TransactionIdLatest --- get latest XID among a main xact and its children
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 7d82cd2eb56..c9e20418275 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -255,6 +255,72 @@ typedef struct TransamVariablesData
} TransamVariablesData;
+
+/*
+ * TransactionIdPrecedes --- is id1 logically < id2?
+ */
+static inline bool
+TransactionIdPrecedes(TransactionId id1, TransactionId id2)
+{
+ /*
+ * If either ID is a permanent XID then we can just do unsigned
+ * comparison. If both are normal, do a modulo-2^32 comparison.
+ */
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 < id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff < 0);
+}
+
+/*
+ * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
+ */
+static inline bool
+TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 <= id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff <= 0);
+}
+
+/*
+ * TransactionIdFollows --- is id1 logically > id2?
+ */
+static inline bool
+TransactionIdFollows(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 > id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff > 0);
+}
+
+/*
+ * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
+ */
+static inline bool
+TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 >= id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff >= 0);
+}
+
+
/* ----------------
* extern declarations
* ----------------
@@ -274,10 +340,6 @@ extern bool TransactionIdDidAbort(TransactionId transactionId);
extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
-extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
-extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2);
extern TransactionId TransactionIdLatest(TransactionId mainxid,
int nxids, const TransactionId *xids);
extern XLogRecPtr TransactionIdGetCommitLSN(TransactionId xid);
--
2.43.0
Attachment: v8-0018-Unset-all-visible-sooner-if-not-freezing.patch (text/x-patch)
From 9c7bbbd397ba2b9b001ba2d0e8a7a52a79cc537b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:35:13 -0400
Subject: [PATCH v8 18/22] Unset all-visible sooner if not freezing
In prune/freeze code, we delay unsetting all-visible/all-frozen in the
presence of dead items to allow opportunistically freezing tuples if the
whole page would be frozen except for those dead items -- which are
removed later in vacuum's third phase.
Future commits will allow on-access pruning to set the VM, which means
all-visible will be initialized to true instead of false and we will do
extra bookkeeping in heap_prune_record_unchanged_lp_normal() to keep track of
whether or not the page is all-visible.
Because on-access pruning will not freeze tuples, it makes sense to
unset all-visible as soon as we encounter an LP_DEAD item and
avoid continued bookkeeping since we know the page is not all-visible
and we won't be able to remove those dead items.
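Condensed from the two call sites the patch touches, the resulting
behavior is roughly the following (illustrative paraphrase, not the
literal diff):

    if (prstate->attempt_freeze)
    {
        /* Keep all_visible/all_frozen for now: LP_DEAD items are treated
         * as LP_UNUSED-in-the-making so the page may still be
         * opportunistically frozen.  The flags are cleared later, before
         * any VM update. */
    }
    else
    {
        /* No freezing will happen, so further bookkeeping buys nothing:
         * clear the flags as soon as a dead item is seen. */
        prstate->all_visible = prstate->all_frozen = false;
    }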
---
src/backend/access/heap/pruneheap.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f083189fccc..dea8491adbb 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1493,8 +1493,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible until later during pruning.
- * Removable dead tuples shouldn't preclude freezing the page.
+ * Removable dead tuples shouldn't preclude freezing the page. If we won't
+ * attempt freezing, just unset all-visible now, though.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1752,8 +1755,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible until later, at the end of
* heap_page_prune_and_freeze(). This will allow us to attempt to freeze
* the page after pruning. As long as we unset it before updating the
- * visibility map, this will be correct.
+ * visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all-visible now.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
--
2.43.0
Attachment: v8-0019-Allow-on-access-pruning-to-set-pages-all-visible.patch (text/x-patch)
From 6a5e12b22d2e2c18bea556598e1d3ddffc7830cb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v8 19/22] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum marked pages as all-visible or all-frozen.
Supporting this requires passing information from the executor down to
the scan descriptor about whether the query modifies the relation.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
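As a usage sketch of the new entry point (the caller-side variables here
are assumed; only the final boolean argument is new relative to
index_beginscan()):

    /* A caller that knows the query is read-only passes
     * modifies_base_rel = false, allowing on-access pruning beneath the
     * index scan to also set bits in the visibility map. */
    scan = index_beginscan_vmset(heapRelation, indexRelation, snapshot,
                                 instrument, nkeys, norderbys,
                                 false /* modifies_base_rel */);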
---
src/backend/access/heap/heapam.c | 15 ++++-
src/backend/access/heap/heapam_handler.c | 15 ++++-
src/backend/access/heap/pruneheap.c | 63 ++++++++++++++-----
src/backend/access/index/indexam.c | 46 ++++++++++++++
src/backend/access/table/tableam.c | 39 ++++++++++--
src/backend/executor/execMain.c | 4 ++
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeBitmapHeapscan.c | 7 ++-
src/backend/executor/nodeIndexscan.c | 18 ++++--
src/backend/executor/nodeSeqscan.c | 24 +++++--
src/include/access/genam.h | 11 ++++
src/include/access/heapam.h | 24 ++++++-
src/include/access/relscan.h | 6 ++
src/include/access/tableam.h | 30 ++++++++-
src/include/nodes/execnodes.h | 6 ++
.../t/035_standby_logical_decoding.pl | 4 +-
16 files changed, 276 insertions(+), 38 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 88f880cfd15..d99160d5f82 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -556,6 +556,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -570,7 +571,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1247,6 +1250,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1285,6 +1289,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1317,6 +1327,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..f05b9e4968d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
return &hscan->xs_base;
}
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index dea8491adbb..1669d7b466e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -198,9 +198,13 @@ static bool identify_and_fix_vm_corruption(Relation relation,
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -264,6 +268,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
{
OffsetNumber dummy_off_loc;
PruneFreezeResult presult;
+ int options = 0;
+
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ options = HEAP_PAGE_PRUNE_UPDATE_VM;
+ }
/*
* For now, pass mark_unused_now as false regardless of whether or
@@ -271,9 +282,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* that during on-access pruning with the current implementation.
*/
heap_page_prune_and_freeze(relation, buffer, false,
- InvalidBuffer,
- vistest, 0,
- NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+ vmbuffer ? *vmbuffer : InvalidBuffer,
+ vistest, options,
+ NULL, &presult, PRUNE_ON_ACCESS,
+ &dummy_off_loc, NULL, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -513,12 +525,17 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* all-frozen for use in opportunistic freezing and to update the VM if
* the caller requests it.
*
- * Currently, only VACUUM attempts freezing and setting the VM bits. But
- * other callers could do either one. The visibility bookkeeping is
- * required for opportunistic freezing (in addition to setting the VM
- * bits) because we only consider opportunistically freezing tuples if the
- * whole page would become all-frozen or if the whole page will be frozen
- * except for dead tuples that will be removed by vacuum.
+ * Currently, only VACUUM attempts freezing. But other callers could. The
+ * visibility bookkeeping is required for opportunistic freezing (in
+ * addition to setting the VM bits) because we only consider
+ * opportunistically freezing tuples if the whole page would become
+ * all-frozen or if the whole page will be frozen except for dead tuples
+ * that will be removed by vacuum. But if consider_update_vm is false,
+ * we'll not set the VM even if the page is discovered to be all-visible.
+ *
+ * If only HEAP_PAGE_PRUNE_UPDATE_VM is passed and not
+ * HEAP_PAGE_PRUNE_FREEZE, prstate.all_frozen must be initialized to false
+ * because we will not call heap_prepare_freeze_tuple() on each tuple.
*
* If only updating the VM, we must initialize all_frozen to false, as
* heap_prepare_freeze_tuple() will not be called for each tuple on the
@@ -530,7 +547,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* whether or not to freeze but before deciding whether or not to update
* the VM so that we don't set the VM bit incorrectly.
*
- * If not freezing or updating the VM, we otherwise avoid the extra
+ * If not freezing and not updating the VM, we avoid the extra
* bookkeeping. Initializing all_visible to false allows skipping the work
* to update them in heap_prune_record_unchanged_lp_normal().
*/
@@ -879,12 +896,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.all_frozen = false;
}
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate.consider_update_vm &&
+ prstate.all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+ {
+ prstate.consider_update_vm = false;
+ prstate.all_visible = prstate.all_frozen = false;
+ }
+
Assert(!prstate.all_frozen || prstate.all_visible);
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * Handle setting visibility map bit based on information from the VM (if
+ * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
+ * call), and from all_visible and all_frozen variables.
*/
if (prstate.consider_update_vm)
{
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 86d11f4ec79..4603ece09bd 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -289,6 +289,32 @@ index_beginscan(Relation heapRelation,
return scan;
}
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan(heapRelation,
+ indexRelation,
+ snapshot,
+ instrument,
+ nkeys, norderbys);
+
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+ return scan;
+}
+
/*
* index_beginscan_bitmap - start a scan of an index with amgetbitmap
*
@@ -620,6 +646,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
return scan;
}
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan_parallel(heaprel, indexrel,
+ instrument,
+ nkeys, norderbys,
+ pscan);
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+ return scan;
+}
+
/* ----------------
* index_getnext_tid - get the next TID from a scan
*
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index a56c5eceb14..67dbf99f5b5 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
char *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
bool synchronize_seqscans = true;
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags);
+
/* ----------------------------------------------------------------------------
* Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
}
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
- SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
pscan, flags);
}
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+ bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
/* ----------------------------------------------------------------------------
* Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index b8b9d2a85f7..a862701edbe 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation is modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ modifies_rel);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..c2ffbd3b08e 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+
+ bool modifies_base_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
*/
- scandesc = index_beginscan(node->ss.ss_currentRelation,
- node->iss_RelationDesc,
- estate->es_snapshot,
- &node->iss_Instrument,
- node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+ node->iss_RelationDesc,
+ estate->es_snapshot,
+ &node->iss_Instrument,
+ node->iss_NumScanKeys,
+ node->iss_NumOrderByKeys,
+ modifies_base_rel);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..fd69275c181 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
- scandesc = table_beginscan(node->ss.ss_currentRelation,
- estate->es_snapshot,
- 0, NULL);
+ scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+ estate->es_snapshot,
+ 0, NULL, modifies_rel);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -366,6 +371,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
ParallelContext *pcxt)
{
EState *estate = node->ss.ps.state;
+ bool modifies_rel;
ParallelTableScanDesc pscan;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -373,8 +379,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+ modifies_rel);
}
/* ----------------------------------------------------------------
@@ -404,8 +413,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+ pscan,
+ modifies_rel);
}
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 5b2ab181b5f..bf272c2c37f 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_heap_rel);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
@@ -206,6 +211,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys,
ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_rel);
extern ItemPointer index_getnext_tid(IndexScanDesc scan,
ScanDirection direction);
struct TupleTableSlot;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4278f351bdf..16f7904a21e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read in the current heap page's
+ * corresponding VM block to this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -116,8 +123,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read in the current heap page's corresponding VM block to
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
@@ -374,7 +391,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool blk_known_av,
Buffer vmbuffer,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
typedef struct IndexFetchTableData
{
Relation rel;
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchTableData;
struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 1c9e802a6b1..0e986d8ef72 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -62,6 +62,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* whether or not scan should attempt to set the VM */
+ SO_ALLOW_VM_SET = 1 << 10,
} ScanOptions;
/*
@@ -876,6 +878,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
}
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
/*
* Like table_beginscan(), but for scanning catalog. It'll automatically use a
* snapshot appropriate for scanning catalog relations.
@@ -913,10 +934,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, struct ScanKeyData *key)
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
{
uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
}
@@ -1125,6 +1149,10 @@ extern void table_parallelscan_initialize(Relation rel,
extern TableScanDesc table_beginscan_parallel(Relation relation,
ParallelTableScanDesc pscan);
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+ ParallelTableScanDesc pscan,
+ bool modifies_rel);
+
/*
* Restart a parallel scan. Call this in the leader process. Caller is
* responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index de782014b2d..839c1be1d7c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -678,6 +678,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query either through
+ * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index c9c182892cf..870f03bdd79 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -10,6 +10,7 @@ use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Time::HiRes qw(usleep);
use Test::More;
+use Time::HiRes qw(usleep);
if ($ENV{enable_injection_points} ne 'yes')
{
@@ -296,6 +297,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -745,7 +747,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
--
2.43.0
v8-0020-Add-helper-functions-to-heap_page_prune_and_freez.patchtext/x-patch; charset=US-ASCII; name=v8-0020-Add-helper-functions-to-heap_page_prune_and_freez.patchDownload
From d55216a8a2fb16c176e245caca97e88ae35ad1f5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 30 Jul 2025 18:51:43 -0400
Subject: [PATCH v8 20/22] Add helper functions to heap_page_prune_and_freeze
heap_page_prune_and_freeze() has gotten rather long. It has several
stages:
1) setup - where the PruneState is set up
2) tuple examination -- where tuples and line pointers are examined to
determine what needs to be pruned and what could be frozen
3) evaluation -- where we determine based on caller provided options,
heuristics, and state gathered during stage 2 whether or not to
freeze tuples and set the page in the VM
4) execution - where the page changes are actually made and logged
This commit refactors the evaluation stage into helpers which return
whether or not to freeze and set the VM.
For the purposes of committing, this likely shouldn't be a separate
commit. But I'm not sure yet whether it makes more sense to do this
refactoring earlier in the set for clarity for the reviewer.
---
src/backend/access/heap/pruneheap.c | 471 +++++++++++++++++-----------
1 file changed, 295 insertions(+), 176 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 1669d7b466e..8b898fe19dd 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -179,6 +179,22 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
static void page_verify_redirects(Page page);
+static bool heap_page_will_freeze(Relation relation, Buffer buffer,
+ bool do_prune,
+ bool do_hint_full_or_prunable,
+ bool did_tuple_hint_fpi,
+ PruneState *prstate,
+ bool *all_frozen_except_lp_dead);
+
+static bool heap_page_will_update_vm(Relation relation,
+ Buffer buffer, BlockNumber blockno, Page page,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ bool blk_known_av,
+ PruneState *prstate,
+ Buffer *vmbuffer, uint8 *vmflags,
+ bool *set_pd_all_visible);
+
static bool identify_and_fix_vm_corruption(Relation relation,
BlockNumber heap_blk,
Buffer heap_buffer, Page heap_page,
@@ -376,6 +392,249 @@ identify_and_fix_vm_corruption(Relation relation,
return false;
}
+
+/*
+ * Determine whether to set the visibility map bits based on information from
+ * the PruneState and blk_known_av, which some callers will provide after
+ * previously examining this heap page's VM bits (e.g. vacuum from the last
+ * heap_vac_scan_next_block() call).
+ *
+ * We pass in blockno and page even though they can be derived from buffer,
+ * to avoid extra BufferGetBlock() and BufferGetBlockNumber() calls.
+ *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
+ * prstate and vmbuffer are input/output fields. vmflags and
+ * set_pd_all_visible are output fields.
+ *
+ * Returns true if the caller should set one or both of the VM bits and false
+ * otherwise.
+ */
+static bool
+heap_page_will_update_vm(Relation relation,
+ Buffer buffer, BlockNumber blockno, Page page,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ bool blk_known_av,
+ PruneState *prstate,
+ Buffer *vmbuffer, uint8 *vmflags,
+ bool *set_pd_all_visible)
+{
+ bool do_set_vm = false;
+
+ /*
+ * If the caller specified not to update the VM, validate everything is in
+ * the right state and exit.
+ */
+ if (!prstate->consider_update_vm)
+ {
+ Assert(!prstate->all_visible && !prstate->all_frozen);
+ /* We don't set only the page level visibility hint */
+ Assert(!(*set_pd_all_visible));
+ Assert(*vmflags == 0);
+ return false;
+ }
+
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->consider_update_vm &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+ {
+ prstate->consider_update_vm = false;
+ prstate->all_visible = prstate->all_frozen = false;
+ }
+
+ Assert(!prstate->all_frozen || prstate->all_visible);
+
+ /*
+ * Clear any VM corruption. This does not need to be in a critical
+ * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set, we
+ * may mark the heap page buffer dirty here and could end up doing so
+ * again later. This is not a correctness issue and is in the path of VM
+ * corruption, so we don't have to worry about the extra performance
+ * overhead.
+ */
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av, prstate->lpdead_items,
+ *vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /* Determine if we actually need to set the VM and which bits to set. */
+ else if (prstate->all_visible &&
+ (!blk_known_av ||
+ (prstate->all_frozen && !VM_ALL_FROZEN(relation, blockno, vmbuffer))))
+ {
+ *vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate->all_frozen)
+ *vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
+ do_set_vm = *vmflags & VISIBILITYMAP_VALID_BITS;
+
+ /*
+ * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+ * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+ * set, we strongly prefer to keep them in sync.
+ *
+ * Prior to Postgres 19, it was possible for the page-level bit to be set
+ * and the VM bit to be clear. This could happen if we crashed after
+ * setting PD_ALL_VISIBLE but before setting bits in the VM.
+ */
+ *set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+ return do_set_vm;
+}
+
+/*
+ * Decide if we want to go ahead with freezing according to the freeze plans we
+ * prepared for the given buffer or not. If the caller specified that we
+ * should not freeze tuples, this function exits early.
+ *
+ * do_prune, do_hint_full_or_prunable, and did_tuple_hint_fpi must all have
+ * been decided before calling this function.
+ *
+ * prstate is an input/output parameter. all_frozen_except_lp_dead is set and
+ * used later to determine the snapshot conflict horizon for the record.
+ *
+ * Returns true if we should use our freeze plans and freeze tuples on the page
+ * and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+ bool do_prune,
+ bool do_hint_full_or_prunable,
+ bool did_tuple_hint_fpi,
+ PruneState *prstate,
+ bool *all_frozen_except_lp_dead)
+{
+ bool do_freeze = false;
+
+ /*
+ * If the caller specified we should not attempt to freeze any tuples,
+ * validate that everything is in the right state and exit.
+ */
+ if (!prstate->attempt_freeze)
+ {
+ Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+ Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+ Assert(!(*all_frozen_except_lp_dead));
+ return false;
+ }
+
+ if (prstate->pagefrz.freeze_required)
+ {
+ /*
+ * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+ * before FreezeLimit/MultiXactCutoff is present. Must freeze to
+ * advance relfrozenxid/relminmxid.
+ */
+ do_freeze = true;
+ }
+ else
+ {
+ /*
+ * Opportunistically freeze the page if we are generating an FPI
+ * anyway and if doing so means that we can set the page all-frozen
+ * afterwards (might not happen until VACUUM's final heap pass).
+ *
+ * XXX: Previously, we knew if pruning emitted an FPI by checking
+ * pgWalUsage.wal_fpi before and after pruning. Once the freeze and
+ * prune records were combined, this heuristic couldn't be used
+ * anymore. The opportunistic freeze heuristic must be improved;
+ * however, for now, try to approximate the old logic.
+ */
+ if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+ {
+ /*
+ * Freezing would make the page all-frozen. Have already emitted
+ * an FPI or will do so anyway?
+ */
+ if (RelationNeedsWAL(relation))
+ {
+ if (did_tuple_hint_fpi)
+ do_freeze = true;
+ else if (do_prune)
+ {
+ if (XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ else if (do_hint_full_or_prunable)
+ {
+ if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ }
+ }
+ }
+
+ if (do_freeze)
+ {
+ /*
+ * Validate the tuples we will be freezing before entering the
+ * critical section.
+ */
+ heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+ }
+ else if (prstate->nfrozen > 0)
+ {
+ /*
+ * The page contained some tuples that were not already frozen, and we
+ * chose not to freeze them now. The page won't be all-frozen then.
+ */
+ Assert(!prstate->pagefrz.freeze_required);
+
+ prstate->all_frozen = false;
+ prstate->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+ else
+ {
+ /*
+ * We have no freeze plans to execute. The page might already be
+ * all-frozen (perhaps only following pruning), though. Such pages
+ * can be marked all-frozen in the VM by our caller, even though none
+ * of its tuples were newly frozen here.
+ */
+ }
+
+ /*
+ * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+ * make the choice of whether or not to freeze the page unaffected by the
+ * short-term presence of LP_DEAD items. These LP_DEAD items were
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that freezing has been finalized, unset all_visible if there are
+ * any LP_DEAD items on the page. It needs to reflect the present state of
+ * the page when using it to determine whether or not to update the VM.
+ *
+ * Keep track of whether or not the page was all-frozen except LP_DEAD
+ * items for the purposes of calculating the snapshot conflict horizon,
+ * though.
+ */
+ *all_frozen_except_lp_dead = prstate->all_frozen;
+ if (prstate->lpdead_items > 0)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
+
+ return do_freeze;
+}
+
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
* specified page. If the page's visibility status has changed, update it in
@@ -766,20 +1025,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Clear the offset information once we have processed the given page. */
*off_loc = InvalidOffsetNumber;
- do_prune = prstate.nredirected > 0 ||
- prstate.ndead > 0 ||
- prstate.nunused > 0;
-
/*
* After processing all the live tuples on the page, if the newest xmin
* amongst them is not visible to everyone, the page cannot be
- * all-visible.
+ * all-visible. This must be done before we decide whether or not to
+ * opportunistically freeze below because we do not want to
+ * opportunistically freeze the page if there are live tuples not visible
+ * to everyone, which would prevent setting the page frozen in the VM.
*/
if (prstate.all_visible &&
TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
prstate.all_visible = prstate.all_frozen = false;
+ /*
+ * Now decide based on information collected while examining every tuple
+ * which actions to take. If there are any prunable tuples, we'll prune
+ * them. However, we will decide based on options specified by the caller
+ * and various heuristics whether or not to freeze any tuples and whether
+ * or not the page should be set all-visible/all-frozen in the VM.
+ */
+ do_prune = prstate.nredirected > 0 ||
+ prstate.ndead > 0 ||
+ prstate.nunused > 0;
+
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update those
@@ -790,182 +1059,32 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
PageIsFull(page);
/*
- * Decide if we want to go ahead with freezing according to the freeze
- * plans we prepared, or not.
- */
- do_freeze = false;
- if (prstate.attempt_freeze)
- {
- if (prstate.pagefrz.freeze_required)
- {
- /*
- * heap_prepare_freeze_tuple indicated that at least one XID/MXID
- * from before FreezeLimit/MultiXactCutoff is present. Must
- * freeze to advance relfrozenxid/relminmxid.
- */
- do_freeze = true;
- }
- else
- {
- /*
- * Opportunistically freeze the page if we are generating an FPI
- * anyway and if doing so means that we can set the page
- * all-frozen afterwards (might not happen until VACUUM's final
- * heap pass).
- *
- * XXX: Previously, we knew if pruning emitted an FPI by checking
- * pgWalUsage.wal_fpi before and after pruning. Once the freeze
- * and prune records were combined, this heuristic couldn't be
- * used anymore. The opportunistic freeze heuristic must be
- * improved; however, for now, try to approximate the old logic.
- */
- if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
- {
- /*
- * Freezing would make the page all-frozen. Have already
- * emitted an FPI or will do so anyway?
- */
- if (RelationNeedsWAL(relation))
- {
- if (did_tuple_hint_fpi)
- do_freeze = true;
- else if (do_prune)
- {
- if (XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- else if (do_hint_full_or_prunable)
- {
- if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- }
- }
- }
- }
-
- if (do_freeze)
- {
- /*
- * Validate the tuples we will be freezing before entering the
- * critical section.
- */
- heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
- }
- else if (prstate.nfrozen > 0)
- {
- /*
- * The page contained some tuples that were not already frozen, and we
- * chose not to freeze them now. The page won't be all-frozen then.
- */
- Assert(!prstate.pagefrz.freeze_required);
-
- prstate.all_frozen = false;
- prstate.nfrozen = 0; /* avoid miscounts in instrumentation */
- }
- else
- {
- /*
- * We have no freeze plans to execute. The page might already be
- * all-frozen (perhaps only following pruning), though. Such pages
- * can be marked all-frozen in the VM by our caller, even though none
- * of its tuples were newly frozen here.
- */
- }
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state of
- * the page when using it to determine whether or not to update the VM.
- *
- * Keep track of whether or not the page was all-frozen except LP_DEAD
- * items for the purposes of calculating the snapshot conflict horizon,
- * though.
+ * We must decide whether or not to freeze before deciding if and what to
+ * set in the VM.
*/
- all_frozen_except_lp_dead = prstate.all_frozen;
- if (prstate.lpdead_items > 0)
- {
- prstate.all_visible = false;
- prstate.all_frozen = false;
- }
-
- /*
- * If this is an on-access call and we're not actually pruning, avoid
- * setting the visibility map if it would newly dirty the heap page or, if
- * the page is already dirty, if doing so would require including a
- * full-page image (FPI) of the heap page in the WAL. This situation
- * should be rare, as on-access pruning is only attempted when
- * pd_prune_xid is valid.
- */
- if (reason == PRUNE_ON_ACCESS &&
- prstate.consider_update_vm &&
- prstate.all_visible &&
- !do_prune && !do_freeze &&
- (!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
- {
- prstate.consider_update_vm = false;
- prstate.all_visible = prstate.all_frozen = false;
- }
-
- Assert(!prstate.all_frozen || prstate.all_visible);
-
- /*
- * Handle setting visibility map bit based on information from the VM (if
- * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
- * call), and from all_visible and all_frozen variables.
- */
- if (prstate.consider_update_vm)
- {
- /*
- * Clear any VM corruption. This does not need to be in a critical
- * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
- * we may mark the heap page buffer dirty here and could end up doing
- * so again later. This is not a correctness issue and is in the path
- * of VM corruption, so we don't have to worry about the extra
- * performance overhead.
- */
- if (identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av, prstate.lpdead_items, vmbuffer))
- {
- /* If we fix corruption, don't update the VM further */
- }
-
- /* Determine if we actually need to set the VM and which bits to set. */
- else if (prstate.all_visible &&
- (!blk_known_av ||
- (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
- {
- vmflags |= VISIBILITYMAP_ALL_VISIBLE;
- if (prstate.all_frozen)
- vmflags |= VISIBILITYMAP_ALL_FROZEN;
- }
- }
-
- do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
- * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
- * set, we strongly prefer to keep them in sync.
- *
- * Prior to Postgres 19, it was possible for the page-level bit to be set
- * and the VM bit to be clear. This could happen if we crashed after
- * setting PD_ALL_VISIBLE but before setting bits in the VM.
- */
- set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+ do_freeze = heap_page_will_freeze(relation, buffer,
+ do_prune,
+ do_hint_full_or_prunable,
+ did_tuple_hint_fpi,
+ &prstate,
+ &all_frozen_except_lp_dead);
+
+ do_set_vm = heap_page_will_update_vm(relation,
+ buffer, blockno, page,
+ reason,
+ do_prune, do_freeze,
+ blk_known_av,
+ &prstate,
+ &vmbuffer,
+ &vmflags, &set_pd_all_visible);
/* Save these for the caller in case we later zero out vmflags */
presult->new_vmbits = vmflags;
- /* Any error while applying the changes is critical */
+ /*
+ * Time to actually make the changes to the page and log them. Any error
+ * while applying the changes is critical.
+ */
START_CRIT_SECTION();
if (do_hint_full_or_prunable)
--
2.43.0
v8-0021-Reorder-heap_page_prune_and_freeze-parameters.patchtext/x-patch; charset=US-ASCII; name=v8-0021-Reorder-heap_page_prune_and_freeze-parameters.patchDownload
From d68200de41024bd739177bca24cb51f3f37626b5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 31 Jul 2025 12:08:18 -0400
Subject: [PATCH v8 21/22] Reorder heap_page_prune_and_freeze parameters
Reorder parameters so that all of the output parameters are together at
the end of the parameter list.
---
src/backend/access/heap/pruneheap.c | 38 ++++++++++++++--------------
src/backend/access/heap/vacuumlazy.c | 6 ++---
src/include/access/heapam.h | 4 +--
3 files changed, 24 insertions(+), 24 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8b898fe19dd..0a7a4ba0c0e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -297,10 +297,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, false,
+ heap_page_prune_and_freeze(relation, buffer, options, false,
vmbuffer ? *vmbuffer : InvalidBuffer,
- vistest, options,
- NULL, &presult, PRUNE_ON_ACCESS,
+ vistest,
+ NULL, PRUNE_ON_ACCESS, &presult,
&dummy_off_loc, NULL, NULL);
/*
@@ -645,6 +645,15 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* also need to account for a reduction in the length of the line pointer
* array following array truncation by us.
*
+ * options:
+ * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+ * pruning.
+ *
+ * FREEZE indicates that we will also freeze tuples, and will return
+ * 'all_visible', 'all_frozen' flags to the caller.
+ *
+ * UPDATE_VM indicates that we will set the page's status in the VM.
+ *
* If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
* required in order to advance relfrozenxid / relminmxid, or if it's
* considered advantageous for overall system performance to do so now. The
@@ -663,30 +672,21 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* contain the required block of the visibility map.
*
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
- *
- * options:
- * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- * pruning.
- *
- * FREEZE indicates that we will also freeze tuples, and will return
- * 'all_visible', 'all_frozen' flags to the caller.
- *
- * UPDATE_VM indicates that we will set the page's status in the VM.
+ * (see heap_prune_satisfies_vacuum). It is an input parameter.
*
* cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
* of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
* cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD. It is an input parameter.
+ *
+ * reason indicates why the pruning is performed. It is included in the WAL
+ * record for debugging and analysis purposes, but otherwise has no effect.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
* heap_page_prune_and_freeze() is responsible for initializing it. Required
* by all callers.
*
- * reason indicates why the pruning is performed. It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
* off_loc is the offset location required by the caller to use in error
* callback.
*
@@ -699,13 +699,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ int options,
bool blk_known_av,
Buffer vmbuffer,
GlobalVisState *vistest,
- int options,
struct VacuumCutoffs *cutoffs,
- PruneFreezeResult *presult,
PruneReason reason,
+ PruneFreezeResult *presult,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 61c6b3d21ac..cead3ec84a4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1991,11 +1991,11 @@ lazy_scan_prune(LVRelState *vacrel,
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf,
+ heap_page_prune_and_freeze(rel, buf, prune_options,
all_visible_according_to_vm,
vmbuffer,
- vacrel->vistest, prune_options,
- &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+ vacrel->vistest,
+ &vacrel->cutoffs, PRUNE_VACUUM_SCAN, &presult,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 16f7904a21e..0c4e5607627 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,13 +394,13 @@ struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer,
Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ int options,
bool blk_known_av,
Buffer vmbuffer,
struct GlobalVisState *vistest,
- int options,
struct VacuumCutoffs *cutoffs,
- PruneFreezeResult *presult,
PruneReason reason,
+ PruneFreezeResult *presult,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid);
--
2.43.0
v8-0022-Set-pd_prune_xid-on-insert.patchtext/x-patch; charset=US-ASCII; name=v8-0022-Set-pd_prune_xid-on-insert.patchDownload
From 5dcb61d6fba53255bbc3356afb90e575ecf7789d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v8 22/22] Set pd_prune_xid on insert
Now that we can set the VM during read-only queries, it makes sense to
start setting the page prunable hint on insert. This will allow
heap_page_prune_and_freeze() to be called when the page is full or
mostly full.
For years there has been a note in heap_insert() and heap_multi_insert()
pointing out that setting pd_prune_xid would help clean up aborted
inserted tuples that would otherwise not be cleaned up until vacuum.
So, that's another benefit of setting it.
Setting pd_prune_xid on insert causes a page to be pruned and then
written out, which affects the reported number of hits in the
index-killtuples isolation test. This is a quirk of how hits are tracked,
which sometimes leads to them being double-counted. This should probably
be fixed or changed independently.
---
src/backend/access/heap/heapam.c | 25 +++++++++++++------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++-
.../isolation/expected/index-killtuples.out | 6 ++---
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d99160d5f82..28da6a1a0fb 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2105,6 +2105,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2164,15 +2165,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts, making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2182,7 +2187,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2546,8 +2550,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 83a5f3dbc34..67256280d94 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -474,6 +474,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -623,9 +629,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
--
2.43.0
Hi,
On 2025-09-02 19:11:01 -0400, Melanie Plageman wrote:
From dd98177294011ee93cac122405516abd89f4e393 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 27 Aug 2025 08:50:15 -0400
Subject: [PATCH v8 01/22] Remove unneeded VM pin from VM replay
LGTM.
From 7c5cb3edf89735eaa8bee9ca46111bd6c554720b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 27 Aug 2025 10:07:29 -0400
Subject: [PATCH v8 02/22] Add assert and log message to visibilitymap_set
LGTM.
From 07f31099754636ec9dabf6cca06c33c4b19c230c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v8 03/22] Eliminate xl_heap_visible in COPY FREEZE

Instead of emitting a separate WAL record for setting the VM bits in
xl_heap_visible, specify the changes to make to the VM block in the
xl_heap_multi_insert record instead.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: /messages/by-id/flat/CAAKRu_ZMw6Npd_qm2KM+FwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g@mail.gmail.com
+	/*
+	 * If we're only adding already frozen rows to a previously empty
+	 * page, mark it as all-frozen and update the visibility map. We're
+	 * already holding a pin on the vmbuffer.
+	 */
 	else if (all_frozen_set)
+	{
+		Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
 		PageSetAllVisible(page);
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+		visibilitymap_set_vmbyte(relation,
+								 BufferGetBlockNumber(buffer),
+								 vmbuffer,
+								 VISIBILITYMAP_ALL_VISIBLE |
+								 VISIBILITYMAP_ALL_FROZEN);
+	}
From an abstraction POV I don't love that heapam now is responsible for
acquiring and releasing the lock. But that ship already kind of has sailed, as
heapam.c is already responsible for releasing the vm buffer etc...
I've wondered about splitting the responsibilities up into multiple
visibilitymap_set_* functions, so that heapam.c wouldn't need to acquire the
lock and set the LSN. But it's probably not worth it.
+	/*
+	 * Now read and update the VM block. Even if we skipped updating the heap
+	 * page due to the file being dropped or truncated later in recovery, it's
+	 * still safe to update the visibility map. Any WAL record that clears
+	 * the visibility map bit does so before checking the page LSN, so any
+	 * bits that need to be cleared will still be cleared.
+	 *
+	 * It is only okay to set the VM bits without holding the heap page lock
+	 * because we can expect no other writers of this page.
+	 */
+	if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+									  &vmbuffer) == BLK_NEEDS_REDO)
+	{
+		Relation	reln = CreateFakeRelcacheEntry(rlocator);
+
+		Assert(visibilitymap_pin_ok(blkno, vmbuffer));
+		visibilitymap_set_vmbyte(reln, blkno,
+								 vmbuffer,
+								 VISIBILITYMAP_ALL_VISIBLE |
+								 VISIBILITYMAP_ALL_FROZEN);
+
+		/*
+		 * It is not possible that the VM was already set for this heap page,
+		 * so the vmbuffer must have been modified and marked dirty.
+		 */
+		Assert(BufferIsDirty(vmbuffer));
How about making visibilitymap_set_vmbyte() return whether it needed to do
something? This seems somewhat indirect...
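(For illustration only: a minimal sketch of that idea, assuming the visibilitymap_set_vmbyte() introduced by the patch, which returns the previous bit state. The wrapper name is invented here and is not part of the patch set.)

```
/*
 * Hypothetical wrapper, just to illustrate the suggestion: report whether
 * any of the requested bits were newly set, so redo code could branch on
 * the return value instead of inferring the change via BufferIsDirty().
 */
static inline bool
visibilitymap_set_vmbyte_changed(Relation rel, BlockNumber heapBlk,
								 Buffer vmBuf, uint8 flags)
{
	uint8		old_vmbits = visibilitymap_set_vmbyte(rel, heapBlk, vmBuf, flags);

	/* True if at least one requested bit was not already set. */
	return (old_vmbits & flags) != flags;
}
```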
I think it might be good to encapsulate this code into a helper in
visibilitymap.c, there will be more callers in the subsequent patches.
+/*
+ * Set flags in the VM block contained in the passed in vmBuf.
+ *
+ * This function is for callers which include the VM changes in the same WAL
+ * record as the modifications of the heap page which rendered it all-visible.
+ * Callers separately logging the VM changes should invoke visibilitymap_set()
+ * instead.
+ *
+ * Caller must have pinned and exclusive locked the correct block of the VM in
+ * vmBuf. This block should contain the VM bits for the given heapBlk.
+ *
+ * During normal operation (i.e. not recovery), this should be called in a
+ * critical section which also makes any necessary changes to the heap page
+ * and, if relevant, emits WAL.
+ *
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page. This includes
+ * maintaining any invariants such as ensuring the buffer containing heapBlk
+ * is pinned and exclusive locked.
+ */
+uint8
+visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+						 Buffer vmBuf, uint8 flags)
Why is it named vmbyte? This actually just sets the two bits corresponding to
the buffer, not the entire byte. So it seems somewhat misleading to reference
byte.
From dc318358572f61efbd0e05aae2b9a077b422bcf5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:42:13 -0400
Subject: [PATCH v8 05/22] Eliminate xl_heap_visible from vacuum phase III

Instead of emitting a separate xl_heap_visible record for each page that
is rendered all-visible by vacuum's third phase, include the updates to
the VM in the already emitted xl_heap_prune record.
Reading through the change I didn't particularly like that there's another
optional field in xl_heap_prune, as it seemed like something that should be
encoded in flags. Of course there aren't enough flag bits available. But
that made me look at the rest of the record: Uh, what do we use the reason
field for? As far as I can tell f83d709760d8 added it without introducing any
users? It doesn't even seem to be set.
@@ -51,10 +52,15 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		   (xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);

 	/*
-	 * We are about to remove and/or freeze tuples. In Hot Standby mode,
-	 * ensure that there are no queries running for which the removed tuples
-	 * are still visible or which still consider the frozen xids as running.
-	 * The conflict horizon XID comes after xl_heap_prune.
+	 * After xl_heap_prune is the optional snapshot conflict horizon.
+	 *
+	 * In Hot Standby mode, we must ensure that there are no running queries
+	 * which would conflict with the changes in this record. If pruning, that
+	 * means we cannot remove tuples still visible to transactions on the
+	 * standby. If freezing, that means we cannot freeze tuples with xids that
+	 * are still considered running on the standby. And for setting the VM, we
+	 * cannot do so if the page isn't all-visible to all transactions on the
+	 * standby.
 	 */
I'm a bit confused by this new comment - it sounds like we're deciding whether
to remove tuple versions, but that decision has long been made, no?
@@ -2846,8 +2848,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
 	TransactionId visibility_cutoff_xid;
+	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
+	uint8		vmflags = 0;
+	bool		set_pd_all_vis = false;

 	Assert(vacrel->do_index_vacuuming);
@@ -2858,6 +2863,20 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
 							 InvalidOffsetNumber);

+	if (heap_page_is_all_visible_except_lpdead(vacrel->rel, buffer,
+											   vacrel->cutoffs.OldestXmin,
+											   deadoffsets, num_offsets,
+											   &all_frozen, &visibility_cutoff_xid,
+											   &vacrel->offnum))
+	{
+		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+		if (all_frozen)
+		{
+			vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+		}
+	}
+
 	START_CRIT_SECTION();
I am rather confused - we never can set all-visible if there are any LP_DEAD
items left. If the idea is that we are removing the LP_DEAD items in
lazy_vacuum_heap_page() - what guarantees that all LP_DEAD items are being
removed? Couldn't some tuples get marked LP_DEAD by on-access pruning, after
vacuum visited the page and collected dead items?
Ugh, I see - it works because we pass in the set of dead items. I think that
makes the name *really* misleading, it's not except LP_DEAD, it's except the
offsets passed in, no?
But then you actually check that the set of dead items didn't change - what
guarantees that?
I didn't look at the later patches, except that I did notice this:
@@ -268,7 +264,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		Relation	reln = CreateFakeRelcacheEntry(rlocator);

 		visibilitymap_pin(reln, blkno, &vmbuffer);
-		old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
+		old_vmbits = visibilitymap_set(reln, blkno, vmbuffer, vmflags);
 		/* Only set VM page LSN if we modified the page */
 		if (old_vmbits != vmflags)
 			PageSetLSN(BufferGetPage(vmbuffer), lsn);
@@ -279,143 +275,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		UnlockReleaseBuffer(vmbuffer);
 	}
Why are we manually pinning the vm buffer here? Shouldn't the xlog machinery
have done so, as you noticed in one of the early on patches?
Greetings,
Andres Freund
On Wed, 3 Sept 2025 at 04:11, Melanie Plageman
<melanieplageman@gmail.com> wrote:
On Tue, Sep 2, 2025 at 5:52 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
On Thu, Aug 28, 2025 at 5:12 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
I did micro git-blame research here. I spotted only one related change
[0]. Looks like before this change pin was indeed needed.
But not after this change, so this visibilitymap_pin is just an oversight?
Related thread is [1]. I quickly checked the discussion in this
thread, and it looks like no one was bothered about these lines or VM
logging changes (in this exact pin buffer aspect). The discussion was
of other aspects of this commit.

Wow, thanks so much for doing that research. Looking at it myself, it
does indeed seem like just an oversight. It isn't harmful since it
won't take another pin, but it is confusing, so I think we should at
least remove it in master. I'm not as sure about back branches.

I've updated the commit message in the patch set to reflect the
research you did in attached v8.

- Melanie
Hi!
small comments regarding new series
0001, 0002, 0017 LGTM
In 0015:
```
reshke@yezzey-cbdb-bench:~/postgres$ git diff src/backend/access/heap/pruneheap.c
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 05b51bd8d25..0794af9ae89 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1398,7 +1398,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
 				/*
 				 * For now always use prstate->cutoffs for this test, because
 				 * we only update 'all_visible' when freezing is requested. We
-				 * could use GlobalVisTestIsRemovableXid instead, if a
+				 * could use GlobalVisXidVisibleToAll instead, if a
 				 * non-freezing caller wanted to set the VM bit.
 				 */
 				Assert(prstate->cutoffs);
```
Also, maybe GlobalVisXidTestAllVisible is a slightly better name? (The
term 'all-visible' is one that we occasionally utilize)
--
Best regards,
Kirill Reshke
Thanks for the review!
On Tue, Sep 2, 2025 at 7:54 PM Andres Freund <andres@anarazel.de> wrote:
On 2025-09-02 19:11:01 -0400, Melanie Plageman wrote:
From dd98177294011ee93cac122405516abd89f4e393 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 27 Aug 2025 08:50:15 -0400
Subject: [PATCH v8 01/22] Remove unneeded VM pin from VM replay
I didn't push it yet because I did a new version that actually
eliminates the asserts in heap_multi_insert() before calling
visibilitymap_set() -- since they are redundant with checks inside
visibilitymap_set(). 0001 of attached v9 is what I plan to push,
barring any objections.
From 7c5cb3edf89735eaa8bee9ca46111bd6c554720b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 27 Aug 2025 10:07:29 -0400
Subject: [PATCH v8 02/22] Add assert and log message to visibilitymap_set
I pushed this.
From an abstraction POV I don't love that heapam now is responsible for
acquiring and releasing the lock. But that ship already kind of has sailed, as
heapam.c is already responsible for releasing the vm buffer etc...

I've wondered about splitting the responsibilities up into multiple
visibilitymap_set_* functions, so that heapam.c wouldn't need to acquire the
lock and set the LSN. But it's probably not worth it.
Yea, I explored heap wrappers coupling heap operations related to
setting the VM along with the VM updates [1], but the results weren't
appealing. Setting the heap LSN and marking the heap buffer dirty and
such happens in a different place in different callers because it is
happening as part of the operations that actually end up rendering the
page all-visible.
And a VM-only helper would literally just acquire and release the lock
and set the LSN on the vm page -- which I don't think is worth it.
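To make the trade-off concrete, the shape the callers end up with on the primary is roughly the following. This is a simplified sketch based on the attached patches, not the exact code; the real callers interleave these steps with their other page modifications (and the buffer registration details differ), which is exactly why a wrapper buys so little:

```
	/*
	 * Caller-side sketch: one critical section covers the heap change, the
	 * VM bit update, and a single WAL record describing both.
	 */
	START_CRIT_SECTION();

	/* ... modify the heap page in whatever way renders it all-visible ... */
	PageSetAllVisible(page);
	MarkBufferDirty(buffer);

	/* vmbuffer is already pinned; lock it and set the bits directly */
	LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
	visibilitymap_set_vmbits(relation, blkno, vmbuffer, vmflags);

	if (RelationNeedsWAL(relation))
	{
		XLogBeginInsert();
		XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
		XLogRegisterBuffer(1, vmbuffer, 0);
		recptr = XLogInsert(RM_HEAP2_ID, info);

		/* both pages were changed by this record */
		PageSetLSN(page, recptr);
		PageSetLSN(BufferGetPage(vmbuffer), recptr);
	}

	END_CRIT_SECTION();
	LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
```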
+ /*
+  * Now read and update the VM block. Even if we skipped updating the heap
+  * page due to the file being dropped or truncated later in recovery, it's
+  * still safe to update the visibility map. Any WAL record that clears
+  * the visibility map bit does so before checking the page LSN, so any
+  * bits that need to be cleared will still be cleared.
+  *
+  * It is only okay to set the VM bits without holding the heap page lock
+  * because we can expect no other writers of this page.
+  */
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+     XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+                                   &vmbuffer) == BLK_NEEDS_REDO)
+ {
+     Relation reln = CreateFakeRelcacheEntry(rlocator);
+
+     Assert(visibilitymap_pin_ok(blkno, vmbuffer));
+     visibilitymap_set_vmbyte(reln, blkno,
+                              vmbuffer,
+                              VISIBILITYMAP_ALL_VISIBLE |
+                              VISIBILITYMAP_ALL_FROZEN);
+
+     /*
+      * It is not possible that the VM was already set for this heap page,
+      * so the vmbuffer must have been modified and marked dirty.
+      */
+     Assert(BufferIsDirty(vmbuffer));

How about making visibilitymap_set_vmbyte() return whether it needed to do
something? This seems somewhat indirect...
It does return the state of the previous bits. But, I am specifically
asserting that the buffer is dirty because I am about to set the page
LSN. So I don't just care that changes were made, I care that we
remembered to mark the buffer dirty.
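For comparison: in the multi-insert replay the bits can never already be set, so the patch asserts the buffer is dirty unconditionally; in the prune/freeze replay (v9-0005, attached below) the check is conditional. A simplified excerpt of that logic, with variable names as in the patch:

```
	old_vmbits = visibilitymap_set_vmbits(reln, blkno, vmbuffer, vmflags);

	/*
	 * Only set the VM page LSN if we modified the page; in that case
	 * visibilitymap_set_vmbits() must also have marked the buffer dirty.
	 */
	if (old_vmbits != vmflags)
	{
		Assert(BufferIsDirty(vmbuffer));
		PageSetLSN(BufferGetPage(vmbuffer), lsn);
	}
```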
I think it might be good to encapsulate this code into a helper in
visibilitymap.c, there will be more callers in the subsequent patches.
By the end of the set, the different callers have different
expectations (some don't expect the buffer to have been dirtied
necessarily) and where they do the various related operations is
spread out depending on the caller. I just couldn't come up with a
helper solution I liked.
That being said, I definitely don't think it's needed for this patch
(logging setting the VM in xl_heap_multi_insert()).
+uint8
+visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
+                         Buffer vmBuf, uint8 flags)

Why is it named vmbyte? This actually just sets the two bits corresponding to
the buffer, not the entire byte. So it seems somewhat misleading to reference
byte.
Renamed it to visibilitymap_set_vmbits.
Instead of emitting a separate xl_heap_visible record for each page that
is rendered all-visible by vacuum's third phase, include the updates to
the VM in the already emitted xl_heap_prune record.

Reading through the change I didn't particularly like that there's another
optional field in xl_heap_prune, as it seemed like something that should be
encoded in flags. Of course there aren't enough flag bits available. But
that made me look at the rest of the record: Uh, what do we use the reason
field for? As far as I can tell f83d709760d8 added it without introducing any
users? It doesn't even seem to be set.
yikes, you are right about the "reason" member. Attached 0002 removes
it, and I'll go ahead and fix it in the back branches too. I can't
fathom how that slipped through the cracks. We do pass the PruneReason
for setting the rmgr info about what type of record it is (i.e. if it
is one emitted by vacuum phase I, phase III, or on-access pruning).
But we don't need or use a separate member. I went back and tried to
figure out what the rationale was, but I couldn't find anything.
As for the VM flags being an optional unaligned member -- in v9, I've
expanded the flags member to a uint16 to make room for the extra
flags. Seems we've been surviving with using up 2 bytes this long.
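For reference, with the wider field the low two bits of xl_heap_prune's flags mirror the VM bit values directly and the existing XLHP_* flags shift up by one, per the heapam_xlog.h hunk in the attached v9-0005:

```
/*
 * xl_heap_prune.flags is now a uint16; layout after the change:
 *   (1 << 0)  reserved for VISIBILITYMAP_ALL_VISIBLE
 *   (1 << 1)  reserved for VISIBILITYMAP_ALL_FROZEN
 */
#define XLHP_IS_CATALOG_REL			(1 << 2)
#define XLHP_CLEANUP_LOCK			(1 << 3)
#define XLHP_HAS_CONFLICT_HORIZON	(1 << 4)
#define XLHP_HAS_FREEZE_PLANS		(1 << 5)
#define XLHP_HAS_REDIRECTIONS		(1 << 6)
#define XLHP_HAS_DEAD_ITEMS			(1 << 7)
#define XLHP_HAS_NOW_UNUSED_ITEMS	(1 << 8)
```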
@@ -51,10 +52,15 @@ heap_xlog_prune_freeze(XLogReaderState *record)
  (xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);

  /*
-  * We are about to remove and/or freeze tuples. In Hot Standby mode,
-  * ensure that there are no queries running for which the removed tuples
-  * are still visible or which still consider the frozen xids as running.
-  * The conflict horizon XID comes after xl_heap_prune.
+  * After xl_heap_prune is the optional snapshot conflict horizon.
+  *
+  * In Hot Standby mode, we must ensure that there are no running queries
+  * which would conflict with the changes in this record. If pruning, that
+  * means we cannot remove tuples still visible to transactions on the
+  * standby. If freezing, that means we cannot freeze tuples with xids that
+  * are still considered running on the standby. And for setting the VM, we
+  * cannot do so if the page isn't all-visible to all transactions on the
+  * standby.
   */

I'm a bit confused by this new comment - it sounds like we're deciding whether
to remove tuple versions, but that decision has long been made, no?
Well, the comment is a revision of a comment that was already there on
essentially why replaying this record could cause recovery conflicts.
It mentioned pruning and freezing, so I expanded it to mention setting
the VM. Taking into account your confusion, I tried rewording it in
attached v9.
+ if (heap_page_is_all_visible_except_lpdead(vacrel->rel, buffer,
+                                            vacrel->cutoffs.OldestXmin,
+                                            deadoffsets, num_offsets,
+                                            &all_frozen, &visibility_cutoff_xid,
+                                            &vacrel->offnum))

I am rather confused - we never can set all-visible if there are any LP_DEAD
items left. If the idea is that we are removing the LP_DEAD items in
lazy_vacuum_heap_page() - what guarantees that all LP_DEAD items are being
removed? Couldn't some tuples get marked LP_DEAD by on-access pruning, after
vacuum visited the page and collected dead items?

Ugh, I see - it works because we pass in the set of dead items. I think that
makes the name *really* misleading, it's not except LP_DEAD, it's except the
offsets passed in, no?

But then you actually check that the set of dead items didn't change - what
guarantees that?
So, I pass in the deadoffsets we got from the TIDStore. If the only
dead items on the page are exactly those dead items, then the page
will be all-visible as soon as we set those LP_UNUSED -- which we do
unconditionally. And we have the lock on the page, so no one can
on-access prune and make new dead items while we are in
lazy_vacuum_heap_page().
Given your confusion, I've refactored this and used a different
approach -- I explicitly check the passed-in deadoffsets array when I
encounter a dead item and see if it is there. That should hopefully
make it more clear.
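Concretely, the reworked check in heap_page_would_be_all_visible() walks the line pointers and consumes the sorted deadoffsets array as it goes; a simplified sketch of that logic from the attached v9-0005:

```
	if (ItemIdIsDead(itemid))
	{
		/*
		 * An LP_DEAD item only disqualifies the page if it is not one of
		 * the offsets vacuum is about to mark LP_UNUSED. deadoffsets must
		 * be sorted in ascending offset order.
		 */
		if (!deadoffsets ||
			matched_dead_count >= ndeadoffsets ||
			deadoffsets[matched_dead_count] != offnum)
		{
			*all_frozen = all_visible = false;
			break;
		}
		matched_dead_count++;	/* expected dead item; skip it */
		continue;
	}
```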
I didn't look at the later patches, except that I did notice this:
<--snip-->
Why are we manually pinning the vm buffer here? Shouldn't the xlog machinery
have done so, as you noticed in one of the early on patches?
Fixed. Thanks!
- Melanie
[1] /messages/by-id/CAAKRu_Yj=yrL+gGGsqfYVQcYn7rDp6hDeoF1vN453JDp8dEY+w@mail.gmail.com
Attachments:
v9-0002-Remove-unused-xl_heap_prune-member-reason.patch
From df9b87d0a1a973c0c655f5ba858485795ff98951 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Sep 2025 15:02:58 -0400
Subject: [PATCH v9 02/22] Remove unused xl_heap_prune member, reason
f83d709760d8 refactored xl_heap_prune and added an unused member,
reason. While PruneReason is used when constructing this WAL record to
set the WAL record definition, it doesn't need to be stored in a
separate field in the record. Remove it.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/tvvtfoxz5ykpsctxjbzxg3nldnzfc7geplrt2z2s54pmgto27y%40hbijsndifu45
---
src/include/access/heapam_xlog.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 277df6b3cf0..d4c0625b632 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -284,7 +284,6 @@ typedef struct xl_heap_update
*/
typedef struct xl_heap_prune
{
- uint8 reason;
uint8 flags;
/*
--
2.43.0
v9-0005-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch
From 81b134346c1a981382d1eb915472aa3f26bb3586 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:42:13 -0400
Subject: [PATCH v9 05/22] Eliminate xl_heap_visible from vacuum phase III
Instead of emitting a separate xl_heap_visible record for each page that
is rendered all-visible by vacuum's third phase, include the updates to
the VM in the already emitted xl_heap_prune record.
The visibilitymap bits are stored in the flags member of the
xl_heap_prune record.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam_xlog.c | 145 ++++++++++++++++++----
src/backend/access/heap/pruneheap.c | 66 ++++++++--
src/backend/access/heap/vacuumlazy.c | 164 +++++++++++++++++--------
src/backend/access/rmgrdesc/heapdesc.c | 7 +-
src/include/access/heapam.h | 9 ++
src/include/access/heapam_xlog.h | 36 ++++--
6 files changed, 330 insertions(+), 97 deletions(-)
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 0820f7d052d..11c11929ed9 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Buffer buffer;
RelFileLocator rlocator;
BlockNumber blkno;
- XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
+ uint8 vmflags = 0;
+ Size freespace = 0;
XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -50,11 +52,17 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Assert((xlrec.flags & XLHP_CLEANUP_LOCK) != 0 ||
(xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
+ vmflags = xlrec.flags & VISIBILITYMAP_VALID_BITS;
+
/*
- * We are about to remove and/or freeze tuples. In Hot Standby mode,
- * ensure that there are no queries running for which the removed tuples
- * are still visible or which still consider the frozen xids as running.
- * The conflict horizon XID comes after xl_heap_prune.
+ * After xl_heap_prune is the optional snapshot conflict horizon.
+ *
+ * In Hot Standby mode, we must ensure that there are no running queries
+ * which would conflict with the changes in this record. That means we
+ * can't replay this record if it removes tuples that are still visible to
+ * transactions on the standby, freeze tuples with xids that are still
+ * considered running on the standby, or set a page as all-visible in the
+ * VM if it isn't all-visible to all transactions on the standby.
*/
if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
{
@@ -71,12 +79,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
}
/*
- * If we have a full-page image, restore it and we're done.
+ * If we have a full-page image of the heap block, restore it and we're
+ * done with the heap block.
*/
- action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
- (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
- &buffer);
- if (action == BLK_NEEDS_REDO)
+ if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+ (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+ &buffer) == BLK_NEEDS_REDO)
{
Page page = BufferGetPage(buffer);
OffsetNumber *redirected;
@@ -89,6 +97,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Size datalen;
xlhp_freeze_plan *plans;
OffsetNumber *frz_offsets;
+ bool do_prune;
+ bool mark_buffer_dirty;
+ bool set_heap_lsn;
char *dataptr = XLogRecGetBlockData(record, 0, &datalen);
heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
@@ -97,11 +108,18 @@ heap_xlog_prune_freeze(XLogReaderState *record)
&ndead, &nowdead,
&nunused, &nowunused);
+ do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+ set_heap_lsn = mark_buffer_dirty = do_prune || nplans > 0;
+
+ /* Ensure the record does something */
+ Assert(do_prune || nplans > 0 ||
+ vmflags & VISIBILITYMAP_VALID_BITS);
+
/*
* Update all line pointers per the record, and repair fragmentation
* if needed.
*/
- if (nredirected > 0 || ndead > 0 || nunused > 0)
+ if (do_prune)
heap_page_prune_execute(buffer,
(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
redirected, nredirected,
@@ -138,36 +156,117 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* There should be no more data */
Assert((char *) frz_offsets == dataptr + datalen);
+ /*
+ * Now set PD_ALL_VISIBLE, if required. We'll only do this if we are
+ * also going to set bits in the VM later.
+ *
+ * We must never end up with the VM bit set and the page-level
+ * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+ * modification would fail to clear the VM bit.
+ */
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+ {
+ PageSetAllVisible(page);
+
+ /*
+ * If the only change to the heap page is setting PD_ALL_VISIBLE,
+ * we can avoid setting the page LSN unless checksums or
+ * wal_log_hints are enabled.
+ */
+ set_heap_lsn = XLogHintBitIsNeeded() ? true : set_heap_lsn;
+ mark_buffer_dirty = true;
+ }
+
/*
* Note: we don't worry about updating the page's prunability hints.
* At worst this will cause an extra prune cycle to occur soon.
*/
- PageSetLSN(page, lsn);
- MarkBufferDirty(buffer);
+ if (mark_buffer_dirty)
+ MarkBufferDirty(buffer);
+ if (set_heap_lsn)
+ PageSetLSN(page, lsn);
}
/*
- * If we released any space or line pointers, update the free space map.
+ * If we released any space or line pointers or will be setting a page in
+ * the visibility map, measure the page's freespace to later update the
+ * freespace map.
+ *
+ * Even if we are just updating the VM (and thus not freeing up any
+ * space), we'll still update the FSM for this page. Since FSM is not
+ * WAL-logged and only updated heuristically, it easily becomes stale in
+ * standbys. If the standby is later promoted and runs VACUUM, it will
+ * skip updating individual free space figures for pages that became
+ * all-visible (or all-frozen, depending on the vacuum mode,) which is
+ * troublesome when FreeSpaceMapVacuum propagates too optimistic free
+ * space values to upper FSM layers; later inserters try to use such pages
+ * only to find out that they are unusable. This can cause long stalls
+ * when there are many such pages.
+ *
+ * Forestall those problems by updating FSM's idea about a page that is
+ * becoming all-visible or all-frozen.
*
* Do this regardless of a full-page image being applied, since the FSM
* data is not in the page anyway.
+ *
+ * We want to avoid holding an exclusive lock on the heap buffer while
+ * doing IO (either of the FSM or the VM), so we'll release the lock on
+ * the heap buffer before doing either.
*/
if (BufferIsValid(buffer))
{
- if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
- XLHP_HAS_DEAD_ITEMS |
- XLHP_HAS_NOW_UNUSED_ITEMS))
- {
- Size freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+ if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
+ XLHP_HAS_DEAD_ITEMS |
+ XLHP_HAS_NOW_UNUSED_ITEMS)) ||
+ vmflags & VISIBILITYMAP_VALID_BITS)
+ freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+
+ UnlockReleaseBuffer(buffer);
+ }
+
+ /*
+ * Read and update the VM block. Even if we skipped updating the heap page
+ * due to the file being dropped or truncated later in recovery, it's
+ * still safe to update the visibility map. Any WAL record that clears
+ * the visibility map bit does so before checking the page LSN, so any
+ * bits that need to be cleared will still be cleared.
+ *
+ * Note that it is _only_ okay that we do not hold a lock on the heap page
+ * because we are in recovery and can expect no other writers to clear
+ * PD_ALL_VISIBLE before we are able to update the VM.
+ */
+ if (vmflags & VISIBILITYMAP_VALID_BITS &&
+ XLogReadBufferForRedoExtended(record, 1,
+ RBM_ZERO_ON_ERROR,
+ false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ Page vmpage = BufferGetPage(vmbuffer);
+ uint8 old_vmbits = 0;
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
- UnlockReleaseBuffer(buffer);
+ /* initialize the page if it was read as zeros */
+ if (PageIsNew(vmpage))
+ PageInit(vmpage, BLCKSZ, 0);
+
+ old_vmbits = visibilitymap_set_vmbits(reln, blkno, vmbuffer, vmflags);
- XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
+ /* Only set VM page LSN if we modified the page */
+ if (old_vmbits != vmflags)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
}
- else
- UnlockReleaseBuffer(buffer);
+
+ FreeFakeRelcacheEntry(reln);
}
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
+
+ if (freespace > 0)
+ XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7ebd22f00a3..f0b33d1b696 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,6 +21,7 @@
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
+#include "access/visibilitymapdefs.h"
#include "commands/vacuum.h"
#include "executor/instrument.h"
#include "miscadmin.h"
@@ -835,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
+ InvalidBuffer, 0, false,
conflict_xid,
true, reason,
prstate.frozen, prstate.nfrozen,
@@ -2030,14 +2032,18 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
*
* This is used for several different page maintenance operations:
*
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
* redirected, some marked dead, and some removed altogether.
*
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'
*
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ * marked as unused.
*
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phase III, the heap page may be marked
+ * all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
* all.
*
* If replaying the record requires a cleanup lock, pass cleanup_lock = true.
@@ -2045,12 +2051,23 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* replaying 'unused' items depends on whether they were all previously marked
* as dead.
*
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
* Note: This function scribbles on the 'frozen' array.
*
* Note: This is called in a critical section, so careful what you do here.
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer,
+ uint8 vmflags,
+ bool set_pd_all_vis,
TransactionId conflict_xid,
bool cleanup_lock,
PruneReason reason,
@@ -2062,6 +2079,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xl_heap_prune xlrec;
XLogRecPtr recptr;
uint8 info;
+ uint8 regbuf_flags;
/* The following local variables hold data registered in the WAL record: */
xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2070,8 +2088,21 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xlhp_prune_items dead_items;
xlhp_prune_items unused_items;
OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+ bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+
+ Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+ xlrec.flags = vmflags;
- xlrec.flags = 0;
+ regbuf_flags = REGBUF_STANDARD;
+
+ /*
+ * We can avoid an FPI if the only modification we are making to the heap
+ * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+ */
+ if (!do_prune &&
+ nfrozen == 0 &&
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
+ regbuf_flags |= REGBUF_NO_IMAGE;
/*
* Prepare data for the buffer. The arrays are not actually in the
@@ -2079,7 +2110,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* page image, the arrays can be omitted.
*/
XLogBeginInsert();
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+ XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ XLogRegisterBuffer(1, vmbuffer, 0);
+
if (nfrozen > 0)
{
int nplans;
@@ -2168,5 +2203,22 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
}
recptr = XLogInsert(RM_HEAP2_ID, info);
- PageSetLSN(BufferGetPage(buffer), recptr);
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
+ }
+
+ /*
+ * If pruning or freezing tuples or setting the page all-visible when
+ * checksums or wal_log_hints are enabled, we must bump the LSN. Torn
+ * pages are possible if we update PD_ALL_VISIBLE without bumping the LSN,
+ * but this is deemed okay for page hint updates.
+ */
+ if (do_prune || nfrozen > 0 ||
+ (set_pd_all_vis && XLogHintBitIsNeeded()))
+ {
+ Assert(BufferIsDirty(buffer));
+ PageSetLSN(BufferGetPage(buffer), recptr);
+ }
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 7f6f684bc63..a50652ca5a0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,11 +463,13 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
+static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2846,8 +2848,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
OffsetNumber unused[MaxHeapTuplesPerPage];
int nunused = 0;
TransactionId visibility_cutoff_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
bool all_frozen;
LVSavedErrInfo saved_err_info;
+ uint8 vmflags = 0;
+ bool set_pd_all_vis = false;
Assert(vacrel->do_index_vacuuming);
@@ -2858,6 +2863,20 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
InvalidOffsetNumber);
+ if (heap_page_would_be_all_visible(vacrel->rel, buffer,
+ vacrel->cutoffs.OldestXmin,
+ deadoffsets, num_offsets,
+ &all_frozen, &visibility_cutoff_xid,
+ &vacrel->offnum))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (all_frozen)
+ {
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ }
+ }
+
START_CRIT_SECTION();
for (int i = 0; i < num_offsets; i++)
@@ -2877,6 +2896,18 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
/* Attempt to truncate line pointer array now */
PageTruncateLinePointerArray(page);
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+ {
+ Assert(!PageIsAllVisible(page));
+ set_pd_all_vis = true;
+ PageSetAllVisible(page);
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ visibilitymap_set_vmbits(vacrel->rel,
+ blkno,
+ vmbuffer, vmflags);
+ conflict_xid = visibility_cutoff_xid;
+ }
+
/*
* Mark buffer dirty before we write WAL.
*/
@@ -2886,7 +2917,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
- InvalidTransactionId,
+ vmbuffer,
+ vmflags,
+ set_pd_all_vis,
+ conflict_xid,
false, /* no cleanup lock required */
PRUNE_VACUUM_CLEANUP,
NULL, 0, /* frozen */
@@ -2895,39 +2929,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
unused, nunused);
}
- /*
- * End critical section, so we safely can do visibility tests (which
- * possibly need to perform IO and allocate memory!). If we crash now the
- * page (including the corresponding vm bit) might not be marked all
- * visible, but that's fine. A later vacuum will fix that.
- */
END_CRIT_SECTION();
- /*
- * Now that we have removed the LP_DEAD items from the page, once again
- * check if the page has become all-visible. The page is already marked
- * dirty, exclusively locked, and, if needed, a full page image has been
- * emitted.
- */
- Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
- &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
+ if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (all_frozen)
- {
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buffer,
- InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
- flags);
-
/* Count the newly set VM page for logging */
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
vacrel->vm_new_visible_pages++;
if (all_frozen)
vacrel->vm_new_visible_frozen_pages++;
@@ -3594,40 +3601,85 @@ dead_items_cleanup(LVRelState *vacrel)
}
/*
- * Check if every tuple in the given page in buf is visible to all current and
- * future transactions.
+ * Wrapper for heap_page_would_be_all_visible() which can be used for
+ * callers that expect no LP_DEAD on the page.
+ */
+bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
+{
+
+ return heap_page_would_be_all_visible(rel, buf, OldestXmin,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid,
+ logging_offnum);
+}
+
+/*
+ * Determines whether or not the heap page in buf is all-visible other than
+ * the dead line pointers referred to by the provided deadoffsets array.
*
- * OldestXmin is used to determine visibility.
+ * deadoffsets are the offsets the caller already knows about and for which it
+ * has removed the associated index entries. Vacuum will call this before
+ * setting those line pointers LP_UNUSED. So, if there are no new LP_DEAD
+ * items, then the page can be set all-visible in the VM by the caller.
+ *
+ * Returns true if the page is all-visible other than the provided
+ * deadoffsets and false otherwise.
*
- * Sets *all_frozen to true if every tuple on this page is frozen.
+ * OldestXmin is used to determine visibility.
*
- * Sets *visibility_cutoff_xid to the highest xmin amongst the visible tuples.
- * It is only valid if the page is all-visible.
+ * *all_frozen is an output parameter indicating to the caller if every tuple
+ * on the page is frozen.
*
* *logging_offnum will have the OffsetNumber of the current tuple being
* processed for vacuum's error callback system.
*
- * This is a stripped down version of lazy_scan_prune(). If you change
- * anything here, make sure that everything stays in sync. Note that an
- * assertion calls us to verify that everybody still agrees. Be sure to avoid
- * introducing new side-effects here.
+ * *visibility_cutoff_xid is an output parameter with the highest xmin amongst the
+ * visible tuples. It is only valid if the page is all-visible.
+ *
+ * Callers looking to verify that the page is already all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This is similar logic to that in heap_prune_record_unchanged_lp_normal(). If
+ * you change anything here, make sure that everything stays in sync. Note
+ * that an assertion calls us to verify that everybody still agrees. Be sure
+ * to avoid introducing new side-effects here.
*/
static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum)
+heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
OffsetNumber offnum,
maxoff;
bool all_visible = true;
+ int matched_dead_count = 0;
*visibility_cutoff_xid = InvalidTransactionId;
*all_frozen = true;
+ Assert(ndeadoffsets == 0 || deadoffsets);
+
+#ifdef USE_ASSERT_CHECKING
+ /* Confirm input deadoffsets[] is strictly sorted */
+ if (ndeadoffsets > 1)
+ {
+ for (int i = 1; i < ndeadoffsets; i++)
+ Assert(deadoffsets[i - 1] < deadoffsets[i]);
+ }
+#endif
+
maxoff = PageGetMaxOffsetNumber(page);
for (offnum = FirstOffsetNumber;
offnum <= maxoff && all_visible;
@@ -3655,9 +3707,15 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
if (ItemIdIsDead(itemid))
{
- all_visible = false;
- *all_frozen = false;
- break;
+ if (!deadoffsets ||
+ matched_dead_count >= ndeadoffsets ||
+ deadoffsets[matched_dead_count] != offnum)
+ {
+ *all_frozen = all_visible = false;
+ break;
+ }
+ matched_dead_count++;
+ continue;
}
Assert(ItemIdIsNormal(itemid));
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..c95d30dfe8d 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -103,7 +103,7 @@ plan_elem_desc(StringInfo buf, void *plan, void *data)
* code, the latter of which is used in frontend (pg_waldump) code.
*/
void
-heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
OffsetNumber **frz_offsets,
int *nredirected, OffsetNumber **redirected,
@@ -279,7 +279,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
TransactionId conflict_xid;
memcpy(&conflict_xid, rec + SizeOfHeapPrune, sizeof(TransactionId));
-
appendStringInfo(buf, "snapshotConflictHorizon: %u",
conflict_xid);
}
@@ -287,6 +286,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, ", isCatalogRel: %c",
xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
+ if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
+ appendStringInfo(buf, ", vm_flags: 0x%02X",
+ xlrec->flags & VISIBILITYMAP_VALID_BITS);
+
if (XLogRecHasBlockData(record, 0))
{
Size datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..8b47295efa2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -344,6 +344,12 @@ extern void heap_inplace_update_and_unlock(Relation relation,
Buffer buffer);
extern void heap_inplace_unlock(Relation relation,
HeapTuple oldtup, Buffer buffer);
+
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
@@ -388,6 +394,9 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer,
+ uint8 vmflags,
+ bool vm_modified_heap_page,
TransactionId conflict_xid,
bool cleanup_lock,
PruneReason reason,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d4c0625b632..d8508593e7c 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -249,7 +249,7 @@ typedef struct xl_heap_update
* Main data section:
*
* xl_heap_prune
- * uint8 flags
+ * uint16 flags
* TransactionId snapshot_conflict_horizon
*
* Block 0 data section:
@@ -284,7 +284,7 @@ typedef struct xl_heap_update
*/
typedef struct xl_heap_prune
{
- uint8 flags;
+ uint16 flags;
/*
* If XLHP_HAS_CONFLICT_HORIZON is set, the conflict horizon XID follows,
@@ -292,10 +292,22 @@ typedef struct xl_heap_prune
*/
} xl_heap_prune;
-#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
+
+/*
+ * The xl_heap_prune record's flags may also contain which VM bits to set. As
+ * such, (1 << 0) and (1 << 1) are reserved for VISIBILITYMAP_ALL_VISIBLE and
+ * VISIBILITYMAP_ALL_FROZEN.
+ */
-/* to handle recovery conflict during logical decoding on standby */
-#define XLHP_IS_CATALOG_REL (1 << 1)
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
+#define XLHP_IS_CATALOG_REL (1 << 2)
/*
* Does replaying the record require a cleanup-lock?
@@ -305,7 +317,7 @@ typedef struct xl_heap_prune
* marks LP_DEAD line pointers as unused without moving any tuple data, an
* ordinary exclusive lock is sufficient.
*/
-#define XLHP_CLEANUP_LOCK (1 << 2)
+#define XLHP_CLEANUP_LOCK (1 << 3)
/*
* If we remove or freeze any entries that contain xids, we need to include a
@@ -313,22 +325,22 @@ typedef struct xl_heap_prune
* there are no queries running for which the removed tuples are still
* visible, or which still consider the frozen XIDs as running.
*/
-#define XLHP_HAS_CONFLICT_HORIZON (1 << 3)
+#define XLHP_HAS_CONFLICT_HORIZON (1 << 4)
/*
* Indicates that an xlhp_freeze_plans sub-record and one or more
* xlhp_freeze_plan sub-records are present.
*/
-#define XLHP_HAS_FREEZE_PLANS (1 << 4)
+#define XLHP_HAS_FREEZE_PLANS (1 << 5)
/*
* XLHP_HAS_REDIRECTIONS, XLHP_HAS_DEAD_ITEMS, and XLHP_HAS_NOW_UNUSED_ITEMS
* indicate that xlhp_prune_items sub-records with redirected, dead, and
* unused item offsets are present.
*/
-#define XLHP_HAS_REDIRECTIONS (1 << 5)
-#define XLHP_HAS_DEAD_ITEMS (1 << 6)
-#define XLHP_HAS_NOW_UNUSED_ITEMS (1 << 7)
+#define XLHP_HAS_REDIRECTIONS (1 << 6)
+#define XLHP_HAS_DEAD_ITEMS (1 << 7)
+#define XLHP_HAS_NOW_UNUSED_ITEMS (1 << 8)
/*
* xlhp_freeze_plan describes how to freeze a group of one or more heap tuples
@@ -497,7 +509,7 @@ extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
uint8 vmflags);
/* in heapdesc.c, so it can be shared between frontend/backend code */
-extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
OffsetNumber **frz_offsets,
int *nredirected, OffsetNumber **redirected,
--
2.43.0
v9-0001-Remove-unneeded-VM-pin-from-VM-replay.patch
From 686edbfbe6556da8cdd6219fd9cd270ccfc9bb32 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 27 Aug 2025 08:50:15 -0400
Subject: [PATCH v9 01/22] Remove unneeded VM pin from VM replay
Previously, heap_xlog_visible() called visibilitymap_pin() even after
getting a buffer from XLogReadBufferForRedoExtended() -- which returns a
pinned buffer containing the specified block of the visibility map.
This would just have resulted in visibilitymap_pin() returning early
since the specified page was already present and pinned, but it was
confusing extraneous code, so remove it.
It appears to be an oversight in 2c03216.
While we are at it, remove two VM-related redundant asserts in the COPY
FREEZE code path. visibilitymap_set() already asserts that
PD_ALL_VISIBLE is set on the heap page and checks that the vmbuffer
contains the bits corresponding to the specified heap block, so callers
do not also need to check this.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reported-by: Melanie Plageman <melanieplageman@gmail.com>
Reported-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CALdSSPhu7WZd%2BEfQDha1nz%3DDC93OtY1%3DUFEdWwSZsASka_2eRQ%40mail.gmail.com
---
src/backend/access/heap/heapam.c | 3 ---
src/backend/access/heap/heapam_xlog.c | 1 -
2 files changed, 4 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e3e7307ef5f..4c5ae205a7a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2647,9 +2647,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
*/
if (all_frozen_set)
{
- Assert(PageIsAllVisible(page));
- Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
-
/*
* It's fine to use InvalidTransactionId here - this is only used
* when HEAP_INSERT_FROZEN is specified, which intentionally
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 5d48f071f53..cf843277938 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -295,7 +295,6 @@ heap_xlog_visible(XLogReaderState *record)
LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
reln = CreateFakeRelcacheEntry(rlocator);
- visibilitymap_pin(reln, blkno, &vmbuffer);
visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
xlrec->snapshotConflictHorizon, vmbits);
--
2.43.0
v9-0003-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch
From 7b6222f1670a0078c32383e64fb3782f555a6564 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v9 03/22] Eliminate xl_heap_visible in COPY FREEZE
Instead of emitting a separate WAL record for setting the VM bits in
xl_heap_visible, specify the changes to make to the VM block in the
xl_heap_multi_insert record instead.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam.c | 47 ++++++++++-------
src/backend/access/heap/heapam_xlog.c | 43 +++++++++++++++-
src/backend/access/heap/visibilitymap.c | 67 ++++++++++++++++++++++++-
src/backend/access/rmgrdesc/heapdesc.c | 5 ++
src/include/access/visibilitymap.h | 2 +
5 files changed, 144 insertions(+), 20 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4c5ae205a7a..893a739009a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2504,9 +2504,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/*
* If the page is all visible, need to clear that, unless we're only
* going to add further frozen rows to it.
- *
- * If we're only adding already frozen rows to a previously empty
- * page, mark it as all-visible.
*/
if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
@@ -2516,8 +2513,22 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
BufferGetBlockNumber(buffer),
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
+
+ /*
+ * If we're only adding already frozen rows to a previously empty
+ * page, mark it as all-frozen and update the visibility map. We're
+ * already holding a pin on the vmbuffer.
+ */
else if (all_frozen_set)
+ {
PageSetAllVisible(page);
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ visibilitymap_set_vmbits(relation,
+ BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+ }
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2565,6 +2576,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
xlrec->flags = 0;
if (all_visible_cleared)
xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+ /*
+ * We don't have to worry about including a conflict xid in the
+ * WAL record as HEAP_INSERT_FROZEN intentionally violates
+ * visibility rules.
+ */
if (all_frozen_set)
xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
@@ -2627,7 +2644,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
XLogBeginInsert();
XLogRegisterData(xlrec, tupledata - scratch.data);
+
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+ if (all_frozen_set)
+ XLogRegisterBuffer(1, vmbuffer, 0);
XLogRegisterBufData(0, tupledata, totaldatalen);
@@ -2637,26 +2657,17 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
recptr = XLogInsert(RM_HEAP2_ID, info);
PageSetLSN(page, recptr);
+ if (all_frozen_set)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
+ }
}
END_CRIT_SECTION();
- /*
- * If we've frozen everything on the page, update the visibilitymap.
- * We're already holding pin on the vmbuffer.
- */
if (all_frozen_set)
- {
- /*
- * It's fine to use InvalidTransactionId here - this is only used
- * when HEAP_INSERT_FROZEN is specified, which intentionally
- * violates visibility rules.
- */
- visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
- InvalidXLogRecPtr, vmbuffer,
- InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
- }
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
UnlockReleaseBuffer(buffer);
ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index cf843277938..0820f7d052d 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -551,6 +551,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
int i;
bool isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
/*
* Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -571,11 +572,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(rlocator);
- Buffer vmbuffer = InvalidBuffer;
visibilitymap_pin(reln, blkno, &vmbuffer);
visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
ReleaseBuffer(vmbuffer);
+ vmbuffer = InvalidBuffer;
FreeFakeRelcacheEntry(reln);
}
@@ -662,6 +663,46 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (BufferIsValid(buffer))
UnlockReleaseBuffer(buffer);
+ buffer = InvalidBuffer;
+
+ /*
+ * Now read and update the VM block. Even if we skipped updating the heap
+ * page due to the file being dropped or truncated later in recovery, it's
+ * still safe to update the visibility map. Any WAL record that clears
+ * the visibility map bit does so before checking the page LSN, so any
+ * bits that need to be cleared will still be cleared.
+ *
+ * It is only okay to set the VM bits without holding the heap page lock
+ * because we can expect no other writers of this page.
+ */
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+ XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ Page vmpage = BufferGetPage(vmbuffer);
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
+
+ /* initialize the page if it was read as zeros */
+ if (PageIsNew(vmpage))
+ PageInit(vmpage, BLCKSZ, 0);
+
+ visibilitymap_set_vmbits(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+
+ /*
+ * It is not possible that the VM was already set for this heap page,
+ * so the vmbuffer must have been modified and marked dirty.
+ */
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
+ FreeFakeRelcacheEntry(reln);
+ }
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
+
/*
* If the page is running low on free space, update the FSM as well.
* Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 7306c16f05c..bb8dfd8910a 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,7 +14,8 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set a bit in a previously pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page and log
+ * visibilitymap_set_vmbits - set bit(s) in a pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -321,6 +322,70 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
return status;
}
+/*
+ * Set flags in the VM block contained in the passed in vmBuf.
+ *
+ * This function is for callers which include the VM changes in the same WAL
+ * record as the modifications of the heap page which rendered it all-visible.
+ * Callers separately logging the VM changes should invoke visibilitymap_set()
+ * instead.
+ *
+ * Caller must have pinned and exclusive locked the correct block of the VM in
+ * vmBuf. This block should contain the VM bits for the given heapBlk.
+ *
+ * During normal operation (i.e. not recovery), this should be called in a
+ * critical section which also makes any necessary changes to the heap page
+ * and, if relevant, emits WAL.
+ *
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page. This includes
+ * maintaining any invariants such as ensuring the buffer containing heapBlk
+ * is pinned and exclusive locked.
+ */
+uint8
+visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags)
+{
+ BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+ uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+ uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+ Page page;
+ uint8 *map;
+ uint8 status;
+
+#ifdef TRACE_VISIBILITYMAP
+ elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
+ flags, RelationGetRelationName(rel), heapBlk);
+#endif
+
+ /* Call in same critical section where WAL is emitted. */
+ Assert(InRecovery || CritSectionCount > 0);
+
+ /* Flags should be valid. Also never clear bits with this function */
+ Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+ /* Must never set all_frozen bit without also setting all_visible bit */
+ Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+ /* Check that we have the right VM page pinned */
+ if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+ elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
+
+ Assert(BufferIsExclusiveLocked(vmBuf));
+
+ page = BufferGetPage(vmBuf);
+ map = (uint8 *) PageGetContents(page);
+
+ status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+ if (flags != status)
+ {
+ map[mapByte] |= (flags << mapOffset);
+ MarkBufferDirty(vmBuf);
+ }
+
+ return status;
+}
+
/*
* visibilitymap_get_status - get status of bits
*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
#include "access/heapam_xlog.h"
#include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
#include "storage/standbydefs.h"
/*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
xlrec->flags);
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+ appendStringInfo(buf, ", vm_flags: 0x%02X",
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
if (XLogRecHasBlockData(record, 0) && !isinit)
{
appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..fc7056a91ea 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,8 @@ extern uint8 visibilitymap_set(Relation rel,
Buffer vmBuf,
TransactionId cutoff_xid,
uint8 flags);
+extern uint8 visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
--
2.43.0
v9-0004-Make-heap_page_is_all_visible-independent-of-LVRe.patch
From abd46a0e574456401cb34380236673239c317361 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 15:48:51 -0400
Subject: [PATCH v9 04/22] Make heap_page_is_all_visible independent of
LVRelState
Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need a few parameters from
the LVRelState, so just pass those in explicitly.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 48 ++++++++++++++++++----------
1 file changed, 31 insertions(+), 17 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 932701d8420..7f6f684bc63 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,8 +463,11 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
- TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2009,8 +2012,9 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(presult.lpdead_items == 0);
- if (!heap_page_is_all_visible(vacrel, buf,
- &debug_cutoff, &debug_all_frozen))
+ if (!heap_page_is_all_visible(vacrel->rel, buf,
+ vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+ &debug_cutoff, &vacrel->offnum))
Assert(false);
Assert(presult.all_frozen == debug_all_frozen);
@@ -2906,8 +2910,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* emitted.
*/
Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
- &all_frozen))
+ if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
+ &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -3590,10 +3594,18 @@ dead_items_cleanup(LVRelState *vacrel)
}
/*
- * Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples. Set *all_frozen to true if every tuple
- * on this page is frozen.
+ * Check if every tuple in the given page in buf is visible to all current and
+ * future transactions.
+ *
+ * OldestXmin is used to determine visibility.
+ *
+ * Sets *all_frozen to true if every tuple on this page is frozen.
+ *
+ * Sets *visibility_cutoff_xid to the highest xmin amongst the visible tuples.
+ * It is only valid if the page is all-visible.
+ *
+ * *logging_offnum will have the OffsetNumber of the current tuple being
+ * processed for vacuum's error callback system.
*
* This is a stripped down version of lazy_scan_prune(). If you change
* anything here, make sure that everything stays in sync. Note that an
@@ -3601,9 +3613,11 @@ dead_items_cleanup(LVRelState *vacrel)
* introducing new side-effects here.
*/
static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
TransactionId *visibility_cutoff_xid,
- bool *all_frozen)
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3626,7 +3640,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* Set the offset number so that we can display it along with any
* error that occurred while processing this tuple.
*/
- vacrel->offnum = offnum;
+ *logging_offnum = offnum;
itemid = PageGetItemId(page, offnum);
/* Unused or redirect line pointers are of no interest */
@@ -3650,9 +3664,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
tuple.t_len = ItemIdGetLength(itemid);
- tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+ tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
+ switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
buf))
{
case HEAPTUPLE_LIVE:
@@ -3673,7 +3687,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
+ OldestXmin))
{
all_visible = false;
*all_frozen = false;
@@ -3708,7 +3722,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
} /* scan along page */
/* Clear the offset information once we have processed the given page. */
- vacrel->offnum = InvalidOffsetNumber;
+ *logging_offnum = InvalidOffsetNumber;
return all_visible;
}
--
2.43.0
v9-0006-Use-xl_heap_prune-record-for-setting-empty-pages-.patch
From 15eb77d2b54d4856d6dd392c48cb68d6721d20ff Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:42:19 -0400
Subject: [PATCH v9 06/22] Use xl_heap_prune record for setting empty pages
all-visible
As part of a project to eliminate xl_heap_visible records, eliminate
their usage in phase I vacuum of empty pages.
---
src/backend/access/heap/pruneheap.c | 14 +++++--
src/backend/access/heap/vacuumlazy.c | 55 ++++++++++++++++++----------
src/include/access/heapam.h | 1 +
3 files changed, 47 insertions(+), 23 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f0b33d1b696..373986b204a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -836,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
+ false,
InvalidBuffer, 0, false,
conflict_xid,
true, reason,
@@ -2055,6 +2056,9 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* case, vmbuffer should already have been updated and marked dirty and should
* still be pinned and locked.
*
+ * force_heap_fpi indicates that a full page image of the heap block should be
+ * forced.
+ *
* set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
* the page LSN when checksums/wal_log_hints are enabled even if we did not
* prune or freeze tuples on the page.
@@ -2065,6 +2069,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ bool force_heap_fpi,
Buffer vmbuffer,
uint8 vmflags,
bool set_pd_all_vis,
@@ -2095,13 +2100,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
regbuf_flags = REGBUF_STANDARD;
+ if (force_heap_fpi)
+ regbuf_flags |= REGBUF_FORCE_IMAGE;
+
/*
* We can avoid an FPI if the only modification we are making to the heap
* page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
*/
- if (!do_prune &&
- nfrozen == 0 &&
- (!set_pd_all_vis || !XLogHintBitIsNeeded()))
+ else if (!do_prune &&
+ nfrozen == 0 &&
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
regbuf_flags |= REGBUF_NO_IMAGE;
/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a50652ca5a0..edd28123b7d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1877,33 +1877,47 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ uint8 new_vmbits = VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN;
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ PageSetAllVisible(page);
MarkBufferDirty(buf);
- /*
- * It's possible that another backend has extended the heap,
- * initialized the page, and then failed to WAL-log the page due
- * to an ERROR. Since heap extension is not WAL-logged, recovery
- * might try to replay our record setting the page all-visible and
- * find that the page isn't initialized, which will cause a PANIC.
- * To prevent that, check whether the page has been previously
- * WAL-logged, and if not, do that now.
- */
- if (RelationNeedsWAL(vacrel->rel) &&
- PageGetLSN(page) == InvalidXLogRecPtr)
- log_newpage_buffer(buf, true);
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ visibilitymap_set_vmbits(vacrel->rel, blkno,
+ vmbuffer, new_vmbits);
+
+ if (RelationNeedsWAL(vacrel->rel))
+ {
+ /*
+ * It's possible that another backend has extended the heap,
+ * initialized the page, and then failed to WAL-log the page
+ * due to an ERROR. Since heap extension is not WAL-logged,
+ * recovery might try to replay our record setting the page
+ * all-visible and find that the page isn't initialized, which
+ * will cause a PANIC. To prevent that, if the page hasn't
+ * been previously WAL-logged, force a heap FPI.
+ */
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ PageGetLSN(page) == InvalidXLogRecPtr,
+ vmbuffer,
+ new_vmbits,
+ true,
+ InvalidTransactionId,
+ false, PRUNE_VACUUM_SCAN,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+ }
- PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
END_CRIT_SECTION();
- /* Count the newly all-frozen pages for logging */
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+ /* Count the newly all-frozen pages for logging. */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
}
@@ -2917,6 +2931,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
+ false,
vmbuffer,
vmflags,
set_pd_all_vis,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8b47295efa2..e7129a644a1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,6 +394,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ bool force_heap_fpi,
Buffer vmbuffer,
uint8 vmflags,
bool vm_modified_heap_page,
--
2.43.0
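To make the 0006 empty-page path easier to follow, the flow in the patched
lazy_scan_new_or_empty() is roughly the following (a condensed sketch of the
patch above, with declarations and comments abbreviated, not a verbatim
excerpt):

    START_CRIT_SECTION();
    PageSetAllVisible(page);
    MarkBufferDirty(buf);
    LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
    visibilitymap_set_vmbits(vacrel->rel, blkno, vmbuffer, new_vmbits);

    if (RelationNeedsWAL(vacrel->rel))
    {
        /*
         * Heap extension is not WAL-logged, so if the page has never been
         * WAL-logged, force a full page image so that replay never tries
         * to set PD_ALL_VISIBLE on an uninitialized page.
         */
        log_heap_prune_and_freeze(vacrel->rel, buf,
                                  PageGetLSN(page) == InvalidXLogRecPtr, /* force_heap_fpi */
                                  vmbuffer, new_vmbits,
                                  true,                  /* set_pd_all_vis */
                                  InvalidTransactionId,  /* conflict_xid */
                                  false, PRUNE_VACUUM_SCAN,
                                  NULL, 0, NULL, 0, NULL, 0, NULL, 0);
    }
    END_CRIT_SECTION();
    LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);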
v9-0007-Combine-lazy_scan_prune-VM-corruption-cases.patch (text/x-patch)
From c711696d07304ca3130a56dd9b068779c74e5ec2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:04:03 -0400
Subject: [PATCH v9 07/22] Combine lazy_scan_prune VM corruption cases
lazy_scan_prune() updates the visibility map after phase I of heap
vacuuming. It also checks and fixes corruption in the VM. The corruption
cases were mixed in with the normal visibility map update cases.
Careful study of the ordering of the current logic reveals that the
corruption cases can be reordered and extracted into a separate
function. This should result in no additional overhead when compared to
previous execution.
This reordering makes it clear which cases are about corruption and
which cases are normal VM updates. Separating them also makes it
possible to combine the normal cases in a future commit. This will make
the logic easier to understand and allow for further separation of the
logic to allow updating the VM in the same record as pruning and
freezing in phase I.
---
src/backend/access/heap/vacuumlazy.c | 114 +++++++++++++++++----------
1 file changed, 73 insertions(+), 41 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index edd28123b7d..1474835c74b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,6 +430,12 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
+static bool identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1932,6 +1938,66 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer)
+{
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
+
+
/* qsort comparator for sorting OffsetNumbers */
static int
cmpOffsetNumbers(const void *a, const void *b)
@@ -2078,9 +2144,14 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
+ * all_frozen variables. Start by looking for any VM corruption.
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
+ all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+ {
+ /* Don't update the VM if we just cleared corruption in it */
+ }
+ else if (!all_visible_according_to_vm && presult.all_visible)
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -2132,45 +2203,6 @@ lazy_scan_prune(LVRelState *vacrel,
}
}
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
- {
- elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
- {
- elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno);
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
/*
* If the all-visible page is all-frozen but not marked as such yet, mark
* it as all-frozen. Note that all_frozen is only valid if all_visible is
--
2.43.0
v9-0009-Find-and-fix-VM-corruption-in-heap_page_prune_and.patch (text/x-patch)
From 1b86b5724fc3468457f1e2d5d57df4c708080164 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:45:59 -0400
Subject: [PATCH v9 09/22] Find and fix VM corruption in
heap_page_prune_and_freeze
Future commits will update the VM in the same critical section and WAL
record as pruning and freezing. For ease of review, this commit takes
one step toward doing this. It moves the VM corruption handling case to
heap_page_prune_and_freeze().
---
src/backend/access/heap/pruneheap.c | 87 +++++++++++++++++++++++++++-
src/backend/access/heap/vacuumlazy.c | 77 +++---------------------
src/include/access/heapam.h | 4 ++
3 files changed, 96 insertions(+), 72 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 373986b204a..5c08a5d44c7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,7 +21,7 @@
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "commands/vacuum.h"
#include "executor/instrument.h"
#include "miscadmin.h"
@@ -177,6 +177,13 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
static void page_verify_redirects(Page page);
+static bool identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer);
+
/*
* Optionally prune and repair fragmentation in the specified page.
@@ -261,7 +268,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, vistest, 0,
+ heap_page_prune_and_freeze(relation, buffer, false,
+ InvalidBuffer,
+ vistest, 0,
NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
/*
@@ -294,6 +303,64 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
}
}
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer)
+{
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk);
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
/*
* Prune and repair fragmentation and potentially freeze tuples on the
@@ -314,6 +381,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
* that also freeze need that information.
*
+ * blk_known_av is the visibility status of the heap block as of the last call
+ * to find_next_unskippable_block(). vmbuffer is the buffer that may already
+ * contain the required block of the visibility map.
+ *
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
* (see heap_prune_satisfies_vacuum).
*
@@ -349,6 +420,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ bool blk_known_av,
+ Buffer vmbuffer,
GlobalVisState *vistest,
int options,
struct VacuumCutoffs *cutoffs,
@@ -897,6 +970,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
+ /*
+ * Clear any VM corruption. This does not need to be done in a critical
+ * section.
+ */
+ presult->vm_corruption = false;
+ if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+ presult->vm_corruption = identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av,
+ prstate.lpdead_items, vmbuffer);
if (prstate.freeze)
{
if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index cbe37369790..d49c71bc1b5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,12 +430,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
-static bool identify_and_fix_vm_corruption(Relation relation,
- BlockNumber heap_blk,
- Buffer heap_buffer, Page heap_page,
- bool heap_blk_known_av,
- int64 nlpdead_items,
- Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1938,65 +1932,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
-/*
- * When updating the visibility map after phase I heap vacuuming, we take the
- * opportunity to identify and fix any VM corruption.
- *
- * heap_blk_known_av is the visibility status of the heap page collected
- * while finding the next unskippable block in heap_vac_scan_next_block().
- */
-static bool
-identify_and_fix_vm_corruption(Relation relation,
- BlockNumber heap_blk,
- Buffer heap_buffer, Page heap_page,
- bool heap_blk_known_av,
- int64 nlpdead_items,
- Buffer vmbuffer)
-{
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
- visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
- {
- elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk);
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
- {
- elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk);
- PageClearAllVisible(heap_page);
- MarkBufferDirty(heap_buffer);
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- return false;
-}
-
/* qsort comparator for sorting OffsetNumbers */
static int
@@ -2055,11 +1990,14 @@ lazy_scan_prune(LVRelState *vacrel,
* tuples. Pruning will have determined whether or not the page is
* all-visible.
*/
- prune_options = HEAP_PAGE_PRUNE_FREEZE;
+ prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
+ heap_page_prune_and_freeze(rel, buf,
+ all_visible_according_to_vm,
+ vmbuffer,
+ vacrel->vistest, prune_options,
&vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
@@ -2144,10 +2082,9 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables. Start by looking for any VM corruption.
+ * all_frozen variables.
*/
- if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
- all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+ if (presult.vm_corruption)
{
/* Don't update the VM if we just cleared corruption in it */
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e7129a644a1..0c7eb5e46f4 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
struct TupleTableSlot;
@@ -247,6 +248,7 @@ typedef struct PruneFreezeResult
bool all_visible;
bool all_frozen;
TransactionId vm_conflict_horizon;
+ bool vm_corruption;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
@@ -380,6 +382,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ bool blk_known_av,
+ Buffer vmbuffer,
struct GlobalVisState *vistest,
int options,
struct VacuumCutoffs *cutoffs,
--
2.43.0
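For clarity, the shape of the new 0009 interface as seen from
lazy_scan_prune() is roughly this (a condensed sketch of the patch above,
not a verbatim excerpt):

    prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
    if (vacrel->nindexes == 0)
        prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;

    heap_page_prune_and_freeze(rel, buf,
                               all_visible_according_to_vm, /* blk_known_av */
                               vmbuffer,
                               vacrel->vistest, prune_options,
                               &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
                               &vacrel->offnum,
                               &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);

    if (presult.vm_corruption)
    {
        /* corruption was already cleared during pruning; skip VM updates */
    }
    else if (...)
    {
        /* normal VM update cases, unchanged by this patch */
    }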
v9-0008-Combine-vacuum-phase-I-VM-update-cases.patch (text/x-patch)
From 50ca8c73a62531f8d1b30886551c492023ea9e47 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:35:36 -0400
Subject: [PATCH v9 08/22] Combine vacuum phase I VM update cases
We update the VM after phase I of vacuum -- either setting both the VM
bits when all bits are currently unset or setting just the frozen bit
when the all-visible bit is already set.
Those two cases shared much of the same code -- leading to unnecessary
duplication. This commit combines them, which is simpler and easier to
understand.
The combined case also happens to fix a longstanding bug: if we are only
setting an already all-visible page all-frozen and checksums/wal_log_hints
are enabled, we fail to mark the buffer dirty before the page LSN is set in
visibilitymap_set().
---
src/backend/access/heap/vacuumlazy.c | 101 +++++++++------------------
1 file changed, 32 insertions(+), 69 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1474835c74b..cbe37369790 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2151,11 +2151,26 @@ lazy_scan_prune(LVRelState *vacrel,
{
/* Don't update the VM if we just cleared corruption in it */
}
- else if (!all_visible_according_to_vm && presult.all_visible)
+
+ /*
+ * If the page isn't yet marked all-visible in the VM or it is and needs
+	 * to be marked all-frozen, update the VM. Note that all_frozen is only
+ * valid if all_visible is true, so we must check both all_visible and
+ * all_frozen.
+ */
+ else if (presult.all_visible &&
+ (!all_visible_according_to_vm ||
+ (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
+ /*
+ * If the page is all-frozen, we can pass InvalidTransactionId as our
+ * cutoff_xid, since a snapshotConflictHorizon sufficient to make
+ * everything safe for REDO was logged when the page's tuples were
+ * frozen.
+ */
if (presult.all_frozen)
{
Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
@@ -2168,21 +2183,29 @@ lazy_scan_prune(LVRelState *vacrel,
* checksums are not enabled). Regardless, set both bits so that we
* get back in sync.
*
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
+ * If the heap page is all-visible but the VM bit is not set, we don't
+ * need to dirty the heap page. However, if checksums are enabled, we
+ * do need to make sure that the heap page is dirtied before passing
+ * it to visibilitymap_set(), because it may be logged.
*/
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
+ if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+ {
+ PageSetAllVisible(page);
+ MarkBufferDirty(buf);
+ }
+
old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
InvalidXLogRecPtr,
vmbuffer, presult.vm_conflict_horizon,
flags);
/*
+ * Even if we are only setting the all-frozen bit, there is a small
+ * chance that the VM was modified sometime between setting
+ * all_visible_according_to_vm and checking the visibility during
+ * pruning. Check the return value of old_vmbits to ensure the
+ * visibility map counters used for logging are accurate.
+ *
* If the page wasn't already set all-visible and/or all-frozen in the
* VM, count it as newly set for logging.
*/
@@ -2203,66 +2226,6 @@ lazy_scan_prune(LVRelState *vacrel,
}
}
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen. Note that all_frozen is only valid if all_visible is
- * true, so we must check both all_visible and all_frozen.
- */
- else if (all_visible_according_to_vm && presult.all_visible &&
- presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
-
return presult.ndeleted;
}
--
2.43.0
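The combined condition in 0008 boils down to roughly the following (a
condensed sketch of the new lazy_scan_prune() branch that runs after the
corruption check; declarations and the logging counters are omitted):

    if (presult.all_visible &&
        (!all_visible_according_to_vm ||
         (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
    {
        uint8   flags = VISIBILITYMAP_ALL_VISIBLE;

        if (presult.all_frozen)
            flags |= VISIBILITYMAP_ALL_FROZEN;

        /*
         * Dirty the heap page when we actually change PD_ALL_VISIBLE or when
         * checksums/wal_log_hints require the page to be dirtied before
         * visibilitymap_set() may log it.  The old all-frozen-only branch
         * only dirtied the page when PD_ALL_VISIBLE was clear, missing the
         * checksums case -- the bug mentioned in the commit message.
         */
        if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
        {
            PageSetAllVisible(page);
            MarkBufferDirty(buf);
        }

        old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
                                       InvalidXLogRecPtr,
                                       vmbuffer, presult.vm_conflict_horizon,
                                       flags);
    }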
v9-0010-Keep-all_frozen-updated-too-in-heap_page_prune_an.patch (text/x-patch)
From c947f3564585049b4349216cbbc57c42aaea8aaf Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 23 Jul 2025 16:01:24 -0400
Subject: [PATCH v9 10/22] Keep all_frozen updated too in
heap_page_prune_and_freeze
We previously relied on all_visible and all_frozen only ever being used
together, but it's better to keep both fields updated.
Future commits will separate usage of these fields, so it is best not to
rely on all_visible for all_frozen's validity.
---
src/backend/access/heap/pruneheap.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5c08a5d44c7..18eab8d0518 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -143,10 +143,6 @@ typedef struct
* whether to freeze the page or not. The all_visible and all_frozen
* values returned to the caller are adjusted to include LP_DEAD items at
* the end.
- *
- * all_frozen should only be considered valid if all_visible is also set;
- * we don't bother to clear the all_frozen flag every time we clear the
- * all_visible flag.
*/
bool all_visible;
bool all_frozen;
@@ -824,6 +820,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
}
+ Assert(!prstate.all_frozen || prstate.all_visible);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -1468,7 +1465,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
if (!HeapTupleHeaderXminCommitted(htup))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1490,7 +1487,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
Assert(prstate->cutoffs);
if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1503,7 +1500,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
case HEAPTUPLE_RECENTLY_DEAD:
prstate->recently_dead_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple will soon become DEAD. Update the hint field so
@@ -1522,7 +1519,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* assumption is a bit shaky, but it is what acquire_sample_rows()
* does, so be consistent.
*/
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* If we wanted to optimize for aborts, we might consider marking
@@ -1540,7 +1537,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* will commit and update the counters after we report.
*/
prstate->live_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple may soon become DEAD. Update the hint field so that
--
2.43.0
v9-0011-Update-VM-in-pruneheap.c.patch (text/x-patch)
From fe909609c0d76b835430169a2b7579b0177ca2d1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 2 Jun 2025 11:04:14 -0400
Subject: [PATCH v9 11/22] Update VM in pruneheap.c
As a step toward updating the VM in the same critical section and WAL
record as pruning and freezing (during phase I of vacuuming), first move
the VM update (still in its own critical section and WAL record) into
heap_page_prune_and_freeze(). This makes review easier.
---
src/backend/access/heap/pruneheap.c | 99 +++++++++++++++++++++++-----
src/backend/access/heap/vacuumlazy.c | 99 +++++-----------------------
src/include/access/heapam.h | 15 +++--
3 files changed, 106 insertions(+), 107 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 18eab8d0518..3483b5caff3 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -360,7 +360,8 @@ identify_and_fix_vm_corruption(Relation relation,
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -436,6 +437,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_freeze;
bool do_prune;
bool do_hint;
+ uint8 vmflags = 0;
+ uint8 old_vmbits = 0;
bool hint_bit_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
@@ -936,7 +939,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*
* Now that freezing has been finalized, unset all_visible if there are
* any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected by our caller.
+ * of the page, as expected for updating the visibility map.
*/
if (prstate.all_visible && prstate.lpdead_items == 0)
{
@@ -952,31 +955,91 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->hastup = prstate.hastup;
/*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
+ * If updating the visibility map, the conflict horizon for that record
+ * must be the newest xmin on the page. However, if the page is
+ * completely frozen, there can be no conflict and the vm_conflict_horizon
+ * should remain InvalidTransactionId. This includes the case that we
+ * just froze all the tuples; the prune-freeze record included the
+ * conflict XID already so the VM update record doesn't need it.
*/
if (presult->all_frozen)
presult->vm_conflict_horizon = InvalidTransactionId;
else
presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
- presult->lpdead_items = prstate.lpdead_items;
- /* the presult->deadoffsets array was already filled in */
-
/*
- * Clear any VM corruption. This does not need to be done in a critical
- * section.
+ * Handle setting visibility map bit based on information from the VM (as
+ * of last heap_vac_scan_next_block() call), and from all_visible and
+ * all_frozen variables.
*/
- presult->vm_corruption = false;
if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
- presult->vm_corruption = identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av,
- prstate.lpdead_items, vmbuffer);
+ {
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av,
+ prstate.lpdead_items, vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /*
+ * If the page isn't yet marked all-visible in the VM or it is and
+		 * needs to be marked all-frozen, update the VM. Note that all_frozen
+ * is only valid if all_visible is true, so we must check both
+ * all_visible and all_frozen.
+ */
+ else if (presult->all_visible &&
+ (!blk_known_av ||
+ (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+ {
+ Assert(prstate.lpdead_items == 0);
+ vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+ /*
+ * If the page is all-frozen, we can pass InvalidTransactionId as
+ * our cutoff_xid, since a snapshotConflictHorizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ if (presult->all_frozen)
+ {
+ Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
+ /*
+ * It's possible for the VM bit to be clear and the page-level bit
+ * to be set if checksums are not enabled.
+ *
+ * And even if we are just planning to update the frozen bit in
+ * the VM, we shouldn't rely on all_visible_according_to_vm as a
+ * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
+ * might have become stale.
+ *
+ * If the heap page is all-visible but the VM bit is not set, we
+ * don't need to dirty the heap page. However, if checksums are
+ * enabled, we do need to make sure that the heap page is dirtied
+ * before passing it to visibilitymap_set(), because it may be
+ * logged.
+ */
+ if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+ {
+ PageSetAllVisible(page);
+ MarkBufferDirty(buffer);
+ }
+
+ old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
+ vmbuffer, presult->vm_conflict_horizon,
+ vmflags);
+ }
+ }
+
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
+
+ presult->old_vmbits = old_vmbits;
+ presult->new_vmbits = vmflags;
+
if (prstate.freeze)
{
if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d49c71bc1b5..05d3d2a3267 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1932,7 +1932,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
-
/* qsort comparator for sorting OffsetNumbers */
static int
cmpOffsetNumbers(const void *a, const void *b)
@@ -1948,7 +1947,8 @@ cmpOffsetNumbers(const void *a, const void *b)
* vmbuffer is the buffer containing the VM block with visibility information
* for the heap block, blkno. all_visible_according_to_vm is the saved
* visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * won't rely entirely on this status, as it may be out of date. These will be
+ * passed on to heap_page_prune_and_freeze() to use while setting the VM.
*
* *has_lpdead_items is set to true or false depending on whether, upon return
* from this function, any LP_DEAD items are still present on the page.
@@ -1977,6 +1977,7 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
+ * Then, if the page's visibility status has changed, update the VM.
*
* If the relation has no indexes, we can immediately mark would-be dead
* items LP_UNUSED.
@@ -1985,10 +1986,6 @@ lazy_scan_prune(LVRelState *vacrel,
* presult.ndeleted. It should not be confused with presult.lpdead_items;
* presult.lpdead_items's final value can be thought of as the number of
* tuples that were deleted from indexes.
- *
- * We will update the VM after collecting LP_DEAD items and freezing
- * tuples. Pruning will have determined whether or not the page is
- * all-visible.
*/
prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
if (vacrel->nindexes == 0)
@@ -2080,88 +2077,26 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(!presult.all_visible || !(*has_lpdead_items));
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * For the purposes of logging, count whether or not the page was newly
+ * set all-visible and, potentially, all-frozen.
*/
- if (presult.vm_corruption)
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- /* Don't update the VM if we just cleared corruption in it */
- }
-
- /*
- * If the page isn't yet marked all-visible in the VM or it is and needs
-	 * to be marked all-frozen, update the VM. Note that all_frozen is only
- * valid if all_visible is true, so we must check both all_visible and
- * all_frozen.
- */
- else if (presult.all_visible &&
- (!all_visible_according_to_vm ||
- (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
- {
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- /*
- * If the page is all-frozen, we can pass InvalidTransactionId as our
- * cutoff_xid, since a snapshotConflictHorizon sufficient to make
- * everything safe for REDO was logged when the page's tuples were
- * frozen.
- */
- if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * If the heap page is all-visible but the VM bit is not set, we don't
- * need to dirty the heap page. However, if checksums are enabled, we
- * do need to make sure that the heap page is dirtied before passing
- * it to visibilitymap_set(), because it may be logged.
- */
- if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * Even if we are only setting the all-frozen bit, there is a small
- * chance that the VM was modified sometime between setting
- * all_visible_according_to_vm and checking the visibility during
- * pruning. Check the return value of old_vmbits to ensure the
- * visibility map counters used for logging are accurate.
- *
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ vacrel->vm_new_visible_pages++;
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- vacrel->vm_new_frozen_pages++;
+ vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ {
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
+ }
return presult.ndeleted;
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0c7eb5e46f4..b85648456e9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,20 +235,21 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
+ * all_visible and all_frozen indicate the status of the page as reflected
+ * in the visibility map after pruning, freezing, and setting any pages
+ * all-visible in the visibility map.
*
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
+ * vm_conflict_horizon is the newest xmin of live tuples on the page
+ * (older than OldestXmin). It will only be valid if we did not set the
+ * page all-frozen in the VM.
*
* These are only set if the HEAP_PRUNE_FREEZE option is set.
*/
bool all_visible;
bool all_frozen;
TransactionId vm_conflict_horizon;
- bool vm_corruption;
+ uint8 old_vmbits;
+ uint8 new_vmbits;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
--
2.43.0
v9-0013-Rename-PruneState.freeze-to-attempt_freeze.patch (text/x-patch)
From 96013b0fbfd3bf63d2940549e51317a89ee73b4e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 31 Jul 2025 14:07:51 -0400
Subject: [PATCH v9 13/22] Rename PruneState.freeze to attempt_freeze
This makes it clearer that the flag indicates the caller would like
heap_page_prune_and_freeze() to consider freezing tuples -- not that we
will ultimately end up freezing them.
Also rename local variable hint_bit_fpi to did_tuple_hint_fpi. This
makes it clear that it refers to tuple hints rather than page hints, and
that it records something that happened rather than something that could
happen.
---
src/backend/access/heap/pruneheap.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 683c1762c25..669c088ccff 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -43,7 +43,7 @@ typedef struct
/* whether or not dead items can be set LP_UNUSED during pruning */
bool mark_unused_now;
/* whether to attempt freezing tuples */
- bool freeze;
+ bool attempt_freeze;
/*
* Whether or not to consider updating the VM. There is some bookkeeping
@@ -452,7 +452,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_set_vm;
uint8 vmflags = 0;
uint8 old_vmbits = 0;
- bool hint_bit_fpi;
+ bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
bool all_frozen_except_lp_dead = false;
bool set_pd_all_visible = false;
@@ -460,7 +460,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Copy parameters to prstate */
prstate.vistest = vistest;
prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
- prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
prstate.consider_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate.cutoffs = cutoffs;
@@ -485,7 +485,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* initialize page freezing working state */
prstate.pagefrz.freeze_required = false;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
Assert(new_relfrozen_xid && new_relmin_mxid);
prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
@@ -535,7 +535,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* bookkeeping. Initializing all_visible to false allows skipping the work
* to update them in heap_prune_record_unchanged_lp_normal().
*/
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
@@ -653,7 +653,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
* an FPI to be emitted.
*/
- hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+ did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
/*
* Process HOT chains.
@@ -770,7 +770,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* plans we prepared, or not.
*/
do_freeze = false;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
if (prstate.pagefrz.freeze_required)
{
@@ -803,7 +803,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (RelationNeedsWAL(relation))
{
- if (hint_bit_fpi)
+ if (did_tuple_hint_fpi)
do_freeze = true;
else if (do_prune)
{
@@ -1128,7 +1128,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
if (presult->nfrozen > 0)
{
@@ -1715,7 +1715,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* to update the VM, we have to call heap_prepare_freeze_tuple() on every
* tuple to know whether or not the page will be totally frozen.
*/
- if (prstate->freeze)
+ if (prstate->attempt_freeze)
{
bool totally_frozen;
--
2.43.0
v9-0014-Remove-xl_heap_visible-entirely.patch (text/x-patch)
From faf936042bbe225175e8bc6474d3617e70cb215d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:30:42 -0400
Subject: [PATCH v9 14/22] Remove xl_heap_visible entirely
There are now no users of this, so eliminate it entirely.
---
src/backend/access/common/bufmask.c | 3 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 152 ++---------------------
src/backend/access/heap/pruneheap.c | 4 +-
src/backend/access/heap/vacuumlazy.c | 10 +-
src/backend/access/heap/visibilitymap.c | 109 +---------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +-
src/include/access/heapam_xlog.h | 20 ---
src/include/access/visibilitymap.h | 11 +-
src/include/access/visibilitymapdefs.h | 9 --
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 32 insertions(+), 364 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..1fff01383b3 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,7 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 893a739009a..cb16bb0cbbd 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -40,6 +40,7 @@
#include "access/valid.h"
#include "access/visibilitymap.h"
#include "access/xloginsert.h"
+#include "access/xlogutils.h"
#include "catalog/pg_database.h"
#include "catalog/pg_database_d.h"
#include "commands/vacuum.h"
@@ -2523,11 +2524,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
{
PageSetAllVisible(page);
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
- visibilitymap_set_vmbits(relation,
- BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set(relation,
+ BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
}
/*
@@ -8798,49 +8799,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
/*
* Perform XLogInsert for a heap-update operation. Caller must already
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 11c11929ed9..ff3ad8b4cd2 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -53,6 +53,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
(xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
vmflags = xlrec.flags & VISIBILITYMAP_VALID_BITS;
+ /* Must never set all_frozen bit without also setting all_visible bit */
+ Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
/*
* After xl_heap_prune is the optional snapshot conflict horizon.
@@ -250,7 +252,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- old_vmbits = visibilitymap_set_vmbits(reln, blkno, vmbuffer, vmflags);
+ old_vmbits = visibilitymap_set(reln, blkno, vmbuffer, vmflags);
/* Only set VM page LSN if we modified the page */
if (old_vmbits != vmflags)
@@ -269,142 +271,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -785,15 +651,14 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(reln, blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
/*
* It is not possible that the VM was already set for this heap page,
* so the vmbuffer must have been modified and marked dirty.
*/
+ visibilitymap_set(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(BufferGetPage(vmbuffer), lsn);
FreeFakeRelcacheEntry(reln);
@@ -1374,9 +1239,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 669c088ccff..ecc100c3362 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -980,8 +980,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
{
Assert(PageIsAllVisible(page));
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
- old_vmbits = visibilitymap_set_vmbits(relation, blockno,
- vmbuffer, vmflags);
+ old_vmbits = visibilitymap_set(relation, blockno,
+ vmbuffer, vmflags);
if (old_vmbits == vmflags)
{
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 75c10ba20c6..2ff67d77cb4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1886,8 +1886,8 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
MarkBufferDirty(buf);
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
- visibilitymap_set_vmbits(vacrel->rel, blkno,
- vmbuffer, new_vmbits);
+ visibilitymap_set(vacrel->rel, blkno,
+ vmbuffer, new_vmbits);
if (RelationNeedsWAL(vacrel->rel))
{
@@ -2753,9 +2753,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
set_pd_all_vis = true;
PageSetAllVisible(page);
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
- visibilitymap_set_vmbits(vacrel->rel,
- blkno,
- vmbuffer, vmflags);
+ visibilitymap_set(vacrel->rel,
+ blkno,
+ vmbuffer, vmflags);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index bb8dfd8910a..f7bad68ffc5 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set bit(s) in a previously pinned page and log
- * visibilitymap_set_vmbits - set bit(s) in a pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,108 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
- flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
- Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- Assert(!BufferIsValid(heapBuf) || BufferIsExclusiveLocked(heapBuf));
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (XLogRecPtrIsInvalid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
-}
-
/*
* Set flags in the VM block contained in the passed in vmBuf.
*
@@ -343,8 +240,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* is pinned and exclusive locked.
*/
uint8
-visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags)
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index c95d30dfe8d..47998f1df15 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -343,13 +343,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -454,9 +447,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d8508593e7c..3672f372aa8 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -446,20 +445,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -503,11 +488,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index fc7056a91ea..c5b1e1f7adb 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "utils/relcache.h"
@@ -31,14 +30,8 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags);
+extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a13e8162890..b4c880c083f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4273,7 +4273,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
Attachment: v9-0015-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisXi.patch (text/x-patch)
From 00012be836b472c2f0185b1c037cf29b480e5507 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v9 15/22] Rename GlobalVisTestIsRemovableXid() to
GlobalVisXidVisibleToAll()
Currently, we only use GlobalVisTestIsRemovableXid() to check if a
tuple's xmax is visible to all, meaning we can remove it. But future
commits will use GlobalVisTestIsRemovableXid() to test if a tuple's xmin
is visible to all, for the purpose of determining whether the page can be
set all-visible in the VM. In that case, it makes more sense to call the
function GlobalVisXidVisibleToAll().
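To see why the new name reads naturally at both kinds of call site, compare
the existing xmax check (from this patch's heapam_visibility.c hunk) with the
planned xmin check from a later patch in this series. Condensed excerpts, not
standalone code:

    /* xmax: a deleted tuple is removable once its deleter is visible to all */
    if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
        res = HEAPTUPLE_DEAD;

    /* xmin: a page can be all-visible only if every inserter is visible to all */
    if (!GlobalVisXidVisibleToAll(vistest, xmin))
        all_visible = false;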
---
src/backend/access/heap/heapam_visibility.c | 6 +++---
src/backend/access/heap/pruneheap.c | 14 +++++++-------
src/backend/access/spgist/spgvacuum.c | 2 +-
src/backend/storage/ipc/procarray.c | 13 ++++++-------
src/include/utils/snapmgr.h | 4 ++--
5 files changed, 19 insertions(+), 20 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
{
Assert(TransactionIdIsValid(dead_after));
- if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
res = HEAPTUPLE_DEAD;
}
else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
return false;
/* Deleter committed, so tuple is dead if the XID is old enough. */
- return GlobalVisTestIsRemovableXid(vistest,
- HeapTupleHeaderGetRawXmax(tuple));
+ return GlobalVisXidVisibleToAll(vistest,
+ HeapTupleHeaderGetRawXmax(tuple));
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ecc100c3362..73ca4e88c1f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -231,7 +231,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
vistest = GlobalVisTestFor(relation);
- if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+ if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
return;
/*
@@ -574,9 +574,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* Determining HTSV only once for each tuple is required for correctness,
* to deal with cases where running HTSV twice could result in different
* results. For example, RECENTLY_DEAD can turn to DEAD if another
- * checked item causes GlobalVisTestIsRemovableFullXid() to update the
- * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
- * transaction aborts.
+ * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+ * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+ * aborts.
*
* It's also good for performance. Most commonly tuples within a page are
* stored at decreasing offsets (while the items are stored at increasing
@@ -1173,11 +1173,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* Determine whether or not the tuple is considered dead when compared
* with the provided GlobalVisState. On-access pruning does not provide
* VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
- * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
- * if the GlobalVisState has been updated since the beginning of vacuuming
+ * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+ * the GlobalVisState has been updated since the beginning of vacuuming
* the relation.
*/
- if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
return HEAPTUPLE_DEAD;
return res;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 8f8a1ad7796..496cca69410 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
*/
if (dt->tupstate == SPGIST_REDIRECT &&
(!TransactionIdIsValid(dt->xid) ||
- GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+ GlobalVisXidVisibleToAll(vistest, dt->xid)))
{
dt->tupstate = SPGIST_PLACEHOLDER;
Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..f67f01c17c2 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
* See comment for GlobalVisState for details.
*/
bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
- FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
{
/*
* If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4223,7 +4222,7 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
* relfrozenxid).
*/
bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
{
FullTransactionId fxid;
@@ -4237,7 +4236,7 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
*/
fxid = FullXidRelativeTo(state->definitely_needed, xid);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableXid(state, xid);
+ return GlobalVisXidVisibleToAll(state, xid);
}
/*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
*/
typedef struct GlobalVisState GlobalVisState;
extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
--
2.43.0
Attachment: v9-0016-Use-GlobalVisState-to-determine-page-level-visibi.patch (text/x-patch)
From 438ce859c03936b016e1345be5e9d5950d96f514 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v9 16/22] Use GlobalVisState to determine page level
visibility
During pruning and during vacuum's third phase, we try to determine if
the whole page can be set all-visible in the visibility map. Instead of
using OldestXmin to determine if all the tuples on a page are visible to
everyone, use the GlobalVisState. This allows us to start setting the VM
during on-access pruning in a future commit.
It is possible for the GlobalVisState to change during the course of a
vacuum. In all but extraordinary cases, it moves forward, meaning more
pages could potentially be set in the VM.
Because comparing a transaction ID to the GlobalVisState requires more
operations than comparing it to another single transaction ID, we now
wait until after examining all the tuples on the page and, if we have
maintained the visibility_cutoff_xid, compare it to the GlobalVisState
just once per page. This works because if the page is
all-visible and has live, committed tuples on it, the
visibility_cutoff_xid will contain the newest xmin on the page. If
everyone can see it, the page is truly all-visible.
Doing this may mean we examine more tuples' xmins than before, since we
previously may have set all_visible to false as soon as we encountered a live tuple
newer than OldestXmin. However, these extra comparisons were found not
to be significant in a profile.
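As a concrete sketch of the once-per-page check described above (this is the
core of the pruneheap.c hunk below, reproduced here for readability; the
prstate fields come from PruneState):

    /*
     * After scanning every tuple: visibility_cutoff_xid holds the newest xmin
     * among the live, committed tuples. If that xid is not yet visible to
     * everyone, the page cannot be set all-visible (or all-frozen).
     */
    if (prstate.all_visible &&
        TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
        !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
        prstate.all_visible = prstate.all_frozen = false;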
---
src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++
src/backend/access/heap/pruneheap.c | 48 +++++++++------------
src/backend/access/heap/vacuumlazy.c | 19 ++++----
src/include/access/heapam.h | 4 +-
4 files changed, 60 insertions(+), 39 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+ Buffer buffer)
+{
+ TransactionId dead_after = InvalidTransactionId;
+ HTSV_Result res;
+
+ res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+ if (res == HEAPTUPLE_RECENTLY_DEAD)
+ {
+ Assert(TransactionIdIsValid(dead_after));
+
+ if (GlobalVisXidVisibleToAll(vistest, dead_after))
+ res = HEAPTUPLE_DEAD;
+ }
+ else
+ Assert(!TransactionIdIsValid(dead_after));
+
+ return res;
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 73ca4e88c1f..273e9412a01 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -141,10 +141,9 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon, when setting the VM or when
+ * freezing all the live tuples on the page.
*
* NOTE: all_visible and all_frozen don't include LP_DEAD items until
* directly before updating the VM. We ignore LP_DEAD items when deciding
@@ -553,14 +552,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon.
- * This is most likely to happen when updating the VM and/or freezing all
- * live tuples on the page. It is updated before returning to the caller
- * because vacuum does assert-build only validation on the page using this
- * field.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState.
+ *
+ * If we encounter an uncommitted tuple, this field is unmaintained. If
+ * the page is being set all-visible or when freezing all live tuples on
+ * the page, it is used to calculate the snapshot conflict horizon.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -756,6 +753,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them is not visible to everyone, the page cannot be
+ * all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update those
@@ -1099,12 +1106,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(cutoffs);
-
Assert(prstate.lpdead_items == 0);
if (!heap_page_is_all_visible(relation, buffer,
- cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc))
Assert(false);
@@ -1629,19 +1634,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' when freezing is requested. We
- * could use GlobalVisTestIsRemovableXid instead, if a
- * non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2ff67d77cb4..7558ac697f1 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,7 +464,7 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -2715,7 +2715,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
InvalidOffsetNumber);
if (heap_page_would_be_all_visible(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3458,13 +3458,13 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
- return heap_page_would_be_all_visible(rel, buf, OldestXmin,
+ return heap_page_would_be_all_visible(rel, buf, vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -3483,7 +3483,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
*
* *all_frozen is an output parameter indicating to the caller if every tuple
* on the page is frozen.
@@ -3504,7 +3504,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
static bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -3576,8 +3576,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
tuple.t_len = ItemIdGetLength(itemid);
tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
- buf))
+ switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest,
+ buf))
{
case HEAPTUPLE_LIVE:
{
@@ -3596,8 +3596,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin,
- OldestXmin))
+ if (!GlobalVisXidVisibleToAll(vistest, xmin))
{
all_visible = false;
*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0b9bb1c9b13..4278f351bdf 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -342,7 +342,7 @@ extern void heap_inplace_unlock(Relation relation,
HeapTuple oldtup, Buffer buffer);
extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum);
@@ -415,6 +415,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+ GlobalVisState *vistest, Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
Attachment: v9-0012-Eliminate-xl_heap_visible-from-vacuum-phase-I-pru.patch (text/x-patch)
From 73cfbe246ba075db052afd207749a7c66ec1a9bc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:41:00 -0400
Subject: [PATCH v9 12/22] Eliminate xl_heap_visible from vacuum phase I
prune/freeze
Instead of emitting a separate WAL record for every block rendered
all-visible/frozen by vacuum's phase I, include the changes to the VM in
the xl_heap_prune record already emitted.
This is only enabled for vacuum's prune/freeze work, not for on-access
pruning.
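In outline, the VM is now updated inside the same critical section as pruning
and freezing, and the VM information rides along in the existing prune record.
This is a condensed excerpt of the pruneheap.c hunk below, not standalone code:

    if (do_set_vm)
    {
        Assert(PageIsAllVisible(page));
        LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
        old_vmbits = visibilitymap_set_vmbits(relation, blockno,
                                              vmbuffer, vmflags);
        if (old_vmbits == vmflags)
        {
            /* VM already had the right bits: skip WAL-logging the VM change */
            LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
            do_set_vm = false;
            vmflags = 0;
        }
    }

    /*
     * vmbuffer, vmflags, and set_pd_all_visible are then passed to
     * log_heap_prune_and_freeze(), so a single xl_heap_prune record covers
     * the pruning, freezing, and VM change; no xl_heap_visible is emitted.
     */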
---
src/backend/access/heap/pruneheap.c | 456 ++++++++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 30 --
src/include/access/heapam.h | 15 +-
3 files changed, 279 insertions(+), 222 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3483b5caff3..683c1762c25 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,13 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool freeze;
+
+ /*
+ * Whether or not to consider updating the VM. There is some bookkeeping
+ * that must be maintained if we would like to update the VM.
+ */
+ bool consider_update_vm;
+
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -108,8 +115,9 @@ typedef struct
*
* These fields are not used by pruning itself for the most part, but are
* used to collect information about what was pruned and what state the
- * page is in after pruning, for the benefit of the caller. They are
- * copied to the caller's PruneFreezeResult at the end.
+ * page is in after pruning to use when updating the visibility map and
+ * for the benefit of the caller. They are copied to the caller's
+ * PruneFreezeResult at the end.
* -------------------------------------------------------
*/
@@ -138,11 +146,10 @@ typedef struct
* bits. It is only valid if we froze some tuples, and all_frozen is
* true.
*
- * NOTE: all_visible and all_frozen don't include LP_DEAD items. That's
- * convenient for heap_page_prune_and_freeze(), to use them to decide
- * whether to freeze the page or not. The all_visible and all_frozen
- * values returned to the caller are adjusted to include LP_DEAD items at
- * the end.
+ * NOTE: all_visible and all_frozen don't include LP_DEAD items until
+ * directly before updating the VM. We ignore LP_DEAD items when deciding
+ * whether or not to opportunistically freeze and when determining the
+ * snapshot conflict horizon required when freezing tuples.
*/
bool all_visible;
bool all_frozen;
@@ -371,12 +378,15 @@ identify_and_fix_vm_corruption(Relation relation,
* If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
* required in order to advance relfrozenxid / relminmxid, or if it's
* considered advantageous for overall system performance to do so now. The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing. When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set. They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
+ * 'cutoffs', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments are required
+ * when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set and the visibility status of the page
+ * has changed, we will update the VM at the same time as pruning and freezing
+ * the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping. Note that new_vmbits and old_vmbits
+ * will be 0 if HEAP_PAGE_PRUNE_UPDATE_VM is not set.
*
* blk_known_av is the visibility status of the heap block as of the last call
* to find_next_unskippable_block(). vmbuffer is the buffer that may already
@@ -392,6 +402,8 @@ identify_and_fix_vm_corruption(Relation relation,
* FREEZE indicates that we will also freeze tuples, and will return
* 'all_visible', 'all_frozen' flags to the caller.
*
+ * UPDATE_VM indicates that we will set the page's status in the VM.
+ *
* cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
* of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
* cutoffs->OldestXmin is also used to determine if dead tuples are
@@ -436,18 +448,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
HeapTupleData tup;
bool do_freeze;
bool do_prune;
- bool do_hint;
+ bool do_hint_full_or_prunable;
+ bool do_set_vm;
uint8 vmflags = 0;
uint8 old_vmbits = 0;
bool hint_bit_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ bool all_frozen_except_lp_dead = false;
+ bool set_pd_all_visible = false;
/* Copy parameters to prstate */
prstate.vistest = vistest;
prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.consider_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate.cutoffs = cutoffs;
+ Assert(!prstate.consider_update_vm || vmbuffer);
+
/*
* Our strategy is to scan the page and make lists of items to change,
* then apply the changes within a critical section. This keeps as much
@@ -492,50 +510,57 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.deadoffsets = presult->deadoffsets;
/*
- * Caller may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Keep track of whether or not the page will be all-visible and
+ * all-frozen for use in opportunistic freezing and to update the VM if
+ * the caller requests it.
+ *
+ * Currently, only VACUUM attempts freezing and setting the VM bits. But
+ * other callers could do either one. The visibility bookkeeping is
+ * required for opportunistic freezing (in addition to setting the VM
+ * bits) because we only consider opportunistically freezing tuples if the
+ * whole page would become all-frozen or if the whole page will be frozen
+ * except for dead tuples that will be removed by vacuum.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * If only updating the VM, we must initialize all_frozen to false, as
+ * heap_prepare_freeze_tuple() will not be called for each tuple on the
+ * page and we will not end up correctly setting it to false later.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not clear
- * all_visible when we see LP_DEAD items. We fix that at the end of the
- * function, when we return the value to the caller, so that the caller
- * doesn't set the VM bit incorrectly.
+ * Dead tuples which will be removed by the end of vacuuming should not
+ * preclude us from opportunistically freezing, so we do not clear
+ * all_visible when we see LP_DEAD items. We fix that after determining
+ * whether or not to freeze but before deciding whether or not to update
+ * the VM so that we don't set the VM bit incorrectly.
+ *
+ * If not freezing or updating the VM, we otherwise avoid the extra
+ * bookkeeping. Initializing all_visible to false allows skipping the work
+ * to update them in heap_prune_record_unchanged_lp_normal().
*/
if (prstate.freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
}
+ else if (prstate.consider_update_vm)
+ {
+ prstate.all_visible = true;
+ prstate.all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate.all_visible = false;
prstate.all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is unmaintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon.
+ * This is most likely to happen when updating the VM and/or freezing all
+ * live tuples on the page. It is updated before returning to the caller
+ * because vacuum does assert-build only validation on the page using this
+ * field.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -733,10 +758,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* Even if we don't prune anything, if we found a new value for the
- * pd_prune_xid field or the page was marked full, we will update the hint
- * bit.
+ * pd_prune_xid field or the page was marked full, we will update those
+ * hint bits.
*/
- do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ do_hint_full_or_prunable =
+ ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
PageIsFull(page);
/*
@@ -784,7 +810,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (XLogCheckBufferNeedsBackup(buffer))
do_freeze = true;
}
- else if (do_hint)
+ else if (do_hint_full_or_prunable)
{
if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
do_freeze = true;
@@ -823,11 +849,84 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
}
+ /*
+ * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+ * make the choice of whether or not to freeze the page unaffected by the
+ * short-term presence of LP_DEAD items. These LP_DEAD items were
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that freezing has been finalized, unset all_visible if there are
+ * any LP_DEAD items on the page. It needs to reflect the present state of
+ * the page when using it to determine whether or not to update the VM.
+ *
+ * Keep track of whether or not the page was all-frozen except LP_DEAD
+ * items for the purposes of calculating the snapshot conflict horizon,
+ * though.
+ */
+ all_frozen_except_lp_dead = prstate.all_frozen;
+ if (prstate.lpdead_items > 0)
+ {
+ prstate.all_visible = false;
+ prstate.all_frozen = false;
+ }
+
Assert(!prstate.all_frozen || prstate.all_visible);
+
+ /*
+ * Handle setting visibility map bit based on information from the VM (as
+ * of last heap_vac_scan_next_block() call), and from all_visible and
+ * all_frozen variables.
+ */
+ if (prstate.consider_update_vm)
+ {
+ /*
+ * Clear any VM corruption. This does not need to be in a critical
+ * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
+ * we may mark the heap page buffer dirty here and could end up doing
+ * so again later. This is not a correctness issue and is in the path
+ * of VM corruption, so we don't have to worry about the extra
+ * performance overhead.
+ */
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av, prstate.lpdead_items, vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /* Determine if we actually need to set the VM and which bits to set. */
+ else if (prstate.all_visible &&
+ (!blk_known_av ||
+ (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate.all_frozen)
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+ }
+
+ do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
+
+ /*
+ * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+ * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+ * set, we strongly prefer to keep them in sync.
+ *
+ * Prior to Postgres 19, it was possible for the page-level bit to be set
+ * and the VM bit to be clear. This could happen if we crashed after
+ * setting PD_ALL_VISIBLE but before setting bits in the VM.
+ */
+ set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+
+ /* Save these for the caller in case we later zero out vmflags */
+ presult->new_vmbits = vmflags;
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
- if (do_hint)
+ if (do_hint_full_or_prunable)
{
/*
* Update the page's pd_prune_xid field to either zero, or the lowest
@@ -843,15 +942,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
PageClearFull(page);
/*
- * If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * If we are _only_ setting the prune_xid or PD_PAGE_FULL hint, then
+ * this is a non-WAL-logged hint. If we are going to freeze or prune
+ * tuples on the page or set PD_ALL_VISIBLE, we will mark the buffer
+ * dirty and emit WAL below.
*/
- if (!do_freeze && !do_prune)
+ if (!do_prune && !do_freeze && !set_pd_all_visible)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -865,12 +965,48 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
- MarkBufferDirty(buffer);
+ if (set_pd_all_visible)
+ PageSetAllVisible(page);
+
+ /*
+ * We only set PD_ALL_VISIBLE if we also set the VM, and since setting
+ * the VM requires emitting WAL, MarkBufferDirtyHint() isn't
+ * appropriate here.
+ */
+ if (do_prune || do_freeze || set_pd_all_visible)
+ MarkBufferDirty(buffer);
+
+ if (do_set_vm)
+ {
+ Assert(PageIsAllVisible(page));
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ old_vmbits = visibilitymap_set_vmbits(relation, blockno,
+ vmbuffer, vmflags);
+
+ if (old_vmbits == vmflags)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+ do_set_vm = false;
+ /* 0 out vmflags so we don't emit WAL to update the VM */
+ vmflags = 0;
+ }
+ }
+
+ /*
+ * It should never be the case that PD_ALL_VISIBLE is not set and the
+ * VM is set. Or, if it were, we should have caught it earlier when
+ * finding and fixing VM corruption. So, if we found out the VM was
+ * already set above, we should have found PD_ALL_VISIBLE set earlier.
+ */
+ Assert(!set_pd_all_visible || do_set_vm);
/*
- * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
+ * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did. If
+ * we were only updating the VM and it turns out it was already set,
+ * we will have unset do_set_vm earlier. As such, check it again
+ * before emitting the record.
*/
- if (RelationNeedsWAL(relation))
+ if (RelationNeedsWAL(relation) && (do_set_vm || do_prune || do_freeze))
{
/*
* The snapshotConflictHorizon for the whole record should be the
@@ -882,35 +1018,56 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* on the standby with xids older than the youngest tuple this
* record will freeze will conflict.
*/
- TransactionId frz_conflict_horizon = InvalidTransactionId;
- TransactionId conflict_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
+
+ /*
+ * If we are updating the VM, the conflict horizon is almost
+ * always the visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization,
+ * we can use the visibility_cutoff_xid as the conflict horizon if
+ * the page will be all-frozen. This is true even if there are
+ * LP_DEAD line pointers because we ignored those when maintaining
+ * the visibility_cutoff_xid.
+ */
+ if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
+ conflict_xid = prstate.visibility_cutoff_xid;
/*
- * We can use the visibility_cutoff_xid as our cutoff for
- * conflicts when the whole page is eligible to become all-frozen
- * in the VM once we're done with it. Otherwise we generate a
- * conservative cutoff by stepping back from OldestXmin.
+ * Otherwise, if we are freezing but the page would not be
+ * all-frozen, we have to use the more pessimistic horizon of
+ * OldestXmin, which may be newer than the newest tuple we froze.
+ * We currently don't track the newest tuple we froze.
*/
- if (do_freeze)
+ else if (do_freeze)
{
- if (prstate.all_visible && prstate.all_frozen)
- frz_conflict_horizon = prstate.visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = prstate.cutoffs->OldestXmin;
- TransactionIdRetreat(frz_conflict_horizon);
- }
+ conflict_xid = prstate.cutoffs->OldestXmin;
+ TransactionIdRetreat(conflict_xid);
}
- if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
- conflict_xid = frz_conflict_horizon;
- else
+ /*
+ * If we are removing tuples with a younger xmax than the conflict_xid
+ * calculated so far, we must use that as our horizon.
+ */
+ if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
conflict_xid = prstate.latest_xid_removed;
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning
+ * or freezing any tuples and are setting an already all-visible
+ * page all-frozen in the VM. In this case, all of the tuples on
+ * the page must already be visible to all MVCC snapshots on the
+ * standby.
+ */
+ if (!do_prune && !do_freeze && do_set_vm &&
+ blk_known_av && (vmflags & VISIBILITYMAP_ALL_FROZEN))
+ conflict_xid = InvalidTransactionId;
+
log_heap_prune_and_freeze(relation, buffer,
false,
- InvalidBuffer, 0, false,
+ vmbuffer,
+ vmflags,
+ set_pd_all_visible,
conflict_xid,
true, reason,
prstate.frozen, prstate.nfrozen,
@@ -922,124 +1079,55 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
END_CRIT_SECTION();
- /* Copy information back for caller */
- presult->ndeleted = prstate.ndeleted;
- presult->nnewlpdead = prstate.ndead;
- presult->nfrozen = prstate.nfrozen;
- presult->live_tuples = prstate.live_tuples;
- presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected for updating the visibility map.
- */
- if (prstate.all_visible && prstate.lpdead_items == 0)
- {
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
- }
- else
- {
- presult->all_visible = false;
- presult->all_frozen = false;
- }
-
- presult->hastup = prstate.hastup;
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
- /*
- * If updating the visibility map, the conflict horizon for that record
- * must be the newest xmin on the page. However, if the page is
- * completely frozen, there can be no conflict and the vm_conflict_horizon
- * should remain InvalidTransactionId. This includes the case that we
- * just froze all the tuples; the prune-freeze record included the
- * conflict XID already so the VM update record doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * VACUUM will call heap_page_is_all_visible() during the second pass over
+ * the heap to determine all_visible and all_frozen for the page -- this
+ * is a specialized version of the logic from this function. Now that
+ * we've finished pruning and freezing, make sure that we're in total
+ * agreement with heap_page_is_all_visible() using an assertion. We will
+ * have already set the page in the VM, so this assertion will only let
+ * you know that you've already done something wrong.
*/
- if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
{
- if (identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av,
- prstate.lpdead_items, vmbuffer))
- {
- /* If we fix corruption, don't update the VM further */
- }
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
- /*
- * If the page isn't yet marked all-visible in the VM or it is and
- * needs to me marked all-frozen, update the VM Note that all_frozen
- * is only valid if all_visible is true, so we must check both
- * all_visible and all_frozen.
- */
- else if (presult->all_visible &&
- (!blk_known_av ||
- (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
- {
- Assert(prstate.lpdead_items == 0);
- vmflags = VISIBILITYMAP_ALL_VISIBLE;
+ Assert(cutoffs);
- /*
- * If the page is all-frozen, we can pass InvalidTransactionId as
- * our cutoff_xid, since a snapshotConflictHorizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- if (presult->all_frozen)
- {
- Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
- vmflags |= VISIBILITYMAP_ALL_FROZEN;
- }
+ Assert(prstate.lpdead_items == 0);
- /*
- * It's possible for the VM bit to be clear and the page-level bit
- * to be set if checksums are not enabled.
- *
- * And even if we are just planning to update the frozen bit in
- * the VM, we shouldn't rely on all_visible_according_to_vm as a
- * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
- * might have become stale.
- *
- * If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be
- * logged.
- */
- if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buffer);
- }
+ if (!heap_page_is_all_visible(relation, buffer,
+ cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc))
+ Assert(false);
- old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
- vmbuffer, presult->vm_conflict_horizon,
- vmflags);
- }
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == prstate.visibility_cutoff_xid);
}
+#endif
+ /* Copy information back for caller */
+ presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+ presult->old_vmbits = old_vmbits;
+ /* new_vmbits was set above */
+ presult->hastup = prstate.hastup;
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
- presult->old_vmbits = old_vmbits;
- presult->new_vmbits = vmflags;
-
if (prstate.freeze)
{
if (presult->nfrozen > 0)
@@ -1621,7 +1709,12 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
break;
}
- /* Consider freezing any normal tuples which will not be removed */
+ /*
+ * Consider freezing any normal tuples which will not be removed.
+ * Regardless of whether or not we want to freeze the tuples, if we want
+ * to update the VM, we have to call heap_prepare_freeze_tuple() on every
+ * tuple to know whether or not the page will be totally frozen.
+ */
if (prstate->freeze)
{
bool totally_frozen;
@@ -2184,7 +2277,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* - Reaping: During vacuum phase III, items that are already LP_DEAD are
* marked as unused.
*
- * - VM updates: After vacuum phase III, the heap page may be marked
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
* all-visible and all-frozen.
*
 * These changes all happen together, so we use a single WAL record for them
@@ -2238,6 +2331,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+ Assert(do_prune || nfrozen > 0 || vmflags & VISIBILITYMAP_VALID_BITS);
Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
xlrec.flags = vmflags;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 05d3d2a3267..75c10ba20c6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2013,34 +2013,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- /* Note that all_frozen value does not matter when !all_visible */
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- if (!heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum))
- Assert(false);
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -2074,8 +2046,6 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
-
/*
* For the purposes of logging, count whether or not the page was newly
* set all-visible and, potentially, all-frozen.
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b85648456e9..0b9bb1c9b13 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,19 +235,12 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate the status of the page as reflected
- * in the visibility map after pruning, freezing, and setting any pages
- * all-visible in the visibility map.
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
*
- * vm_conflict_horizon is the newest xmin of live tuples on the page
- * (older than OldestXmin). It will only be valid if we did not set the
- * page all-frozen in the VM.
- *
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
*/
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
uint8 old_vmbits;
uint8 new_vmbits;
--
2.43.0
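For illustration only (not part of the patch): with the old booleans replaced by
old_vmbits/new_vmbits in PruneFreezeResult, a caller such as lazy_scan_prune()
can derive "newly set all-visible/all-frozen" information by comparing the two
bitmasks. A minimal sketch, with made-up counter names:

    bool        newly_all_visible =
        (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0 &&
        (presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0;
    bool        newly_all_frozen =
        (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0 &&
        (presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0;

    /* hypothetical instrumentation counters */
    if (newly_all_visible)
        pages_newly_all_visible++;
    if (newly_all_frozen)
        pages_newly_all_frozen++;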
v9-0017-Inline-TransactionIdFollows-Precedes.patch (text/x-patch)
From 5be056f48478db42dc0ad09d480e091cd8c53ebe Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 09:57:13 -0400
Subject: [PATCH v9 17/22] Inline TransactionIdFollows/Precedes()
Calling these from on-access pruning code had noticeable overhead in a
profile. There does not seem to be a reason not to inline them.
---
src/backend/access/transam/transam.c | 64 -------------------------
src/include/access/transam.h | 70 ++++++++++++++++++++++++++--
2 files changed, 66 insertions(+), 68 deletions(-)
diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index 9a39451a29a..553d6756cb3 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -273,70 +273,6 @@ TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
}
-/*
- * TransactionIdPrecedes --- is id1 logically < id2?
- */
-bool
-TransactionIdPrecedes(TransactionId id1, TransactionId id2)
-{
- /*
- * If either ID is a permanent XID then we can just do unsigned
- * comparison. If both are normal, do a modulo-2^32 comparison.
- */
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 < id2);
-
- diff = (int32) (id1 - id2);
- return (diff < 0);
-}
-
-/*
- * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
- */
-bool
-TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 <= id2);
-
- diff = (int32) (id1 - id2);
- return (diff <= 0);
-}
-
-/*
- * TransactionIdFollows --- is id1 logically > id2?
- */
-bool
-TransactionIdFollows(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 > id2);
-
- diff = (int32) (id1 - id2);
- return (diff > 0);
-}
-
-/*
- * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
- */
-bool
-TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 >= id2);
-
- diff = (int32) (id1 - id2);
- return (diff >= 0);
-}
-
/*
* TransactionIdLatest --- get latest XID among a main xact and its children
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 7d82cd2eb56..c9e20418275 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -255,6 +255,72 @@ typedef struct TransamVariablesData
} TransamVariablesData;
+
+/*
+ * TransactionIdPrecedes --- is id1 logically < id2?
+ */
+static inline bool
+TransactionIdPrecedes(TransactionId id1, TransactionId id2)
+{
+ /*
+ * If either ID is a permanent XID then we can just do unsigned
+ * comparison. If both are normal, do a modulo-2^32 comparison.
+ */
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 < id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff < 0);
+}
+
+/*
+ * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
+ */
+static inline bool
+TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 <= id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff <= 0);
+}
+
+/*
+ * TransactionIdFollows --- is id1 logically > id2?
+ */
+static inline bool
+TransactionIdFollows(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 > id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff > 0);
+}
+
+/*
+ * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
+ */
+static inline bool
+TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 >= id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff >= 0);
+}
+
+
/* ----------------
* extern declarations
* ----------------
@@ -274,10 +340,6 @@ extern bool TransactionIdDidAbort(TransactionId transactionId);
extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
-extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
-extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2);
extern TransactionId TransactionIdLatest(TransactionId mainxid,
int nxids, const TransactionId *xids);
extern XLogRecPtr TransactionIdGetCommitLSN(TransactionId xid);
--
2.43.0
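For reference, the modulo-2^32 arithmetic behind the now-inlined comparisons,
as a standalone sketch (illustration only, not part of the patch; plain
uint32_t stands in for TransactionId and the permanent-XID special case is
omitted): an XID that is numerically huge can still logically precede a small
one once the counter has wrapped.

    #include <stdint.h>
    #include <stdio.h>

    /* Same core logic as TransactionIdPrecedes() for two normal XIDs */
    static int
    xid_precedes(uint32_t id1, uint32_t id2)
    {
        int32_t     diff = (int32_t) (id1 - id2);

        return diff < 0;
    }

    int
    main(void)
    {
        printf("%d\n", xid_precedes(100, 200));            /* 1: 100 < 200 */
        printf("%d\n", xid_precedes(4000000000u, 100));    /* 1: wrapped around */
        printf("%d\n", xid_precedes(200, 100));            /* 0 */
        return 0;
    }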
v9-0019-Allow-on-access-pruning-to-set-pages-all-visible.patch (text/x-patch)
From a83906def96db35ce75f93b3488ad64fc81b067f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v9 19/22] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum marked pages as all-visible or all-frozen.
Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
---
src/backend/access/heap/heapam.c | 15 ++++-
src/backend/access/heap/heapam_handler.c | 15 ++++-
src/backend/access/heap/pruneheap.c | 67 ++++++++++++++-----
src/backend/access/index/indexam.c | 46 +++++++++++++
src/backend/access/table/tableam.c | 39 +++++++++--
src/backend/executor/execMain.c | 4 ++
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeBitmapHeapscan.c | 7 +-
src/backend/executor/nodeIndexscan.c | 18 +++--
src/backend/executor/nodeSeqscan.c | 24 +++++--
src/include/access/genam.h | 11 +++
src/include/access/heapam.h | 24 ++++++-
src/include/access/relscan.h | 6 ++
src/include/access/tableam.h | 30 ++++++++-
src/include/nodes/execnodes.h | 6 ++
.../t/035_standby_logical_decoding.pl | 4 +-
16 files changed, 278 insertions(+), 40 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index cb16bb0cbbd..d07693b7075 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -556,6 +556,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -570,7 +571,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1247,6 +1250,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1285,6 +1289,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1317,6 +1327,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..f05b9e4968d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
return &hscan->xs_base;
}
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ba3faab91fd..4400bf583dd 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -198,9 +198,13 @@ static bool identify_and_fix_vm_corruption(Relation relation,
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -264,6 +268,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
{
OffsetNumber dummy_off_loc;
PruneFreezeResult presult;
+ int options = 0;
+
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ options = HEAP_PAGE_PRUNE_UPDATE_VM;
+ }
/*
* For now, pass mark_unused_now as false regardless of whether or
@@ -271,9 +282,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* that during on-access pruning with the current implementation.
*/
heap_page_prune_and_freeze(relation, buffer, false,
- InvalidBuffer,
- vistest, 0,
- NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+ vmbuffer ? *vmbuffer : InvalidBuffer,
+ vistest, options,
+ NULL, &presult, PRUNE_ON_ACCESS,
+ &dummy_off_loc, NULL, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -513,12 +525,17 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* all-frozen for use in opportunistic freezing and to update the VM if
* the caller requests it.
*
- * Currently, only VACUUM attempts freezing and setting the VM bits. But
- * other callers could do either one. The visibility bookkeeping is
- * required for opportunistic freezing (in addition to setting the VM
- * bits) because we only consider opportunistically freezing tuples if the
- * whole page would become all-frozen or if the whole page will be frozen
- * except for dead tuples that will be removed by vacuum.
+ * Currently, only VACUUM attempts freezing. But other callers could. The
+ * visibility bookkeeping is required for opportunistic freezing (in
+ * addition to setting the VM bits) because we only consider
+ * opportunistically freezing tuples if the whole page would become
+ * all-frozen or if the whole page will be frozen except for dead tuples
+ * that will be removed by vacuum. But if consider_update_vm is false,
+ * we'll not set the VM even if the page is discovered to be all-visible.
+ *
+ * If only HEAP_PAGE_PRUNE_UPDATE_VM is passed and not
+ * HEAP_PAGE_PRUNE_FREEZE, prstate.all_frozen must be initialized to false
+ * because we will not call heap_prepare_freeze_tuple() on each tuple.
*
* If only updating the VM, we must initialize all_frozen to false, as
* heap_prepare_freeze_tuple() will not be called for each tuple on the
@@ -530,7 +547,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* whether or not to freeze but before deciding whether or not to update
* the VM so that we don't set the VM bit incorrectly.
*
- * If not freezing or updating the VM, we otherwise avoid the extra
+ * If not freezing and not updating the VM, we avoid the extra
* bookkeeping. Initializing all_visible to false allows skipping the work
* to update them in heap_prune_record_unchanged_lp_normal().
*/
@@ -879,12 +896,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.all_frozen = false;
}
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate.consider_update_vm &&
+ prstate.all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+ {
+ prstate.consider_update_vm = false;
+ prstate.all_visible = prstate.all_frozen = false;
+ }
+
Assert(!prstate.all_frozen || prstate.all_visible);
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * Handle setting visibility map bit based on information from the VM (if
+ * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
+ * call), and from all_visible and all_frozen variables.
*/
if (prstate.consider_update_vm)
{
@@ -2275,8 +2310,8 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* - Reaping: During vacuum phase III, items that are already LP_DEAD are
* marked as unused.
*
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- * all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III and on-access, the heap page
+ * may be marked all-visible and all-frozen.
*
 * These changes all happen together, so we use a single WAL record for them
* all.
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 86d11f4ec79..4603ece09bd 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -289,6 +289,32 @@ index_beginscan(Relation heapRelation,
return scan;
}
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan(heapRelation,
+ indexRelation,
+ snapshot,
+ instrument,
+ nkeys, norderbys);
+
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+ return scan;
+}
+
/*
* index_beginscan_bitmap - start a scan of an index with amgetbitmap
*
@@ -620,6 +646,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
return scan;
}
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan_parallel(heaprel, indexrel,
+ instrument,
+ nkeys, norderbys,
+ pscan);
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+ return scan;
+}
+
/* ----------------
* index_getnext_tid - get the next TID from a scan
*
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index a56c5eceb14..67dbf99f5b5 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
char *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
bool synchronize_seqscans = true;
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags);
+
/* ----------------------------------------------------------------------------
* Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
}
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
- SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
pscan, flags);
}
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+ bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
/* ----------------------------------------------------------------------------
* Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ff12e2e1364..2e0474c948a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation is modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ modifies_rel);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..c2ffbd3b08e 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+
+ bool modifies_base_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
*/
- scandesc = index_beginscan(node->ss.ss_currentRelation,
- node->iss_RelationDesc,
- estate->es_snapshot,
- &node->iss_Instrument,
- node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+ node->iss_RelationDesc,
+ estate->es_snapshot,
+ &node->iss_Instrument,
+ node->iss_NumScanKeys,
+ node->iss_NumOrderByKeys,
+ modifies_base_rel);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..fd69275c181 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
- scandesc = table_beginscan(node->ss.ss_currentRelation,
- estate->es_snapshot,
- 0, NULL);
+ scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+ estate->es_snapshot,
+ 0, NULL, modifies_rel);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -366,6 +371,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
ParallelContext *pcxt)
{
EState *estate = node->ss.ps.state;
+ bool modifies_rel;
ParallelTableScanDesc pscan;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -373,8 +379,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+ modifies_rel);
}
/* ----------------------------------------------------------------
@@ -404,8 +413,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+ pscan,
+ modifies_rel);
}
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 5b2ab181b5f..bf272c2c37f 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_heap_rel);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
@@ -206,6 +211,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys,
ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_rel);
extern ItemPointer index_getnext_tid(IndexScanDesc scan,
ScanDirection direction);
struct TupleTableSlot;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4278f351bdf..16f7904a21e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read in the current heap page's
+ * corresponding VM block to this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -116,8 +123,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read in the current heap page's corresponding VM block to
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
@@ -374,7 +391,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool blk_known_av,
Buffer vmbuffer,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
typedef struct IndexFetchTableData
{
Relation rel;
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchTableData;
struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index b2ce35e2a34..e31c21cf8eb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -62,6 +62,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* whether or not scan should attempt to set the VM */
+ SO_ALLOW_VM_SET = 1 << 10,
} ScanOptions;
/*
@@ -881,6 +883,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
}
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
/*
* Like table_beginscan(), but for scanning catalog. It'll automatically use a
* snapshot appropriate for scanning catalog relations.
@@ -918,10 +939,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, struct ScanKeyData *key)
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
{
uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
}
@@ -1130,6 +1154,10 @@ extern void table_parallelscan_initialize(Relation rel,
extern TableScanDesc table_beginscan_parallel(Relation relation,
ParallelTableScanDesc pscan);
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+ ParallelTableScanDesc pscan,
+ bool modifies_rel);
+
/*
* Restart a parallel scan. Call this in the leader process. Caller is
* responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index de782014b2d..839c1be1d7c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -678,6 +678,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query either through
+ * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index c9c182892cf..870f03bdd79 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -10,6 +10,7 @@ use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Time::HiRes qw(usleep);
use Test::More;
+use Time::HiRes qw(usleep);
if ($ENV{enable_injection_points} ne 'yes')
{
@@ -296,6 +297,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -745,7 +747,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
--
2.43.0
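For reference, a minimal sketch of how a read-only caller would opt in to
on-access VM setting through the new table_beginscan_vmset() entry point
(illustration only, not code from the patch; rel, snapshot, and slot are
assumed to be set up elsewhere):

    TableScanDesc scan;

    /* modifies_rel = false adds SO_ALLOW_VM_SET to the scan flags */
    scan = table_beginscan_vmset(rel, snapshot, 0, NULL, false);

    while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
    {
        /*
         * On-access pruning triggered while fetching pages of this scan may
         * now also mark pages all-visible in the visibility map.
         */
    }

    table_endscan(scan);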
v9-0018-Unset-all-visible-sooner-if-not-freezing.patch (text/x-patch)
From 2dc6f7ada64352284c96e5f0d069913a6f1f6eef Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:35:13 -0400
Subject: [PATCH v9 18/22] Unset all-visible sooner if not freezing
In prune/freeze code, we delay unsetting all-visible/all-frozen in the
presence of dead items to allow opportunistically freezing tuples if the
whole page would be frozen except for those dead items -- which are
removed later in vacuum's third phase.
Future commits will allow on-access pruning to set the VM, which means
all-visible will be initialized to true instead of false and we will do
extra bookkeeping in heap_prune_record_unchanged_lp_normal() to keep track of
whether or not the page is all-visible.
Because on-access pruning will not freeze tuples, it makes sense to
unset all-visible as soon as we encounter an LP_DEAD item and
avoid continued bookkeeping since we know the page is not all-visible
and we won't be able to remove those dead items.
---
src/backend/access/heap/pruneheap.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 273e9412a01..ba3faab91fd 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1494,8 +1494,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible until later during pruning.
- * Removable dead tuples shouldn't preclude freezing the page.
+ * Removable dead tuples shouldn't preclude freezing the page. If we won't
+ * attempt freezing, just unset all-visible now, though.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1753,8 +1756,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible until later, at the end of
* heap_page_prune_and_freeze(). This will allow us to attempt to freeze
* the page after pruning. As long as we unset it before updating the
- * visibility map, this will be correct.
+ * visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all-visible now.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
--
2.43.0
v9-0021-Reorder-heap_page_prune_and_freeze-parameters.patch (text/x-patch)
From fd56683e500e528ee9da99a7326368aca8cb8bac Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 31 Jul 2025 12:08:18 -0400
Subject: [PATCH v9 21/22] Reorder heap_page_prune_and_freeze parameters
Reorder parameters so that all of the output parameters are together at
the end of the parameter list.
---
src/backend/access/heap/pruneheap.c | 38 ++++++++++++++--------------
src/backend/access/heap/vacuumlazy.c | 6 ++---
src/include/access/heapam.h | 4 +--
3 files changed, 24 insertions(+), 24 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b9f85d1452e..53cb81d2510 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -297,10 +297,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, false,
+ heap_page_prune_and_freeze(relation, buffer, options, false,
vmbuffer ? *vmbuffer : InvalidBuffer,
- vistest, options,
- NULL, &presult, PRUNE_ON_ACCESS,
+ vistest,
+ NULL, PRUNE_ON_ACCESS, &presult,
&dummy_off_loc, NULL, NULL);
/*
@@ -645,6 +645,15 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* also need to account for a reduction in the length of the line pointer
* array following array truncation by us.
*
+ * options:
+ * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+ * pruning.
+ *
+ * FREEZE indicates that we will also freeze tuples, and will return
+ * 'all_visible', 'all_frozen' flags to the caller.
+ *
+ * UPDATE_VM indicates that we will set the page's status in the VM.
+ *
* If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
* required in order to advance relfrozenxid / relminmxid, or if it's
* considered advantageous for overall system performance to do so now. The
@@ -663,30 +672,21 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* contain the required block of the visibility map.
*
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
- *
- * options:
- * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- * pruning.
- *
- * FREEZE indicates that we will also freeze tuples, and will return
- * 'all_visible', 'all_frozen' flags to the caller.
- *
- * UPDATE_VM indicates that we will set the page's status in the VM.
+ * (see heap_prune_satisfies_vacuum). It is an input parameter.
*
* cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
* of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
* cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD. It is an input parameter.
+ *
+ * reason indicates why the pruning is performed. It is included in the WAL
+ * record for debugging and analysis purposes, but otherwise has no effect.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
* heap_page_prune_and_freeze() is responsible for initializing it. Required
* by all callers.
*
- * reason indicates why the pruning is performed. It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
* off_loc is the offset location required by the caller to use in error
* callback.
*
@@ -699,13 +699,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ int options,
bool blk_known_av,
Buffer vmbuffer,
GlobalVisState *vistest,
- int options,
struct VacuumCutoffs *cutoffs,
- PruneFreezeResult *presult,
PruneReason reason,
+ PruneFreezeResult *presult,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 7558ac697f1..99b9cab0974 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1991,11 +1991,11 @@ lazy_scan_prune(LVRelState *vacrel,
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf,
+ heap_page_prune_and_freeze(rel, buf, prune_options,
all_visible_according_to_vm,
vmbuffer,
- vacrel->vistest, prune_options,
- &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+ vacrel->vistest,
+ &vacrel->cutoffs, PRUNE_VACUUM_SCAN, &presult,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 16f7904a21e..0c4e5607627 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,13 +394,13 @@ struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer,
Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ int options,
bool blk_known_av,
Buffer vmbuffer,
struct GlobalVisState *vistest,
- int options,
struct VacuumCutoffs *cutoffs,
- PruneFreezeResult *presult,
PruneReason reason,
+ PruneFreezeResult *presult,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid);
--
2.43.0
v9-0020-Add-helper-functions-to-heap_page_prune_and_freez.patch (text/x-patch)
From b4a28cf0ab6cd86be2abc4ff20ecf7e99ed13cf4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 30 Jul 2025 18:51:43 -0400
Subject: [PATCH v9 20/22] Add helper functions to heap_page_prune_and_freeze
heap_page_prune_and_freeze() has gotten rather long. It has several
stages:
1) setup - where the PruneState is set up
2) tuple examination -- where tuples and line pointers are examined to
determine what needs to be pruned and what could be frozen
3) evaluation -- where we determine based on caller provided options,
heuristics, and state gathered during stage 2 whether or not to
freeze tuples and set the page in the VM
4) execution - where the page changes are actually made and logged
This commit refactors the evaluation stage into helpers which return
whether or not to freeze and set the VM.
For the purposes of committing, this likely shouldn't be a separate
commit. But I'm not sure yet whether it makes more sense to do this
refactoring earlier in the set for clarity for the reviewer.
---
src/backend/access/heap/pruneheap.c | 471 +++++++++++++++++-----------
1 file changed, 295 insertions(+), 176 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 4400bf583dd..b9f85d1452e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -179,6 +179,22 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
static void page_verify_redirects(Page page);
+static bool heap_page_will_freeze(Relation relation, Buffer buffer,
+ bool do_prune,
+ bool do_hint_full_or_prunable,
+ bool did_tuple_hint_fpi,
+ PruneState *prstate,
+ bool *all_frozen_except_lp_dead);
+
+static bool heap_page_will_update_vm(Relation relation,
+ Buffer buffer, BlockNumber blockno, Page page,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ bool blk_known_av,
+ PruneState *prstate,
+ Buffer *vmbuffer, uint8 *vmflags,
+ bool *set_pd_all_visible);
+
static bool identify_and_fix_vm_corruption(Relation relation,
BlockNumber heap_blk,
Buffer heap_buffer, Page heap_page,
@@ -376,6 +392,249 @@ identify_and_fix_vm_corruption(Relation relation,
return false;
}
+
+/*
+ * Determine whether to set the visibility map bits based on information from
+ * the PruneState and blk_known_av, which some callers will provide after
+ * previously examining this heap page's VM bits (e.g. vacuum from the last
+ * heap_vac_scan_next_block() call).
+ *
+ * We pass in blockno and page even though they can be derived from buffer, to
+ * avoid extra BufferGetBlock() and BufferGetBlockNumber() calls.
+ *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
+ * prstate and vmbuffer are input/output fields. vmflags and
+ * set_pd_all_visible are output fields.
+ *
+ * Returns true if the caller should set one or both of the VM bits and false
+ * otherwise.
+ */
+static bool
+heap_page_will_update_vm(Relation relation,
+ Buffer buffer, BlockNumber blockno, Page page,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ bool blk_known_av,
+ PruneState *prstate,
+ Buffer *vmbuffer, uint8 *vmflags,
+ bool *set_pd_all_visible)
+{
+ bool do_set_vm = false;
+
+ /*
+ * If the caller specified not to update the VM, validate everything is in
+ * the right state and exit.
+ */
+ if (!prstate->consider_update_vm)
+ {
+ Assert(!prstate->all_visible && !prstate->all_frozen);
+ /* We don't set only the page level visibility hint */
+ Assert(!(*set_pd_all_visible));
+ Assert(*vmflags == 0);
+ return false;
+ }
+
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->consider_update_vm &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+ {
+ prstate->consider_update_vm = false;
+ prstate->all_visible = prstate->all_frozen = false;
+ }
+
+ Assert(!prstate->all_frozen || prstate->all_visible);
+
+ /*
+ * Clear any VM corruption. This does not need to be in a critical
+ * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set, we
+ * may mark the heap page buffer dirty here and could end up doing so
+ * again later. This is not a correctness issue and is in the path of VM
+ * corruption, so we don't have to worry about the extra performance
+ * overhead.
+ */
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av, prstate->lpdead_items,
+ *vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /* Determine if we actually need to set the VM and which bits to set. */
+ else if (prstate->all_visible &&
+ (!blk_known_av ||
+ (prstate->all_frozen && !VM_ALL_FROZEN(relation, blockno, vmbuffer))))
+ {
+ *vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate->all_frozen)
+ *vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
+ do_set_vm = *vmflags & VISIBILITYMAP_VALID_BITS;
+
+ /*
+ * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+ * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+ * set, we strongly prefer to keep them in sync.
+ *
+ * Prior to Postgres 19, it was possible for the page-level bit to be set
+ * and the VM bit to be clear. This could happen if we crashed after
+ * setting PD_ALL_VISIBLE but before setting bits in the VM.
+ */
+ *set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+ return do_set_vm;
+}
+
+/*
+ * Decide if we want to go ahead with freezing according to the freeze plans we
+ * prepared for the given buffer or not. If the caller specified we should not
+ * freeze tuples, it exits early.
+ *
+ * do_prune, do_hint_full_or_prunable, and did_tuple_hint_fpi must all have
+ * been decided before calling this function.
+ *
+ * prstate is an input/output parameter. all_frozen_except_lp_dead is set and
+ * used later to determine the snapshot conflict horizon for the record.
+ *
+ * Returns true if we should use our freeze plans and freeze tuples on the page
+ * and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+ bool do_prune,
+ bool do_hint_full_or_prunable,
+ bool did_tuple_hint_fpi,
+ PruneState *prstate,
+ bool *all_frozen_except_lp_dead)
+{
+ bool do_freeze = false;
+
+ /*
+ * If the caller specified we should not attempt to freeze any tuples,
+ * validate that everything is in the right state and exit.
+ */
+ if (!prstate->attempt_freeze)
+ {
+ Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+ Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+ Assert(!(*all_frozen_except_lp_dead));
+ return false;
+ }
+
+ if (prstate->pagefrz.freeze_required)
+ {
+ /*
+ * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+ * before FreezeLimit/MultiXactCutoff is present. Must freeze to
+ * advance relfrozenxid/relminmxid.
+ */
+ do_freeze = true;
+ }
+ else
+ {
+ /*
+ * Opportunistically freeze the page if we are generating an FPI
+ * anyway and if doing so means that we can set the page all-frozen
+ * afterwards (might not happen until VACUUM's final heap pass).
+ *
+ * XXX: Previously, we knew if pruning emitted an FPI by checking
+ * pgWalUsage.wal_fpi before and after pruning. Once the freeze and
+ * prune records were combined, this heuristic couldn't be used
+ * anymore. The opportunistic freeze heuristic must be improved;
+ * however, for now, try to approximate the old logic.
+ */
+ if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+ {
+ /*
+ * Freezing would make the page all-frozen. Have already emitted
+ * an FPI or will do so anyway?
+ */
+ if (RelationNeedsWAL(relation))
+ {
+ if (did_tuple_hint_fpi)
+ do_freeze = true;
+ else if (do_prune)
+ {
+ if (XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ else if (do_hint_full_or_prunable)
+ {
+ if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ }
+ }
+ }
+
+ if (do_freeze)
+ {
+ /*
+ * Validate the tuples we will be freezing before entering the
+ * critical section.
+ */
+ heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+ }
+ else if (prstate->nfrozen > 0)
+ {
+ /*
+ * The page contained some tuples that were not already frozen, and we
+ * chose not to freeze them now. The page won't be all-frozen then.
+ */
+ Assert(!prstate->pagefrz.freeze_required);
+
+ prstate->all_frozen = false;
+ prstate->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+ else
+ {
+ /*
+ * We have no freeze plans to execute. The page might already be
+ * all-frozen (perhaps only following pruning), though. Such pages
+ * can be marked all-frozen in the VM by our caller, even though none
+ * of its tuples were newly frozen here.
+ */
+ }
+
+ /*
+ * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+ * make the choice of whether or not to freeze the page unaffected by the
+ * short-term presence of LP_DEAD items. These LP_DEAD items were
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that freezing has been finalized, unset all_visible if there are
+ * any LP_DEAD items on the page. It needs to reflect the present state of
+ * the page when using it to determine whether or not to update the VM.
+ *
+ * Keep track of whether or not the page was all-frozen except LP_DEAD
+ * items for the purposes of calculating the snapshot conflict horizon,
+ * though.
+ */
+ *all_frozen_except_lp_dead = prstate->all_frozen;
+ if (prstate->lpdead_items > 0)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
+
+ return do_freeze;
+}
+
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
* specified page. If the page's visibility status has changed, update it in
@@ -766,20 +1025,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Clear the offset information once we have processed the given page. */
*off_loc = InvalidOffsetNumber;
- do_prune = prstate.nredirected > 0 ||
- prstate.ndead > 0 ||
- prstate.nunused > 0;
-
/*
* After processing all the live tuples on the page, if the newest xmin
* amongst them is not visible to everyone, the page cannot be
- * all-visible.
+ * all-visible. This must be done before we decide whether or not to
+ * opportunistically freeze below because we do not want to
+ * opportunistically freeze the page if there are live tuples not visible
+ * to everyone, which would prevent setting the page frozen in the VM.
*/
if (prstate.all_visible &&
TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
prstate.all_visible = prstate.all_frozen = false;
+ /*
+ * Now decide based on information collected while examining every tuple
+ * which actions to take. If there are any prunable tuples, we'll prune
+ * them. However, we will decide based on options specified by the caller
+ * and various heuristics whether or not to freeze any tuples and whether
+ * or not the page should be set all-visible/all-frozen in the VM.
+ */
+ do_prune = prstate.nredirected > 0 ||
+ prstate.ndead > 0 ||
+ prstate.nunused > 0;
+
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update those
@@ -790,182 +1059,32 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
PageIsFull(page);
/*
- * Decide if we want to go ahead with freezing according to the freeze
- * plans we prepared, or not.
- */
- do_freeze = false;
- if (prstate.attempt_freeze)
- {
- if (prstate.pagefrz.freeze_required)
- {
- /*
- * heap_prepare_freeze_tuple indicated that at least one XID/MXID
- * from before FreezeLimit/MultiXactCutoff is present. Must
- * freeze to advance relfrozenxid/relminmxid.
- */
- do_freeze = true;
- }
- else
- {
- /*
- * Opportunistically freeze the page if we are generating an FPI
- * anyway and if doing so means that we can set the page
- * all-frozen afterwards (might not happen until VACUUM's final
- * heap pass).
- *
- * XXX: Previously, we knew if pruning emitted an FPI by checking
- * pgWalUsage.wal_fpi before and after pruning. Once the freeze
- * and prune records were combined, this heuristic couldn't be
- * used anymore. The opportunistic freeze heuristic must be
- * improved; however, for now, try to approximate the old logic.
- */
- if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
- {
- /*
- * Freezing would make the page all-frozen. Have already
- * emitted an FPI or will do so anyway?
- */
- if (RelationNeedsWAL(relation))
- {
- if (did_tuple_hint_fpi)
- do_freeze = true;
- else if (do_prune)
- {
- if (XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- else if (do_hint_full_or_prunable)
- {
- if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- }
- }
- }
- }
-
- if (do_freeze)
- {
- /*
- * Validate the tuples we will be freezing before entering the
- * critical section.
- */
- heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
- }
- else if (prstate.nfrozen > 0)
- {
- /*
- * The page contained some tuples that were not already frozen, and we
- * chose not to freeze them now. The page won't be all-frozen then.
- */
- Assert(!prstate.pagefrz.freeze_required);
-
- prstate.all_frozen = false;
- prstate.nfrozen = 0; /* avoid miscounts in instrumentation */
- }
- else
- {
- /*
- * We have no freeze plans to execute. The page might already be
- * all-frozen (perhaps only following pruning), though. Such pages
- * can be marked all-frozen in the VM by our caller, even though none
- * of its tuples were newly frozen here.
- */
- }
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state of
- * the page when using it to determine whether or not to update the VM.
- *
- * Keep track of whether or not the page was all-frozen except LP_DEAD
- * items for the purposes of calculating the snapshot conflict horizon,
- * though.
+ * We must decide whether or not to freeze before deciding if and what to
+ * set in the VM.
*/
- all_frozen_except_lp_dead = prstate.all_frozen;
- if (prstate.lpdead_items > 0)
- {
- prstate.all_visible = false;
- prstate.all_frozen = false;
- }
-
- /*
- * If this is an on-access call and we're not actually pruning, avoid
- * setting the visibility map if it would newly dirty the heap page or, if
- * the page is already dirty, if doing so would require including a
- * full-page image (FPI) of the heap page in the WAL. This situation
- * should be rare, as on-access pruning is only attempted when
- * pd_prune_xid is valid.
- */
- if (reason == PRUNE_ON_ACCESS &&
- prstate.consider_update_vm &&
- prstate.all_visible &&
- !do_prune && !do_freeze &&
- (!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
- {
- prstate.consider_update_vm = false;
- prstate.all_visible = prstate.all_frozen = false;
- }
-
- Assert(!prstate.all_frozen || prstate.all_visible);
-
- /*
- * Handle setting visibility map bit based on information from the VM (if
- * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
- * call), and from all_visible and all_frozen variables.
- */
- if (prstate.consider_update_vm)
- {
- /*
- * Clear any VM corruption. This does not need to be in a critical
- * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
- * we may mark the heap page buffer dirty here and could end up doing
- * so again later. This is not a correctness issue and is in the path
- * of VM corruption, so we don't have to worry about the extra
- * performance overhead.
- */
- if (identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av, prstate.lpdead_items, vmbuffer))
- {
- /* If we fix corruption, don't update the VM further */
- }
-
- /* Determine if we actually need to set the VM and which bits to set. */
- else if (prstate.all_visible &&
- (!blk_known_av ||
- (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
- {
- vmflags |= VISIBILITYMAP_ALL_VISIBLE;
- if (prstate.all_frozen)
- vmflags |= VISIBILITYMAP_ALL_FROZEN;
- }
- }
-
- do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
- * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
- * set, we strongly prefer to keep them in sync.
- *
- * Prior to Postgres 19, it was possible for the page-level bit to be set
- * and the VM bit to be clear. This could happen if we crashed after
- * setting PD_ALL_VISIBLE but before setting bits in the VM.
- */
- set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+ do_freeze = heap_page_will_freeze(relation, buffer,
+ do_prune,
+ do_hint_full_or_prunable,
+ did_tuple_hint_fpi,
+ &prstate,
+ &all_frozen_except_lp_dead);
+
+ do_set_vm = heap_page_will_update_vm(relation,
+ buffer, blockno, page,
+ reason,
+ do_prune, do_freeze,
+ blk_known_av,
+ &prstate,
+ &vmbuffer,
+ &vmflags, &set_pd_all_visible);
/* Save these for the caller in case we later zero out vmflags */
presult->new_vmbits = vmflags;
- /* Any error while applying the changes is critical */
+ /*
+ * Time to actually make the changes to the page and log them. Any error
+ * while applying the changes is critical.
+ */
START_CRIT_SECTION();
if (do_hint_full_or_prunable)
--
2.43.0
v9-0022-Set-pd_prune_xid-on-insert.patchtext/x-patch; charset=US-ASCII; name=v9-0022-Set-pd_prune_xid-on-insert.patchDownload
From 9b5273ec435a8025295c3cfbded611795b50f4d8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v9 22/22] Set pd_prune_xid on insert
Now that we can set the VM during read-only queries, it makes sense to
start setting the page prunable hint on insert. This will allow
heap_page_prune_and_freeze() to be called when the page is full or
mostly full.
For years there has been a note in heap_insert() and heap_multi_insert()
pointing out that setting pd_prune_xid would help clean up aborted
inserted tuples that would otherwise not be cleaned up until vacuum.
So, that's another benefit of setting it.
Setting pd_prune_xid on insert causes a page to be pruned and then
written out, which then affects the reported number of hits in the
index-killtuples isolation test. This is a quirk of how hits are tracked
which sometimes leads to them being double-counted. This should probably
be fixed or changed independently.
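At its core (and as the hunks below show), the change amounts to marking
the page prunable with the inserting transaction's XID. A minimal sketch
of the heap_insert() side, assuming the buffer is already exclusively
locked and xid is the current transaction's ID:

    /* Sketch only: mark the page prunable so later on-access pruning can run. */
    page = BufferGetPage(buffer);
    if (TransactionIdIsNormal(xid))     /* skipped in bootstrap mode */
        PageSetPrunable(page, xid);
    MarkBufferDirty(buffer);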
ci-os-only:
---
src/backend/access/heap/heapam.c | 25 +++++++++++++------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++-
.../isolation/expected/index-killtuples.out | 6 ++---
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d07693b7075..02aa2383c50 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2105,6 +2105,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2164,15 +2165,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2182,7 +2187,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2545,8 +2549,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index ff3ad8b4cd2..e7d7804871b 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -470,6 +470,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -619,9 +625,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
--
2.43.0
On Wed, Sep 3, 2025 at 5:06 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
small comments regarding new series
0001, 0002, 0017 LGTM
Thanks for continuing to review!
In 0015:
Also, maybe GlobalVisXidTestAllVisible is a slightly better name? (The
term 'all-visible' is one that we occasionally utilize)
Actually, I was trying to distinguish it from all-visible because I
interpret that to mean everything is visible -- as in, every tuple on
a page is visible to everyone. And here we are referring to one xid
and want to know whether it is no longer running and hence visible to everyone. I
don't think my name ("visible-to-all") is good, but I'm hesitant to
co-opt "all-visible" here.
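(For reference, the check in question, as it appears in the patch, is roughly:

    if (prstate.all_visible &&
        TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
        !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
        prstate.all_visible = prstate.all_frozen = false;

i.e. it asks whether one XID is visible to everyone, not whether every
tuple on the page is.)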
- Melanie
On Fri, Sep 5, 2025 at 6:20 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
On 2025-09-02 19:11:01 -0400, Melanie Plageman wrote:
From dd98177294011ee93cac122405516abd89f4e393 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 27 Aug 2025 08:50:15 -0400
Subject: [PATCH v8 01/22] Remove unneeded VM pin from VM replay
I didn't push it yet because I did a new version that actually
eliminates the asserts in heap_multi_insert() before calling
visibilitymap_set() -- since they are redundant with checks inside
visibilitymap_set(). 0001 of attached v9 is what I plan to push,
barring any objections.
I pushed this, so a rebased v10 is attached. I've added one new patch:
0002 adds ERRCODE_DATA_CORRUPTED to the existing log messages about
VM/data corruption in vacuum. Andrey Borodin suggested this earlier,
and I had neglected to include it.
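For illustration only -- this is not the exact hunk from 0002 -- tagging
those messages just means adding an errcode to the existing ereport()
calls, along these lines:

    /* Hypothetical sketch; the message text here is illustrative. */
    ereport(WARNING,
            (errcode(ERRCODE_DATA_CORRUPTED),
             errmsg("page %u of relation \"%s\" is marked all-visible in the VM but contains dead items",
                    blkno, RelationGetRelationName(relation))));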
- Melanie
Attachments:
v10-0020-Add-helper-functions-to-heap_page_prune_and_free.patchtext/x-patch; charset=US-ASCII; name=v10-0020-Add-helper-functions-to-heap_page_prune_and_free.patchDownload
From a0229c65dc6e59033c65f0a9efaf749a82678551 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 30 Jul 2025 18:51:43 -0400
Subject: [PATCH v10 20/22] Add helper functions to heap_page_prune_and_freeze
heap_page_prune_and_freeze() has gotten rather long. It has several
stages:
1) setup - where the PruneState is set up
2) tuple examination -- where tuples and line pointers are examined to
determine what needs to be pruned and what could be frozen
3) evaluation -- where we determine based on caller provided options,
heuristics, and state gathered during stage 2 whether or not to
freeze tuples and set the page in the VM
4) execution - where the page changes are actually made and logged
This commit refactors the evaluation stage into helpers which return
whether or not to freeze and set the VM.
For the purposes of committing, this likely shouldn't be a separate
commit. But I'm not sure yet whether it makes more sense to do this
refactoring earlier in the set for clarity for the reviewer.
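Concretely, after this refactoring the evaluation stage of
heap_page_prune_and_freeze() reduces to roughly the following two calls
(the hunks below have the real thing):

    do_freeze = heap_page_will_freeze(relation, buffer,
                                      do_prune,
                                      do_hint_full_or_prunable,
                                      did_tuple_hint_fpi,
                                      &prstate,
                                      &all_frozen_except_lp_dead);

    do_set_vm = heap_page_will_update_vm(relation,
                                         buffer, blockno, page,
                                         reason,
                                         do_prune, do_freeze,
                                         blk_known_av,
                                         &prstate,
                                         &vmbuffer,
                                         &vmflags, &set_pd_all_visible);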
---
src/backend/access/heap/pruneheap.c | 471 +++++++++++++++++-----------
1 file changed, 295 insertions(+), 176 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 0fedcad24c9..1a1a551859b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -179,6 +179,22 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
static void page_verify_redirects(Page page);
+static bool heap_page_will_freeze(Relation relation, Buffer buffer,
+ bool do_prune,
+ bool do_hint_full_or_prunable,
+ bool did_tuple_hint_fpi,
+ PruneState *prstate,
+ bool *all_frozen_except_lp_dead);
+
+static bool heap_page_will_update_vm(Relation relation,
+ Buffer buffer, BlockNumber blockno, Page page,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ bool blk_known_av,
+ PruneState *prstate,
+ Buffer *vmbuffer, uint8 *vmflags,
+ bool *set_pd_all_visible);
+
static bool identify_and_fix_vm_corruption(Relation relation,
BlockNumber heap_blk,
Buffer heap_buffer, Page heap_page,
@@ -380,6 +396,249 @@ identify_and_fix_vm_corruption(Relation relation,
return false;
}
+
+/*
+ * Determine whether to set the visibility map bits based on information from
+ * the PruneState and blk_known_av, which some callers will provide after
+ * previously examining this heap page's VM bits (e.g. vacuum from the last
+ * heap_vac_scan_next_block() call).
+ *
+ * We pass in blockno and page even though those can be derived from buffer to avoid
+ * extra BufferGetBlock() and BufferGetBlockNumber() calls.
+ *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
+ * prstate and vmbuffer are input/output fields. vmflags and
+ * set_pd_all_visible are output fields.
+ *
+ * Returns true if the caller should set one or both of the VM bits and false
+ * otherwise.
+ */
+static bool
+heap_page_will_update_vm(Relation relation,
+ Buffer buffer, BlockNumber blockno, Page page,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ bool blk_known_av,
+ PruneState *prstate,
+ Buffer *vmbuffer, uint8 *vmflags,
+ bool *set_pd_all_visible)
+{
+ bool do_set_vm = false;
+
+ /*
+ * If the caller specified not to update the VM, validate everything is in
+ * the right state and exit.
+ */
+ if (!prstate->consider_update_vm)
+ {
+ Assert(!prstate->all_visible && !prstate->all_frozen);
+ /* We don't set only the page level visibility hint */
+ Assert(!(*set_pd_all_visible));
+ Assert(*vmflags == 0);
+ return false;
+ }
+
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->consider_update_vm &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+ {
+ prstate->consider_update_vm = false;
+ prstate->all_visible = prstate->all_frozen = false;
+ }
+
+ Assert(!prstate->all_frozen || prstate->all_visible);
+
+ /*
+ * Clear any VM corruption. This does not need to be in a critical
+ * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set, we
+ * may mark the heap page buffer dirty here and could end up doing so
+ * again later. This is not a correctness issue and is in the path of VM
+ * corruption, so we don't have to worry about the extra performance
+ * overhead.
+ */
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av, prstate->lpdead_items,
+ *vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /* Determine if we actually need to set the VM and which bits to set. */
+ else if (prstate->all_visible &&
+ (!blk_known_av ||
+ (prstate->all_frozen && !VM_ALL_FROZEN(relation, blockno, vmbuffer))))
+ {
+ *vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate->all_frozen)
+ *vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
+ do_set_vm = *vmflags & VISIBILITYMAP_VALID_BITS;
+
+ /*
+ * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+ * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+ * set, we strongly prefer to keep them in sync.
+ *
+ * Prior to Postgres 19, it was possible for the page-level bit to be set
+ * and the VM bit to be clear. This could happen if we crashed after
+ * setting PD_ALL_VISIBLE but before setting bits in the VM.
+ */
+ *set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+ return do_set_vm;
+}
+
+/*
+ * Decide if we want to go ahead with freezing according to the freeze plans we
+ * prepared for the given buffer or not. If the caller specified we should not
+ * freeze tuples, it exits early.
+ *
+ * do_prune, do_hint_full_or_prunable, and did_tuple_hint_fpi must all have
+ * been decided before calling this function.
+ *
+ * prstate is an input/output parameter. all_frozen_except_lp_dead is set and
+ * used later to determine the snapshot conflict horizon for the record.
+ *
+ * Returns true if we should use our freeze plans and freeze tuples on the page
+ * and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+ bool do_prune,
+ bool do_hint_full_or_prunable,
+ bool did_tuple_hint_fpi,
+ PruneState *prstate,
+ bool *all_frozen_except_lp_dead)
+{
+ bool do_freeze = false;
+
+ /*
+ * If the caller specified we should not attempt to freeze any tuples,
+ * validate that everything is in the right state and exit.
+ */
+ if (!prstate->attempt_freeze)
+ {
+ Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+ Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+ Assert(!(*all_frozen_except_lp_dead));
+ return false;
+ }
+
+ if (prstate->pagefrz.freeze_required)
+ {
+ /*
+ * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+ * before FreezeLimit/MultiXactCutoff is present. Must freeze to
+ * advance relfrozenxid/relminmxid.
+ */
+ do_freeze = true;
+ }
+ else
+ {
+ /*
+ * Opportunistically freeze the page if we are generating an FPI
+ * anyway and if doing so means that we can set the page all-frozen
+ * afterwards (might not happen until VACUUM's final heap pass).
+ *
+ * XXX: Previously, we knew if pruning emitted an FPI by checking
+ * pgWalUsage.wal_fpi before and after pruning. Once the freeze and
+ * prune records were combined, this heuristic couldn't be used
+ * anymore. The opportunistic freeze heuristic must be improved;
+ * however, for now, try to approximate the old logic.
+ */
+ if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+ {
+ /*
+ * Freezing would make the page all-frozen. Have already emitted
+ * an FPI or will do so anyway?
+ */
+ if (RelationNeedsWAL(relation))
+ {
+ if (did_tuple_hint_fpi)
+ do_freeze = true;
+ else if (do_prune)
+ {
+ if (XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ else if (do_hint_full_or_prunable)
+ {
+ if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ }
+ }
+ }
+
+ if (do_freeze)
+ {
+ /*
+ * Validate the tuples we will be freezing before entering the
+ * critical section.
+ */
+ heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+ }
+ else if (prstate->nfrozen > 0)
+ {
+ /*
+ * The page contained some tuples that were not already frozen, and we
+ * chose not to freeze them now. The page won't be all-frozen then.
+ */
+ Assert(!prstate->pagefrz.freeze_required);
+
+ prstate->all_frozen = false;
+ prstate->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+ else
+ {
+ /*
+ * We have no freeze plans to execute. The page might already be
+ * all-frozen (perhaps only following pruning), though. Such pages
+ * can be marked all-frozen in the VM by our caller, even though none
+ * of its tuples were newly frozen here.
+ */
+ }
+
+ /*
+ * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+ * make the choice of whether or not to freeze the page unaffected by the
+ * short-term presence of LP_DEAD items. These LP_DEAD items were
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that freezing has been finalized, unset all_visible if there are
+ * any LP_DEAD items on the page. It needs to reflect the present state of
+ * the page when using it to determine whether or not to update the VM.
+ *
+ * Keep track of whether or not the page was all-frozen except LP_DEAD
+ * items for the purposes of calculating the snapshot conflict horizon,
+ * though.
+ */
+ *all_frozen_except_lp_dead = prstate->all_frozen;
+ if (prstate->lpdead_items > 0)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
+
+ return do_freeze;
+}
+
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
* specified page. If the page's visibility status has changed, update it in
@@ -770,20 +1029,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Clear the offset information once we have processed the given page. */
*off_loc = InvalidOffsetNumber;
- do_prune = prstate.nredirected > 0 ||
- prstate.ndead > 0 ||
- prstate.nunused > 0;
-
/*
* After processing all the live tuples on the page, if the newest xmin
* amongst them is not visible to everyone, the page cannot be
- * all-visible.
+ * all-visible. This must be done before we decide whether or not to
+ * opportunistically freeze below because we do not want to
+ * opportunistically freeze the page if there are live tuples not visible
+ * to everyone, which would prevent setting the page frozen in the VM.
*/
if (prstate.all_visible &&
TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
prstate.all_visible = prstate.all_frozen = false;
+ /*
+ * Now decide based on information collected while examining every tuple
+ * which actions to take. If there are any prunable tuples, we'll prune
+ * them. However, we will decide based on options specified by the caller
+ * and various heuristics whether or not to freeze any tuples and whether
+ * or not the page should be set all-visible/all-frozen in the VM.
+ */
+ do_prune = prstate.nredirected > 0 ||
+ prstate.ndead > 0 ||
+ prstate.nunused > 0;
+
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update those
@@ -794,182 +1063,32 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
PageIsFull(page);
/*
- * Decide if we want to go ahead with freezing according to the freeze
- * plans we prepared, or not.
- */
- do_freeze = false;
- if (prstate.attempt_freeze)
- {
- if (prstate.pagefrz.freeze_required)
- {
- /*
- * heap_prepare_freeze_tuple indicated that at least one XID/MXID
- * from before FreezeLimit/MultiXactCutoff is present. Must
- * freeze to advance relfrozenxid/relminmxid.
- */
- do_freeze = true;
- }
- else
- {
- /*
- * Opportunistically freeze the page if we are generating an FPI
- * anyway and if doing so means that we can set the page
- * all-frozen afterwards (might not happen until VACUUM's final
- * heap pass).
- *
- * XXX: Previously, we knew if pruning emitted an FPI by checking
- * pgWalUsage.wal_fpi before and after pruning. Once the freeze
- * and prune records were combined, this heuristic couldn't be
- * used anymore. The opportunistic freeze heuristic must be
- * improved; however, for now, try to approximate the old logic.
- */
- if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
- {
- /*
- * Freezing would make the page all-frozen. Have already
- * emitted an FPI or will do so anyway?
- */
- if (RelationNeedsWAL(relation))
- {
- if (did_tuple_hint_fpi)
- do_freeze = true;
- else if (do_prune)
- {
- if (XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- else if (do_hint_full_or_prunable)
- {
- if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- }
- }
- }
- }
-
- if (do_freeze)
- {
- /*
- * Validate the tuples we will be freezing before entering the
- * critical section.
- */
- heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
- }
- else if (prstate.nfrozen > 0)
- {
- /*
- * The page contained some tuples that were not already frozen, and we
- * chose not to freeze them now. The page won't be all-frozen then.
- */
- Assert(!prstate.pagefrz.freeze_required);
-
- prstate.all_frozen = false;
- prstate.nfrozen = 0; /* avoid miscounts in instrumentation */
- }
- else
- {
- /*
- * We have no freeze plans to execute. The page might already be
- * all-frozen (perhaps only following pruning), though. Such pages
- * can be marked all-frozen in the VM by our caller, even though none
- * of its tuples were newly frozen here.
- */
- }
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state of
- * the page when using it to determine whether or not to update the VM.
- *
- * Keep track of whether or not the page was all-frozen except LP_DEAD
- * items for the purposes of calculating the snapshot conflict horizon,
- * though.
+ * We must decide whether or not to freeze before deciding if and what to
+ * set in the VM.
*/
- all_frozen_except_lp_dead = prstate.all_frozen;
- if (prstate.lpdead_items > 0)
- {
- prstate.all_visible = false;
- prstate.all_frozen = false;
- }
-
- /*
- * If this is an on-access call and we're not actually pruning, avoid
- * setting the visibility map if it would newly dirty the heap page or, if
- * the page is already dirty, if doing so would require including a
- * full-page image (FPI) of the heap page in the WAL. This situation
- * should be rare, as on-access pruning is only attempted when
- * pd_prune_xid is valid.
- */
- if (reason == PRUNE_ON_ACCESS &&
- prstate.consider_update_vm &&
- prstate.all_visible &&
- !do_prune && !do_freeze &&
- (!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
- {
- prstate.consider_update_vm = false;
- prstate.all_visible = prstate.all_frozen = false;
- }
-
- Assert(!prstate.all_frozen || prstate.all_visible);
-
- /*
- * Handle setting visibility map bit based on information from the VM (if
- * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
- * call), and from all_visible and all_frozen variables.
- */
- if (prstate.consider_update_vm)
- {
- /*
- * Clear any VM corruption. This does not need to be in a critical
- * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
- * we may mark the heap page buffer dirty here and could end up doing
- * so again later. This is not a correctness issue and is in the path
- * of VM corruption, so we don't have to worry about the extra
- * performance overhead.
- */
- if (identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av, prstate.lpdead_items, vmbuffer))
- {
- /* If we fix corruption, don't update the VM further */
- }
-
- /* Determine if we actually need to set the VM and which bits to set. */
- else if (prstate.all_visible &&
- (!blk_known_av ||
- (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
- {
- vmflags |= VISIBILITYMAP_ALL_VISIBLE;
- if (prstate.all_frozen)
- vmflags |= VISIBILITYMAP_ALL_FROZEN;
- }
- }
-
- do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
- * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
- * set, we strongly prefer to keep them in sync.
- *
- * Prior to Postgres 19, it was possible for the page-level bit to be set
- * and the VM bit to be clear. This could happen if we crashed after
- * setting PD_ALL_VISIBLE but before setting bits in the VM.
- */
- set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+ do_freeze = heap_page_will_freeze(relation, buffer,
+ do_prune,
+ do_hint_full_or_prunable,
+ did_tuple_hint_fpi,
+ &prstate,
+ &all_frozen_except_lp_dead);
+
+ do_set_vm = heap_page_will_update_vm(relation,
+ buffer, blockno, page,
+ reason,
+ do_prune, do_freeze,
+ blk_known_av,
+ &prstate,
+ &vmbuffer,
+ &vmflags, &set_pd_all_visible);
/* Save these for the caller in case we later zero out vmflags */
presult->new_vmbits = vmflags;
- /* Any error while applying the changes is critical */
+ /*
+ * Time to actually make the changes to the page and log them. Any error
+ * while applying the changes is critical.
+ */
START_CRIT_SECTION();
if (do_hint_full_or_prunable)
--
2.43.0
v10-0019-Allow-on-access-pruning-to-set-pages-all-visible.patchtext/x-patch; charset=US-ASCII; name=v10-0019-Allow-on-access-pruning-to-set-pages-all-visible.patchDownload
From 560aca329ffd7bfafcda4f73e339a8efb7dc9ae8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v10 19/22] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum marked pages as all-visible or all-frozen.
Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
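Roughly, the executor-side shape of the change for a serial sequential
scan (taken from the nodeSeqscan.c hunk below) is:

    bool modifies_rel =
        bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
                      estate->es_modified_relids);

    scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
                                     estate->es_snapshot,
                                     0, NULL, modifies_rel);

When modifies_rel is false, the scan is opened with SO_ALLOW_VM_SET, and
on-access pruning may then pin the VM page and mark the heap page
all-visible.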
---
src/backend/access/heap/heapam.c | 15 ++++-
src/backend/access/heap/heapam_handler.c | 15 ++++-
src/backend/access/heap/pruneheap.c | 67 ++++++++++++++-----
src/backend/access/index/indexam.c | 46 +++++++++++++
src/backend/access/table/tableam.c | 39 +++++++++--
src/backend/executor/execMain.c | 4 ++
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeBitmapHeapscan.c | 7 +-
src/backend/executor/nodeIndexscan.c | 18 +++--
src/backend/executor/nodeSeqscan.c | 24 +++++--
src/include/access/genam.h | 11 +++
src/include/access/heapam.h | 24 ++++++-
src/include/access/relscan.h | 6 ++
src/include/access/tableam.h | 30 ++++++++-
src/include/nodes/execnodes.h | 6 ++
.../t/035_standby_logical_decoding.pl | 4 +-
16 files changed, 278 insertions(+), 40 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index cb16bb0cbbd..d07693b7075 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -556,6 +556,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -570,7 +571,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1247,6 +1250,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1285,6 +1289,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1317,6 +1327,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..f05b9e4968d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
return &hscan->xs_base;
}
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e6cf4df17c5..0fedcad24c9 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -198,9 +198,13 @@ static bool identify_and_fix_vm_corruption(Relation relation,
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -264,6 +268,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
{
OffsetNumber dummy_off_loc;
PruneFreezeResult presult;
+ int options = 0;
+
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ options = HEAP_PAGE_PRUNE_UPDATE_VM;
+ }
/*
* For now, pass mark_unused_now as false regardless of whether or
@@ -271,9 +282,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* that during on-access pruning with the current implementation.
*/
heap_page_prune_and_freeze(relation, buffer, false,
- InvalidBuffer,
- vistest, 0,
- NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+ vmbuffer ? *vmbuffer : InvalidBuffer,
+ vistest, options,
+ NULL, &presult, PRUNE_ON_ACCESS,
+ &dummy_off_loc, NULL, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -517,12 +529,17 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* all-frozen for use in opportunistic freezing and to update the VM if
* the caller requests it.
*
- * Currently, only VACUUM attempts freezing and setting the VM bits. But
- * other callers could do either one. The visibility bookkeeping is
- * required for opportunistic freezing (in addition to setting the VM
- * bits) because we only consider opportunistically freezing tuples if the
- * whole page would become all-frozen or if the whole page will be frozen
- * except for dead tuples that will be removed by vacuum.
+ * Currently, only VACUUM attempts freezing. But other callers could. The
+ * visibility bookkeeping is required for opportunistic freezing (in
+ * addition to setting the VM bits) because we only consider
+ * opportunistically freezing tuples if the whole page would become
+ * all-frozen or if the whole page will be frozen except for dead tuples
+ * that will be removed by vacuum. But if consider_update_vm is false,
+ * we'll not set the VM even if the page is discovered to be all-visible.
+ *
+ * If only HEAP_PAGE_PRUNE_UPDATE_VM is passed and not
+ * HEAP_PAGE_PRUNE_FREEZE, prstate.all_frozen must be initialized to false
+ * because we will not call heap_prepare_freeze_tuple() on each tuple.
*
* If only updating the VM, we must initialize all_frozen to false, as
* heap_prepare_freeze_tuple() will not be called for each tuple on the
@@ -534,7 +551,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* whether or not to freeze but before deciding whether or not to update
* the VM so that we don't set the VM bit incorrectly.
*
- * If not freezing or updating the VM, we otherwise avoid the extra
+ * If not freezing and not updating the VM, we avoid the extra
* bookkeeping. Initializing all_visible to false allows skipping the work
* to update them in heap_prune_record_unchanged_lp_normal().
*/
@@ -883,12 +900,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.all_frozen = false;
}
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate.consider_update_vm &&
+ prstate.all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+ {
+ prstate.consider_update_vm = false;
+ prstate.all_visible = prstate.all_frozen = false;
+ }
+
Assert(!prstate.all_frozen || prstate.all_visible);
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * Handle setting visibility map bit based on information from the VM (if
+ * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
+ * call), and from all_visible and all_frozen variables.
*/
if (prstate.consider_update_vm)
{
@@ -2279,8 +2314,8 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* - Reaping: During vacuum phase III, items that are already LP_DEAD are
* marked as unused.
*
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- * all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III and on-access, the heap page
+ * may be marked all-visible and all-frozen.
*
 * These changes all happen together, so we use a single WAL record for them
* all.
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 86d11f4ec79..4603ece09bd 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -289,6 +289,32 @@ index_beginscan(Relation heapRelation,
return scan;
}
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan(heapRelation,
+ indexRelation,
+ snapshot,
+ instrument,
+ nkeys, norderbys);
+
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+ return scan;
+}
+
/*
* index_beginscan_bitmap - start a scan of an index with amgetbitmap
*
@@ -620,6 +646,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
return scan;
}
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan_parallel(heaprel, indexrel,
+ instrument,
+ nkeys, norderbys,
+ pscan);
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+ return scan;
+}
+
/* ----------------
* index_getnext_tid - get the next TID from a scan
*
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index a56c5eceb14..67dbf99f5b5 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
char *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
bool synchronize_seqscans = true;
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags);
+
/* ----------------------------------------------------------------------------
* Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
}
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
- SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
pscan, flags);
}
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+ bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
/* ----------------------------------------------------------------------------
* Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ff12e2e1364..2e0474c948a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation is modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ modifies_rel);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..c2ffbd3b08e 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+
+ bool modifies_base_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
*/
- scandesc = index_beginscan(node->ss.ss_currentRelation,
- node->iss_RelationDesc,
- estate->es_snapshot,
- &node->iss_Instrument,
- node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+ node->iss_RelationDesc,
+ estate->es_snapshot,
+ &node->iss_Instrument,
+ node->iss_NumScanKeys,
+ node->iss_NumOrderByKeys,
+ modifies_base_rel);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..fd69275c181 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
- scandesc = table_beginscan(node->ss.ss_currentRelation,
- estate->es_snapshot,
- 0, NULL);
+ scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+ estate->es_snapshot,
+ 0, NULL, modifies_rel);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -366,6 +371,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
ParallelContext *pcxt)
{
EState *estate = node->ss.ps.state;
+ bool modifies_rel;
ParallelTableScanDesc pscan;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -373,8 +379,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+ modifies_rel);
}
/* ----------------------------------------------------------------
@@ -404,8 +413,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+ pscan,
+ modifies_rel);
}
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 5b2ab181b5f..bf272c2c37f 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_heap_rel);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
@@ -206,6 +211,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys,
ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_rel);
extern ItemPointer index_getnext_tid(IndexScanDesc scan,
ScanDirection direction);
struct TupleTableSlot;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4278f351bdf..16f7904a21e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read in the current heap page's
+ * corresponding VM block to this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -116,8 +123,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read in the current heap page's corresponding VM block to
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
@@ -374,7 +391,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool blk_known_av,
Buffer vmbuffer,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
typedef struct IndexFetchTableData
{
Relation rel;
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchTableData;
struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index b2ce35e2a34..e31c21cf8eb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -62,6 +62,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* whether or not scan should attempt to set the VM */
+ SO_ALLOW_VM_SET = 1 << 10,
} ScanOptions;
/*
@@ -881,6 +883,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
}
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
/*
* Like table_beginscan(), but for scanning catalog. It'll automatically use a
* snapshot appropriate for scanning catalog relations.
@@ -918,10 +939,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, struct ScanKeyData *key)
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
{
uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
}
@@ -1130,6 +1154,10 @@ extern void table_parallelscan_initialize(Relation rel,
extern TableScanDesc table_beginscan_parallel(Relation relation,
ParallelTableScanDesc pscan);
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+ ParallelTableScanDesc pscan,
+ bool modifies_rel);
+
/*
* Restart a parallel scan. Call this in the leader process. Caller is
* responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index de782014b2d..839c1be1d7c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -678,6 +678,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query either through
+ * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index c9c182892cf..870f03bdd79 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -10,6 +10,7 @@ use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Time::HiRes qw(usleep);
use Test::More;
+use Time::HiRes qw(usleep);
if ($ENV{enable_injection_points} ne 'yes')
{
@@ -296,6 +297,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -745,7 +747,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
--
2.43.0
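To make the intended use of the new scan hooks concrete, here is a rough
sketch of how a seqscan call site could decide whether to allow VM setting.
It only assumes the names introduced in the patch above (table_beginscan_vmset,
es_modified_relids); rel_is_modified_by_query() is purely illustrative and not
part of the patch set:

#include "postgres.h"
#include "access/tableam.h"
#include "nodes/bitmapset.h"
#include "nodes/execnodes.h"
#include "utils/rel.h"

/* Illustrative helper: is the scanned relation modified by this query? */
static bool
rel_is_modified_by_query(EState *estate, Index scanrelid)
{
	return bms_is_member(scanrelid, estate->es_modified_relids);
}

static TableScanDesc
begin_seqscan_maybe_vmset(Relation rel, EState *estate, Index scanrelid)
{
	bool		modifies_rel = rel_is_modified_by_query(estate, scanrelid);

	/*
	 * Only scans of relations the query never modifies pass modifies_rel =
	 * false, which lets scan_begin() add SO_ALLOW_VM_SET so that on-access
	 * pruning may also mark pages all-visible in the VM.
	 */
	return table_beginscan_vmset(rel, estate->es_snapshot, 0, NULL,
								 modifies_rel);
}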
v10-0021-Reorder-heap_page_prune_and_freeze-parameters.patchtext/x-patch; charset=US-ASCII; name=v10-0021-Reorder-heap_page_prune_and_freeze-parameters.patchDownload
From 3eb27edab1d7a5dd45e1678a1a8a5150d620706c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 31 Jul 2025 12:08:18 -0400
Subject: [PATCH v10 21/22] Reorder heap_page_prune_and_freeze parameters
Reorder parameters so that all of the output parameters are together at
the end of the parameter list.
---
src/backend/access/heap/pruneheap.c | 38 ++++++++++++++--------------
src/backend/access/heap/vacuumlazy.c | 6 ++---
src/include/access/heapam.h | 4 +--
3 files changed, 24 insertions(+), 24 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 1a1a551859b..4377673e3a4 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -297,10 +297,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, false,
+ heap_page_prune_and_freeze(relation, buffer, options, false,
vmbuffer ? *vmbuffer : InvalidBuffer,
- vistest, options,
- NULL, &presult, PRUNE_ON_ACCESS,
+ vistest,
+ NULL, PRUNE_ON_ACCESS, &presult,
&dummy_off_loc, NULL, NULL);
/*
@@ -649,6 +649,15 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* also need to account for a reduction in the length of the line pointer
* array following array truncation by us.
*
+ * options:
+ * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+ * pruning.
+ *
+ * FREEZE indicates that we will also freeze tuples, and will return
+ * 'all_visible', 'all_frozen' flags to the caller.
+ *
+ * UPDATE_VM indicates that we will set the page's status in the VM.
+ *
* If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
* required in order to advance relfrozenxid / relminmxid, or if it's
* considered advantageous for overall system performance to do so now. The
@@ -667,30 +676,21 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* contain the required block of the visibility map.
*
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
- *
- * options:
- * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- * pruning.
- *
- * FREEZE indicates that we will also freeze tuples, and will return
- * 'all_visible', 'all_frozen' flags to the caller.
- *
- * UPDATE_VM indicates that we will set the page's status in the VM.
+ * (see heap_prune_satisfies_vacuum). It is an input parameter.
*
* cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
* of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
* cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD. It is an input parameter.
+ *
+ * reason indicates why the pruning is performed. It is included in the WAL
+ * record for debugging and analysis purposes, but otherwise has no effect.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
* heap_page_prune_and_freeze() is responsible for initializing it. Required
* by all callers.
*
- * reason indicates why the pruning is performed. It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
* off_loc is the offset location required by the caller to use in error
* callback.
*
@@ -703,13 +703,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ int options,
bool blk_known_av,
Buffer vmbuffer,
GlobalVisState *vistest,
- int options,
struct VacuumCutoffs *cutoffs,
- PruneFreezeResult *presult,
PruneReason reason,
+ PruneFreezeResult *presult,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 7558ac697f1..99b9cab0974 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1991,11 +1991,11 @@ lazy_scan_prune(LVRelState *vacrel,
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf,
+ heap_page_prune_and_freeze(rel, buf, prune_options,
all_visible_according_to_vm,
vmbuffer,
- vacrel->vistest, prune_options,
- &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+ vacrel->vistest,
+ &vacrel->cutoffs, PRUNE_VACUUM_SCAN, &presult,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 16f7904a21e..0c4e5607627 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,13 +394,13 @@ struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer,
Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ int options,
bool blk_known_av,
Buffer vmbuffer,
struct GlobalVisState *vistest,
- int options,
struct VacuumCutoffs *cutoffs,
- PruneFreezeResult *presult,
PruneReason reason,
+ PruneFreezeResult *presult,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid);
--
2.43.0
v10-0022-Set-pd_prune_xid-on-insert.patchtext/x-patch; charset=US-ASCII; name=v10-0022-Set-pd_prune_xid-on-insert.patchDownload
From d6ea4c3af1d3b6ce45f0219b6bc233867629b21e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v10 22/22] Set pd_prune_xid on insert
Now that we can set the VM during read-only queries, it makes sense to
start setting the page prunable hint on insert. This will allow
heap_page_prune_and_freeze() to be called when the page is full or
mostly full.
For years there has been a note in heap_insert() and heap_multi_insert()
pointing out that setting pd_prune_xid would help clean up aborted
inserted tuples that would otherwise not be cleaned up until vacuum.
So, that's another benefit of setting it.
Setting pd_prune_xid on insert causes a page to be pruned and then
written out, which affects the reported number of hits in the
index-killtuples isolation test. This is a quirk of how hits are tracked
that sometimes leads to them being double counted; it should probably be
fixed or changed independently.
---
src/backend/access/heap/heapam.c | 25 +++++++++++++------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++-
.../isolation/expected/index-killtuples.out | 6 ++---
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d07693b7075..02aa2383c50 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2105,6 +2105,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2164,15 +2165,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2182,7 +2187,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2545,8 +2549,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index ff3ad8b4cd2..e7d7804871b 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -470,6 +470,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -619,9 +625,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
--
2.43.0
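For clarity, the new insert-path rule boils down to the following condensed
sketch (illustrative only; the real changes are in heap_insert() and
heap_multi_insert() in the diff above):

#include "postgres.h"
#include "access/transam.h"
#include "storage/bufpage.h"

/*
 * Skip the hint in bootstrap mode (where xid is not a normal xid) and on
 * pages we are already marking all-frozen in the VM; otherwise record the
 * inserter's xid so on-access pruning will revisit the page later.
 */
static void
maybe_set_prune_hint(Page page, TransactionId xid, bool all_frozen_set)
{
	if (!all_frozen_set && TransactionIdIsNormal(xid))
		PageSetPrunable(page, xid);
}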
v10-0002-Add-error-codes-to-vacuum-VM-corruption-case-log.patchtext/x-patch; charset=US-ASCII; name=v10-0002-Add-error-codes-to-vacuum-VM-corruption-case-log.patchDownload
From bf1d4ed090ca4f30d382cb9ff028565967bed5db Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 8 Sep 2025 10:00:34 -0400
Subject: [PATCH v10 02/22] Add error codes to vacuum VM corruption case
logging
Enhance the log message emitted when the heap page is found not
to be consistent with the VM during vacuum.
PD_ALL_VISIBLE must never be clear if the VM bits are set for this page.
And a page marked all-visible in the VM must not contain dead items.
Both of these cases are either data corruption or VM corruption.
Add ERRCODE_DATA_CORRUPTED to the existing log message. Using the
appropriate error codes makes monitoring much easier.
Suggested-by: Andrey Borodin <x4mmm@yandex-team.ru>
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/87DD95AA-274F-4F4F-BAD9-7738E5B1F905%40yandex-team.ru
---
src/backend/access/heap/vacuumlazy.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 932701d8420..8bea0454ff5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2121,8 +2121,10 @@ lazy_scan_prune(LVRelState *vacrel,
else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
{
- elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno);
+ ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ vacrel->relname, blkno)));
+
visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
VISIBILITYMAP_VALID_BITS);
}
@@ -2143,8 +2145,10 @@ lazy_scan_prune(LVRelState *vacrel,
*/
else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
{
- elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno);
+ ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ vacrel->relname, blkno)));
+
PageClearAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
--
2.43.0
v10-0003-Eliminate-xl_heap_visible-in-COPY-FREEZE.patchtext/x-patch; charset=US-ASCII; name=v10-0003-Eliminate-xl_heap_visible-in-COPY-FREEZE.patchDownload
From 2a418d3dee217cbf411ec96a7a6b95831077f887 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v10 03/22] Eliminate xl_heap_visible in COPY FREEZE
Instead of emitting a separate xl_heap_visible WAL record for setting the
VM bits, specify the changes to make to the VM block in the
xl_heap_multi_insert record itself.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam.c | 47 ++++++++++-------
src/backend/access/heap/heapam_xlog.c | 43 +++++++++++++++-
src/backend/access/heap/visibilitymap.c | 67 ++++++++++++++++++++++++-
src/backend/access/rmgrdesc/heapdesc.c | 5 ++
src/include/access/visibilitymap.h | 2 +
5 files changed, 144 insertions(+), 20 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4c5ae205a7a..893a739009a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2504,9 +2504,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/*
* If the page is all visible, need to clear that, unless we're only
* going to add further frozen rows to it.
- *
- * If we're only adding already frozen rows to a previously empty
- * page, mark it as all-visible.
*/
if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
@@ -2516,8 +2513,22 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
BufferGetBlockNumber(buffer),
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
+
+ /*
+ * If we're only adding already frozen rows to a previously empty
+ * page, mark it as all-frozen and update the visibility map. We're
+ * already holding a pin on the vmbuffer.
+ */
else if (all_frozen_set)
+ {
PageSetAllVisible(page);
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ visibilitymap_set_vmbits(relation,
+ BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+ }
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2565,6 +2576,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
xlrec->flags = 0;
if (all_visible_cleared)
xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+ /*
+ * We don't have to worry about including a conflict xid in the
+ * WAL record as HEAP_INSERT_FROZEN intentionally violates
+ * visibility rules.
+ */
if (all_frozen_set)
xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
@@ -2627,7 +2644,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
XLogBeginInsert();
XLogRegisterData(xlrec, tupledata - scratch.data);
+
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+ if (all_frozen_set)
+ XLogRegisterBuffer(1, vmbuffer, 0);
XLogRegisterBufData(0, tupledata, totaldatalen);
@@ -2637,26 +2657,17 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
recptr = XLogInsert(RM_HEAP2_ID, info);
PageSetLSN(page, recptr);
+ if (all_frozen_set)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
+ }
}
END_CRIT_SECTION();
- /*
- * If we've frozen everything on the page, update the visibilitymap.
- * We're already holding pin on the vmbuffer.
- */
if (all_frozen_set)
- {
- /*
- * It's fine to use InvalidTransactionId here - this is only used
- * when HEAP_INSERT_FROZEN is specified, which intentionally
- * violates visibility rules.
- */
- visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
- InvalidXLogRecPtr, vmbuffer,
- InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
- }
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
UnlockReleaseBuffer(buffer);
ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index cf843277938..0820f7d052d 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -551,6 +551,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
int i;
bool isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
/*
* Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -571,11 +572,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(rlocator);
- Buffer vmbuffer = InvalidBuffer;
visibilitymap_pin(reln, blkno, &vmbuffer);
visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
ReleaseBuffer(vmbuffer);
+ vmbuffer = InvalidBuffer;
FreeFakeRelcacheEntry(reln);
}
@@ -662,6 +663,46 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (BufferIsValid(buffer))
UnlockReleaseBuffer(buffer);
+ buffer = InvalidBuffer;
+
+ /*
+ * Now read and update the VM block. Even if we skipped updating the heap
+ * page due to the file being dropped or truncated later in recovery, it's
+ * still safe to update the visibility map. Any WAL record that clears
+ * the visibility map bit does so before checking the page LSN, so any
+ * bits that need to be cleared will still be cleared.
+ *
+ * It is only okay to set the VM bits without holding the heap page lock
+ * because we can expect no other writers of this page.
+ */
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+ XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ Page vmpage = BufferGetPage(vmbuffer);
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
+
+ /* initialize the page if it was read as zeros */
+ if (PageIsNew(vmpage))
+ PageInit(vmpage, BLCKSZ, 0);
+
+ visibilitymap_set_vmbits(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+
+ /*
+ * It is not possible that the VM was already set for this heap page,
+ * so the vmbuffer must have been modified and marked dirty.
+ */
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
+ FreeFakeRelcacheEntry(reln);
+ }
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
+
/*
* If the page is running low on free space, update the FSM as well.
* Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 7306c16f05c..bb8dfd8910a 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,7 +14,8 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set a bit in a previously pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page and log
+ * visibilitymap_set_vmbits - set bit(s) in a pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -321,6 +322,70 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
return status;
}
+/*
+ * Set flags in the VM block contained in the passed in vmBuf.
+ *
+ * This function is for callers which include the VM changes in the same WAL
+ * record as the modifications of the heap page which rendered it all-visible.
+ * Callers separately logging the VM changes should invoke visibilitymap_set()
+ * instead.
+ *
+ * Caller must have pinned and exclusive locked the correct block of the VM in
+ * vmBuf. This block should contain the VM bits for the given heapBlk.
+ *
+ * During normal operation (i.e. not recovery), this should be called in a
+ * critical section which also makes any necessary changes to the heap page
+ * and, if relevant, emits WAL.
+ *
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page. This includes
+ * maintaining any invariants such as ensuring the buffer containing heapBlk
+ * is pinned and exclusive locked.
+ */
+uint8
+visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags)
+{
+ BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+ uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+ uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+ Page page;
+ uint8 *map;
+ uint8 status;
+
+#ifdef TRACE_VISIBILITYMAP
+ elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
+ flags, RelationGetRelationName(rel), heapBlk);
+#endif
+
+ /* Call in same critical section where WAL is emitted. */
+ Assert(InRecovery || CritSectionCount > 0);
+
+ /* Flags should be valid. Also never clear bits with this function */
+ Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+ /* Must never set all_frozen bit without also setting all_visible bit */
+ Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+ /* Check that we have the right VM page pinned */
+ if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+ elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
+
+ Assert(BufferIsExclusiveLocked(vmBuf));
+
+ page = BufferGetPage(vmBuf);
+ map = (uint8 *) PageGetContents(page);
+
+ status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+ if (flags != status)
+ {
+ map[mapByte] |= (flags << mapOffset);
+ MarkBufferDirty(vmBuf);
+ }
+
+ return status;
+}
+
/*
* visibilitymap_get_status - get status of bits
*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
#include "access/heapam_xlog.h"
#include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
#include "storage/standbydefs.h"
/*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
xlrec->flags);
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+ appendStringInfo(buf, ", vm_flags: 0x%02X",
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
if (XLogRecHasBlockData(record, 0) && !isinit)
{
appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..fc7056a91ea 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,8 @@ extern uint8 visibilitymap_set(Relation rel,
Buffer vmBuf,
TransactionId cutoff_xid,
uint8 flags);
+extern uint8 visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
--
2.43.0
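To summarize the caller contract for the new visibilitymap_set_vmbits(), here
is a minimal sketch assuming the names from the patch above. The
operation-specific WAL record is only indicated schematically, and error
handling and the heap-page changes proper are elided:

#include "postgres.h"
#include "access/heapam_xlog.h"
#include "access/visibilitymap.h"
#include "access/xloginsert.h"
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "storage/bufpage.h"
#include "utils/rel.h"

/*
 * Sketch: heapbuf is already pinned and exclusively locked; vmbuffer is
 * pinned on the VM block covering heapbuf (via visibilitymap_pin()).
 */
static void
set_all_frozen_with_heap_change(Relation rel, Buffer heapbuf, Buffer vmbuffer)
{
	BlockNumber blkno = BufferGetBlockNumber(heapbuf);
	Page		heappage = BufferGetPage(heapbuf);
	XLogRecPtr	recptr;

	LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);

	START_CRIT_SECTION();

	/* ... modify the heap page here ... */
	PageSetAllVisible(heappage);
	MarkBufferDirty(heapbuf);

	/* Update the VM bits; visibilitymap_set_vmbits() marks vmbuffer dirty. */
	visibilitymap_set_vmbits(rel, blkno, vmbuffer,
							 VISIBILITYMAP_ALL_VISIBLE |
							 VISIBILITYMAP_ALL_FROZEN);

	/* Both buffers go into the one WAL record for the heap operation. */
	XLogBeginInsert();
	XLogRegisterBuffer(0, heapbuf, REGBUF_STANDARD);
	XLogRegisterBuffer(1, vmbuffer, 0);
	/* ... register the operation's main data here ... */
	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_MULTI_INSERT); /* schematic */

	PageSetLSN(heappage, recptr);
	PageSetLSN(BufferGetPage(vmbuffer), recptr);

	END_CRIT_SECTION();

	LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
}

The redo side then reads block 1 with XLogReadBufferForRedoExtended() and
applies the same bits, as in the heapam_xlog.c hunks above.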
v10-0001-Remove-unused-xl_heap_prune-member-reason.patchtext/x-patch; charset=US-ASCII; name=v10-0001-Remove-unused-xl_heap_prune-member-reason.patchDownload
From 17aaea61d6d2f24d9271b5cd122c7ba5c3a31cdd Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Sep 2025 15:02:58 -0400
Subject: [PATCH v10 01/22] Remove unused xl_heap_prune member, reason
f83d709760d8 refactored xl_heap_prune and added an unused member,
reason. While PruneReason is used when constructing this WAL record to
determine the record type, it doesn't need to be stored in a
separate field in the record. Remove it.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/tvvtfoxz5ykpsctxjbzxg3nldnzfc7geplrt2z2s54pmgto27y%40hbijsndifu45
---
src/include/access/heapam_xlog.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 277df6b3cf0..d4c0625b632 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -284,7 +284,6 @@ typedef struct xl_heap_update
*/
typedef struct xl_heap_prune
{
- uint8 reason;
uint8 flags;
/*
--
2.43.0
v10-0004-Make-heap_page_is_all_visible-independent-of-LVR.patchtext/x-patch; charset=US-ASCII; name=v10-0004-Make-heap_page_is_all_visible-independent-of-LVR.patchDownload
From 1953de8af1b47cd8309859da66e73f3eaeceb878 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 15:48:51 -0400
Subject: [PATCH v10 04/22] Make heap_page_is_all_visible independent of
LVRelState
Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need a few parameters from
the LVRelState, so just pass those in explicitly.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 48 ++++++++++++++++++----------
1 file changed, 31 insertions(+), 17 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8bea0454ff5..c02eca36c88 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,8 +463,11 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
- TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2009,8 +2012,9 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(presult.lpdead_items == 0);
- if (!heap_page_is_all_visible(vacrel, buf,
- &debug_cutoff, &debug_all_frozen))
+ if (!heap_page_is_all_visible(vacrel->rel, buf,
+ vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+ &debug_cutoff, &vacrel->offnum))
Assert(false);
Assert(presult.all_frozen == debug_all_frozen);
@@ -2910,8 +2914,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* emitted.
*/
Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
- &all_frozen))
+ if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
+ &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -3594,10 +3598,18 @@ dead_items_cleanup(LVRelState *vacrel)
}
/*
- * Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples. Set *all_frozen to true if every tuple
- * on this page is frozen.
+ * Check if every tuple in the given page in buf is visible to all current and
+ * future transactions.
+ *
+ * OldestXmin is used to determine visibility.
+ *
+ * Sets *all_frozen to true if every tuple on this page is frozen.
+ *
+ * Sets *visibility_cutoff_xid to the highest xmin amongst the visible tuples.
+ * It is only valid if the page is all-visible.
+ *
+ * *logging_offnum will have the OffsetNumber of the current tuple being
+ * processed for vacuum's error callback system.
*
* This is a stripped down version of lazy_scan_prune(). If you change
* anything here, make sure that everything stays in sync. Note that an
@@ -3605,9 +3617,11 @@ dead_items_cleanup(LVRelState *vacrel)
* introducing new side-effects here.
*/
static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
TransactionId *visibility_cutoff_xid,
- bool *all_frozen)
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3630,7 +3644,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* Set the offset number so that we can display it along with any
* error that occurred while processing this tuple.
*/
- vacrel->offnum = offnum;
+ *logging_offnum = offnum;
itemid = PageGetItemId(page, offnum);
/* Unused or redirect line pointers are of no interest */
@@ -3654,9 +3668,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
tuple.t_len = ItemIdGetLength(itemid);
- tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+ tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
+ switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
buf))
{
case HEAPTUPLE_LIVE:
@@ -3677,7 +3691,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
+ OldestXmin))
{
all_visible = false;
*all_frozen = false;
@@ -3712,7 +3726,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
} /* scan along page */
/* Clear the offset information once we have processed the given page. */
- vacrel->offnum = InvalidOffsetNumber;
+ *logging_offnum = InvalidOffsetNumber;
return all_visible;
}
--
2.43.0
v10-0005-Eliminate-xl_heap_visible-from-vacuum-phase-III.patchtext/x-patch; charset=US-ASCII; name=v10-0005-Eliminate-xl_heap_visible-from-vacuum-phase-III.patchDownload
From 42341137bce0a86523f864ea57380e0285f18396 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:42:13 -0400
Subject: [PATCH v10 05/22] Eliminate xl_heap_visible from vacuum phase III
Instead of emitting a separate xl_heap_visible record for each page that
is rendered all-visible by vacuum's third phase, include the updates to
the VM in the already emitted xl_heap_prune record.
The visibilitymap bits are stored in the flags member of the
xl_heap_prune record.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam_xlog.c | 145 ++++++++++++++++++----
src/backend/access/heap/pruneheap.c | 66 ++++++++--
src/backend/access/heap/vacuumlazy.c | 164 +++++++++++++++++--------
src/backend/access/rmgrdesc/heapdesc.c | 7 +-
src/include/access/heapam.h | 9 ++
src/include/access/heapam_xlog.h | 36 ++++--
6 files changed, 330 insertions(+), 97 deletions(-)
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 0820f7d052d..11c11929ed9 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Buffer buffer;
RelFileLocator rlocator;
BlockNumber blkno;
- XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
+ uint8 vmflags = 0;
+ Size freespace = 0;
XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -50,11 +52,17 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Assert((xlrec.flags & XLHP_CLEANUP_LOCK) != 0 ||
(xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
+ vmflags = xlrec.flags & VISIBILITYMAP_VALID_BITS;
+
/*
- * We are about to remove and/or freeze tuples. In Hot Standby mode,
- * ensure that there are no queries running for which the removed tuples
- * are still visible or which still consider the frozen xids as running.
- * The conflict horizon XID comes after xl_heap_prune.
+ * After xl_heap_prune is the optional snapshot conflict horizon.
+ *
+ * In Hot Standby mode, we must ensure that there are no running queries
+ * which would conflict with the changes in this record. That means we
+ * can't replay this record if it removes tuples that are still visible to
+ * transactions on the standby, freeze tuples with xids that are still
+ * considered running on the standby, or set a page as all-visible in the
+ * VM if it isn't all-visible to all transactions on the standby.
*/
if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
{
@@ -71,12 +79,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
}
/*
- * If we have a full-page image, restore it and we're done.
+ * If we have a full-page image of the heap block, restore it and we're
+ * done with the heap block.
*/
- action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
- (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
- &buffer);
- if (action == BLK_NEEDS_REDO)
+ if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+ (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+ &buffer) == BLK_NEEDS_REDO)
{
Page page = BufferGetPage(buffer);
OffsetNumber *redirected;
@@ -89,6 +97,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Size datalen;
xlhp_freeze_plan *plans;
OffsetNumber *frz_offsets;
+ bool do_prune;
+ bool mark_buffer_dirty;
+ bool set_heap_lsn;
char *dataptr = XLogRecGetBlockData(record, 0, &datalen);
heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
@@ -97,11 +108,18 @@ heap_xlog_prune_freeze(XLogReaderState *record)
&ndead, &nowdead,
&nunused, &nowunused);
+ do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+ set_heap_lsn = mark_buffer_dirty = do_prune || nplans > 0;
+
+ /* Ensure the record does something */
+ Assert(do_prune || nplans > 0 ||
+ vmflags & VISIBILITYMAP_VALID_BITS);
+
/*
* Update all line pointers per the record, and repair fragmentation
* if needed.
*/
- if (nredirected > 0 || ndead > 0 || nunused > 0)
+ if (do_prune)
heap_page_prune_execute(buffer,
(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
redirected, nredirected,
@@ -138,36 +156,117 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* There should be no more data */
Assert((char *) frz_offsets == dataptr + datalen);
+ /*
+ * Now set PD_ALL_VISIBLE, if required. We'll only do this if we are
+ * also going to set bits in the VM later.
+ *
+ * We must never end up with the VM bit set and the page-level
+ * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+ * modification would fail to clear the VM bit.
+ */
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+ {
+ PageSetAllVisible(page);
+
+ /*
+ * If the only change to the heap page is setting PD_ALL_VISIBLE,
+ * we can avoid setting the page LSN unless checksums or
+ * wal_log_hints are enabled.
+ */
+ set_heap_lsn = XLogHintBitIsNeeded() ? true : set_heap_lsn;
+ mark_buffer_dirty = true;
+ }
+
/*
* Note: we don't worry about updating the page's prunability hints.
* At worst this will cause an extra prune cycle to occur soon.
*/
- PageSetLSN(page, lsn);
- MarkBufferDirty(buffer);
+ if (mark_buffer_dirty)
+ MarkBufferDirty(buffer);
+ if (set_heap_lsn)
+ PageSetLSN(page, lsn);
}
/*
- * If we released any space or line pointers, update the free space map.
+ * If we released any space or line pointers or will be setting a page in
+ * the visibility map, measure the page's freespace to later update the
+ * freespace map.
+ *
+ * Even if we are just updating the VM (and thus not freeing up any
+ * space), we'll still update the FSM for this page. Since FSM is not
+ * WAL-logged and only updated heuristically, it easily becomes stale in
+ * standbys. If the standby is later promoted and runs VACUUM, it will
+ * skip updating individual free space figures for pages that became
+ * all-visible (or all-frozen, depending on the vacuum mode,) which is
+ * troublesome when FreeSpaceMapVacuum propagates too optimistic free
+ * space values to upper FSM layers; later inserters try to use such pages
+ * only to find out that they are unusable. This can cause long stalls
+ * when there are many such pages.
+ *
+ * Forestall those problems by updating FSM's idea about a page that is
+ * becoming all-visible or all-frozen.
*
* Do this regardless of a full-page image being applied, since the FSM
* data is not in the page anyway.
+ *
+ * We want to avoid holding an exclusive lock on the heap buffer while
+ * doing IO (either of the FSM or the VM), so we'll release the lock on
+ * the heap buffer before doing either.
*/
if (BufferIsValid(buffer))
{
- if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
- XLHP_HAS_DEAD_ITEMS |
- XLHP_HAS_NOW_UNUSED_ITEMS))
- {
- Size freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+ if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
+ XLHP_HAS_DEAD_ITEMS |
+ XLHP_HAS_NOW_UNUSED_ITEMS)) ||
+ vmflags & VISIBILITYMAP_VALID_BITS)
+ freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+
+ UnlockReleaseBuffer(buffer);
+ }
+
+ /*
+ * Read and update the VM block. Even if we skipped updating the heap page
+ * due to the file being dropped or truncated later in recovery, it's
+ * still safe to update the visibility map. Any WAL record that clears
+ * the visibility map bit does so before checking the page LSN, so any
+ * bits that need to be cleared will still be cleared.
+ *
+ * Note that it is _only_ okay that we do not hold a lock on the heap page
+ * because we are in recovery and can expect no other writers to clear
+ * PD_ALL_VISIBLE before we are able to update the VM.
+ */
+ if (vmflags & VISIBILITYMAP_VALID_BITS &&
+ XLogReadBufferForRedoExtended(record, 1,
+ RBM_ZERO_ON_ERROR,
+ false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ Page vmpage = BufferGetPage(vmbuffer);
+ uint8 old_vmbits = 0;
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
- UnlockReleaseBuffer(buffer);
+ /* initialize the page if it was read as zeros */
+ if (PageIsNew(vmpage))
+ PageInit(vmpage, BLCKSZ, 0);
+
+ old_vmbits = visibilitymap_set_vmbits(reln, blkno, vmbuffer, vmflags);
- XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
+ /* Only set VM page LSN if we modified the page */
+ if (old_vmbits != vmflags)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
}
- else
- UnlockReleaseBuffer(buffer);
+
+ FreeFakeRelcacheEntry(reln);
}
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
+
+ if (freespace > 0)
+ XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7ebd22f00a3..f0b33d1b696 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,6 +21,7 @@
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
+#include "access/visibilitymapdefs.h"
#include "commands/vacuum.h"
#include "executor/instrument.h"
#include "miscadmin.h"
@@ -835,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
+ InvalidBuffer, 0, false,
conflict_xid,
true, reason,
prstate.frozen, prstate.nfrozen,
@@ -2030,14 +2032,18 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
*
* This is used for several different page maintenance operations:
*
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
* redirected, some marked dead, and some removed altogether.
*
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'.
*
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ * marked as unused.
*
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phase III, the heap page may be marked
+ * all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
* all.
*
* If replaying the record requires a cleanup lock, pass cleanup_lock = true.
@@ -2045,12 +2051,23 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* replaying 'unused' items depends on whether they were all previously marked
* as dead.
*
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
* Note: This function scribbles on the 'frozen' array.
*
* Note: This is called in a critical section, so careful what you do here.
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer,
+ uint8 vmflags,
+ bool set_pd_all_vis,
TransactionId conflict_xid,
bool cleanup_lock,
PruneReason reason,
@@ -2062,6 +2079,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xl_heap_prune xlrec;
XLogRecPtr recptr;
uint8 info;
+ uint8 regbuf_flags;
/* The following local variables hold data registered in the WAL record: */
xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2070,8 +2088,21 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xlhp_prune_items dead_items;
xlhp_prune_items unused_items;
OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+ bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+
+ Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+ xlrec.flags = vmflags;
- xlrec.flags = 0;
+ regbuf_flags = REGBUF_STANDARD;
+
+ /*
+ * We can avoid an FPI if the only modification we are making to the heap
+ * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+ */
+ if (!do_prune &&
+ nfrozen == 0 &&
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
+ regbuf_flags |= REGBUF_NO_IMAGE;
/*
* Prepare data for the buffer. The arrays are not actually in the
@@ -2079,7 +2110,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* page image, the arrays can be omitted.
*/
XLogBeginInsert();
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+ XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ XLogRegisterBuffer(1, vmbuffer, 0);
+
if (nfrozen > 0)
{
int nplans;
@@ -2168,5 +2203,22 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
}
recptr = XLogInsert(RM_HEAP2_ID, info);
- PageSetLSN(BufferGetPage(buffer), recptr);
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
+ }
+
+ /*
+ * If we pruned or froze tuples, or set the page all-visible while
+ * checksums or wal_log_hints are enabled, we must bump the LSN. Torn
+ * pages are possible if we update PD_ALL_VISIBLE without bumping the LSN,
+ * but this is deemed okay for page hint updates.
+ */
+ if (do_prune || nfrozen > 0 ||
+ (set_pd_all_vis && XLogHintBitIsNeeded()))
+ {
+ Assert(BufferIsDirty(buffer));
+ PageSetLSN(BufferGetPage(buffer), recptr);
+ }
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c02eca36c88..e35cb629261 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,11 +463,13 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
+static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2850,8 +2852,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
OffsetNumber unused[MaxHeapTuplesPerPage];
int nunused = 0;
TransactionId visibility_cutoff_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
bool all_frozen;
LVSavedErrInfo saved_err_info;
+ uint8 vmflags = 0;
+ bool set_pd_all_vis = false;
Assert(vacrel->do_index_vacuuming);
@@ -2862,6 +2867,20 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
InvalidOffsetNumber);
+ if (heap_page_would_be_all_visible(vacrel->rel, buffer,
+ vacrel->cutoffs.OldestXmin,
+ deadoffsets, num_offsets,
+ &all_frozen, &visibility_cutoff_xid,
+ &vacrel->offnum))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (all_frozen)
+ {
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ }
+ }
+
START_CRIT_SECTION();
for (int i = 0; i < num_offsets; i++)
@@ -2881,6 +2900,18 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
/* Attempt to truncate line pointer array now */
PageTruncateLinePointerArray(page);
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+ {
+ Assert(!PageIsAllVisible(page));
+ set_pd_all_vis = true;
+ PageSetAllVisible(page);
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ visibilitymap_set_vmbits(vacrel->rel,
+ blkno,
+ vmbuffer, vmflags);
+ conflict_xid = visibility_cutoff_xid;
+ }
+
/*
* Mark buffer dirty before we write WAL.
*/
@@ -2890,7 +2921,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
- InvalidTransactionId,
+ vmbuffer,
+ vmflags,
+ set_pd_all_vis,
+ conflict_xid,
false, /* no cleanup lock required */
PRUNE_VACUUM_CLEANUP,
NULL, 0, /* frozen */
@@ -2899,39 +2933,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
unused, nunused);
}
- /*
- * End critical section, so we safely can do visibility tests (which
- * possibly need to perform IO and allocate memory!). If we crash now the
- * page (including the corresponding vm bit) might not be marked all
- * visible, but that's fine. A later vacuum will fix that.
- */
END_CRIT_SECTION();
- /*
- * Now that we have removed the LP_DEAD items from the page, once again
- * check if the page has become all-visible. The page is already marked
- * dirty, exclusively locked, and, if needed, a full page image has been
- * emitted.
- */
- Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
- &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
+ if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (all_frozen)
- {
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buffer,
- InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
- flags);
-
/* Count the newly set VM page for logging */
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
vacrel->vm_new_visible_pages++;
if (all_frozen)
vacrel->vm_new_visible_frozen_pages++;
@@ -3598,40 +3605,85 @@ dead_items_cleanup(LVRelState *vacrel)
}
/*
- * Check if every tuple in the given page in buf is visible to all current and
- * future transactions.
+ * Wrapper for heap_page_would_be_all_visible() for callers that expect no
+ * LP_DEAD items on the page.
+ */
+bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
+{
+
+ return heap_page_would_be_all_visible(rel, buf, OldestXmin,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid,
+ logging_offnum);
+}
+
+/*
+ * Determines whether or not the heap page in buf is all-visible other than
+ * the dead line pointers referred to by the provided deadoffsets array.
*
- * OldestXmin is used to determine visibility.
+ * deadoffsets are the offsets of LP_DEAD items the caller already knows
+ * about and whose associated index entries have already been removed.
+ * Vacuum will call this before setting those line pointers LP_UNUSED. So, if
+ * there are no other LP_DEAD items, the page can be set all-visible in the
+ * VM by the caller.
+ *
+ * Returns true if the page is all-visible other than the provided
+ * deadoffsets and false otherwise.
*
- * Sets *all_frozen to true if every tuple on this page is frozen.
+ * OldestXmin is used to determine visibility.
*
- * Sets *visibility_cutoff_xid to the highest xmin amongst the visible tuples.
- * It is only valid if the page is all-visible.
+ * *all_frozen is an output parameter indicating to the caller if every tuple
+ * on the page is frozen.
*
* *logging_offnum will have the OffsetNumber of the current tuple being
* processed for vacuum's error callback system.
*
- * This is a stripped down version of lazy_scan_prune(). If you change
- * anything here, make sure that everything stays in sync. Note that an
- * assertion calls us to verify that everybody still agrees. Be sure to avoid
- * introducing new side-effects here.
+ * *visibility_cutoff_xid is an output parameter with the highest xmin amongst the
+ * visible tuples. It is only valid if the page is all-visible.
+ *
+ * Callers looking to verify that the page is already all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This logic is similar to that in heap_prune_record_unchanged_lp_normal(). If
+ * you change anything here, make sure that everything stays in sync. Note
+ * that an assertion calls us to verify that everybody still agrees. Be sure
+ * to avoid introducing new side-effects here.
*/
static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum)
+heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
OffsetNumber offnum,
maxoff;
bool all_visible = true;
+ int matched_dead_count = 0;
*visibility_cutoff_xid = InvalidTransactionId;
*all_frozen = true;
+ Assert(ndeadoffsets == 0 || deadoffsets);
+
+#ifdef USE_ASSERT_CHECKING
+ /* Confirm input deadoffsets[] is strictly sorted */
+ if (ndeadoffsets > 1)
+ {
+ for (int i = 1; i < ndeadoffsets; i++)
+ Assert(deadoffsets[i - 1] < deadoffsets[i]);
+ }
+#endif
+
maxoff = PageGetMaxOffsetNumber(page);
for (offnum = FirstOffsetNumber;
offnum <= maxoff && all_visible;
@@ -3659,9 +3711,15 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
if (ItemIdIsDead(itemid))
{
- all_visible = false;
- *all_frozen = false;
- break;
+ if (!deadoffsets ||
+ matched_dead_count >= ndeadoffsets ||
+ deadoffsets[matched_dead_count] != offnum)
+ {
+ *all_frozen = all_visible = false;
+ break;
+ }
+ matched_dead_count++;
+ continue;
}
Assert(ItemIdIsNormal(itemid));
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..c95d30dfe8d 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -103,7 +103,7 @@ plan_elem_desc(StringInfo buf, void *plan, void *data)
* code, the latter of which is used in frontend (pg_waldump) code.
*/
void
-heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
OffsetNumber **frz_offsets,
int *nredirected, OffsetNumber **redirected,
@@ -279,7 +279,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
TransactionId conflict_xid;
memcpy(&conflict_xid, rec + SizeOfHeapPrune, sizeof(TransactionId));
-
appendStringInfo(buf, "snapshotConflictHorizon: %u",
conflict_xid);
}
@@ -287,6 +286,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, ", isCatalogRel: %c",
xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
+ if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
+ appendStringInfo(buf, ", vm_flags: 0x%02X",
+ xlrec->flags & VISIBILITYMAP_VALID_BITS);
+
if (XLogRecHasBlockData(record, 0))
{
Size datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..8b47295efa2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -344,6 +344,12 @@ extern void heap_inplace_update_and_unlock(Relation relation,
Buffer buffer);
extern void heap_inplace_unlock(Relation relation,
HeapTuple oldtup, Buffer buffer);
+
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
@@ -388,6 +394,9 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer,
+ uint8 vmflags,
+ bool vm_modified_heap_page,
TransactionId conflict_xid,
bool cleanup_lock,
PruneReason reason,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d4c0625b632..d8508593e7c 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -249,7 +249,7 @@ typedef struct xl_heap_update
* Main data section:
*
* xl_heap_prune
- * uint8 flags
+ * uint16 flags
* TransactionId snapshot_conflict_horizon
*
* Block 0 data section:
@@ -284,7 +284,7 @@ typedef struct xl_heap_update
*/
typedef struct xl_heap_prune
{
- uint8 flags;
+ uint16 flags;
/*
* If XLHP_HAS_CONFLICT_HORIZON is set, the conflict horizon XID follows,
@@ -292,10 +292,22 @@ typedef struct xl_heap_prune
*/
} xl_heap_prune;
-#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
+
+/*
+ * The xl_heap_prune record's flags may also contain which VM bits to set. As
+ * such, (1 << 0) and (1 << 1) are reserved for VISIBILITYMAP_ALL_VISIBLE and
+ * VISIBILITYMAP_ALL_FROZEN.
+ */
-/* to handle recovery conflict during logical decoding on standby */
-#define XLHP_IS_CATALOG_REL (1 << 1)
+/*
+ * To handle recovery conflicts during logical decoding on a standby, we must
+ * know if the table is a catalog table. Note that in visibilitymapdefs.h,
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
+#define XLHP_IS_CATALOG_REL (1 << 2)
/*
* Does replaying the record require a cleanup-lock?
@@ -305,7 +317,7 @@ typedef struct xl_heap_prune
* marks LP_DEAD line pointers as unused without moving any tuple data, an
* ordinary exclusive lock is sufficient.
*/
-#define XLHP_CLEANUP_LOCK (1 << 2)
+#define XLHP_CLEANUP_LOCK (1 << 3)
/*
* If we remove or freeze any entries that contain xids, we need to include a
@@ -313,22 +325,22 @@ typedef struct xl_heap_prune
* there are no queries running for which the removed tuples are still
* visible, or which still consider the frozen XIDs as running.
*/
-#define XLHP_HAS_CONFLICT_HORIZON (1 << 3)
+#define XLHP_HAS_CONFLICT_HORIZON (1 << 4)
/*
* Indicates that an xlhp_freeze_plans sub-record and one or more
* xlhp_freeze_plan sub-records are present.
*/
-#define XLHP_HAS_FREEZE_PLANS (1 << 4)
+#define XLHP_HAS_FREEZE_PLANS (1 << 5)
/*
* XLHP_HAS_REDIRECTIONS, XLHP_HAS_DEAD_ITEMS, and XLHP_HAS_NOW_UNUSED_ITEMS
* indicate that xlhp_prune_items sub-records with redirected, dead, and
* unused item offsets are present.
*/
-#define XLHP_HAS_REDIRECTIONS (1 << 5)
-#define XLHP_HAS_DEAD_ITEMS (1 << 6)
-#define XLHP_HAS_NOW_UNUSED_ITEMS (1 << 7)
+#define XLHP_HAS_REDIRECTIONS (1 << 6)
+#define XLHP_HAS_DEAD_ITEMS (1 << 7)
+#define XLHP_HAS_NOW_UNUSED_ITEMS (1 << 8)
/*
* xlhp_freeze_plan describes how to freeze a group of one or more heap tuples
@@ -497,7 +509,7 @@ extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
uint8 vmflags);
/* in heapdesc.c, so it can be shared between frontend/backend code */
-extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
OffsetNumber **frz_offsets,
int *nredirected, OffsetNumber **redirected,
--
2.43.0
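(Not part of the patch set, just an illustration for reviewers: a minimal
stand-alone sketch of how the widened uint16 flags field carries the VM bits
in its low two bits, so redo and pg_waldump can recover them with a simple
mask. The constant values mirror the definitions above; uint16_t/uint8_t
stand in for the Postgres uint16/uint8 typedefs.)

#include <stdint.h>
#include <stdio.h>

#define VISIBILITYMAP_ALL_VISIBLE   (1 << 0)
#define VISIBILITYMAP_ALL_FROZEN    (1 << 1)
#define VISIBILITYMAP_VALID_BITS    (VISIBILITYMAP_ALL_VISIBLE | \
                                     VISIBILITYMAP_ALL_FROZEN)

#define XLHP_IS_CATALOG_REL         (1 << 2)
#define XLHP_CLEANUP_LOCK           (1 << 3)
#define XLHP_HAS_CONFLICT_HORIZON   (1 << 4)
#define XLHP_HAS_FREEZE_PLANS       (1 << 5)
#define XLHP_HAS_REDIRECTIONS       (1 << 6)
#define XLHP_HAS_DEAD_ITEMS         (1 << 7)
#define XLHP_HAS_NOW_UNUSED_ITEMS   (1 << 8)

int
main(void)
{
    /* A record that prunes, freezes, and sets both VM bits. */
    uint16_t    flags = XLHP_CLEANUP_LOCK | XLHP_HAS_CONFLICT_HORIZON |
        XLHP_HAS_FREEZE_PLANS | XLHP_HAS_REDIRECTIONS |
        VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN;

    /* Redo (and heap2_desc above) recovers the VM bits with a mask. */
    uint8_t     vmflags = flags & VISIBILITYMAP_VALID_BITS;

    printf("vm_flags: 0x%02X\n", vmflags);  /* prints vm_flags: 0x03 */
    return 0;
}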
v10-0006-Use-xl_heap_prune-record-for-setting-empty-pages.patchtext/x-patch; charset=US-ASCII; name=v10-0006-Use-xl_heap_prune-record-for-setting-empty-pages.patchDownload
From c53c62b18414bb6de25bef4c4e428904828dfa5a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:42:19 -0400
Subject: [PATCH v10 06/22] Use xl_heap_prune record for setting empty pages
all-visible
As part of a project to eliminate xl_heap_visible records, stop emitting
them when vacuum's phase I sets empty pages all-visible.
---
src/backend/access/heap/pruneheap.c | 14 +++++--
src/backend/access/heap/vacuumlazy.c | 55 ++++++++++++++++++----------
src/include/access/heapam.h | 1 +
3 files changed, 47 insertions(+), 23 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f0b33d1b696..373986b204a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -836,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
+ false,
InvalidBuffer, 0, false,
conflict_xid,
true, reason,
@@ -2055,6 +2056,9 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* case, vmbuffer should already have been updated and marked dirty and should
* still be pinned and locked.
*
+ * force_heap_fpi indicates that a full page image of the heap block should be
+ * forced.
+ *
* set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
* the page LSN when checksums/wal_log_hints are enabled even if we did not
* prune or freeze tuples on the page.
@@ -2065,6 +2069,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ bool force_heap_fpi,
Buffer vmbuffer,
uint8 vmflags,
bool set_pd_all_vis,
@@ -2095,13 +2100,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
regbuf_flags = REGBUF_STANDARD;
+ if (force_heap_fpi)
+ regbuf_flags |= REGBUF_FORCE_IMAGE;
+
/*
* We can avoid an FPI if the only modification we are making to the heap
* page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
*/
- if (!do_prune &&
- nfrozen == 0 &&
- (!set_pd_all_vis || !XLogHintBitIsNeeded()))
+ else if (!do_prune &&
+ nfrozen == 0 &&
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
regbuf_flags |= REGBUF_NO_IMAGE;
/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e35cb629261..67c853b586a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1877,33 +1877,47 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ uint8 new_vmbits = VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN;
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ PageSetAllVisible(page);
MarkBufferDirty(buf);
- /*
- * It's possible that another backend has extended the heap,
- * initialized the page, and then failed to WAL-log the page due
- * to an ERROR. Since heap extension is not WAL-logged, recovery
- * might try to replay our record setting the page all-visible and
- * find that the page isn't initialized, which will cause a PANIC.
- * To prevent that, check whether the page has been previously
- * WAL-logged, and if not, do that now.
- */
- if (RelationNeedsWAL(vacrel->rel) &&
- PageGetLSN(page) == InvalidXLogRecPtr)
- log_newpage_buffer(buf, true);
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ visibilitymap_set_vmbits(vacrel->rel, blkno,
+ vmbuffer, new_vmbits);
+
+ if (RelationNeedsWAL(vacrel->rel))
+ {
+ /*
+ * It's possible that another backend has extended the heap,
+ * initialized the page, and then failed to WAL-log the page
+ * due to an ERROR. Since heap extension is not WAL-logged,
+ * recovery might try to replay our record setting the page
+ * all-visible and find that the page isn't initialized, which
+ * will cause a PANIC. To prevent that, if the page hasn't
+ * been previously WAL-logged, force a heap FPI.
+ */
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ PageGetLSN(page) == InvalidXLogRecPtr,
+ vmbuffer,
+ new_vmbits,
+ true,
+ InvalidTransactionId,
+ false, PRUNE_VACUUM_SCAN,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+ }
- PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
END_CRIT_SECTION();
- /* Count the newly all-frozen pages for logging */
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+ /* Count the newly all-frozen pages for logging. */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
}
@@ -2921,6 +2935,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
+ false,
vmbuffer,
vmflags,
set_pd_all_vis,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8b47295efa2..e7129a644a1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,6 +394,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ bool force_heap_fpi,
Buffer vmbuffer,
uint8 vmflags,
bool vm_modified_heap_page,
--
2.43.0
v10-0009-Find-and-fix-VM-corruption-in-heap_page_prune_an.patchtext/x-patch; charset=US-ASCII; name=v10-0009-Find-and-fix-VM-corruption-in-heap_page_prune_an.patchDownload
From 76e13c0ec681ec1eaba065bc3e88f72e37b37621 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:45:59 -0400
Subject: [PATCH v10 09/22] Find and fix VM corruption in
heap_page_prune_and_freeze
Future commits will update the VM in the same critical section and WAL
record as pruning and freezing. For ease of review, this commit takes
one step toward doing this: it moves the VM corruption handling to
heap_page_prune_and_freeze().
---
src/backend/access/heap/pruneheap.c | 91 +++++++++++++++++++++++++++-
src/backend/access/heap/vacuumlazy.c | 82 +++----------------------
src/include/access/heapam.h | 4 ++
3 files changed, 100 insertions(+), 77 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 373986b204a..e0005c2d4f2 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,7 +21,7 @@
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "commands/vacuum.h"
#include "executor/instrument.h"
#include "miscadmin.h"
@@ -177,6 +177,13 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
static void page_verify_redirects(Page page);
+static bool identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer);
+
/*
* Optionally prune and repair fragmentation in the specified page.
@@ -261,7 +268,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, vistest, 0,
+ heap_page_prune_and_freeze(relation, buffer, false,
+ InvalidBuffer,
+ vistest, 0,
NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
/*
@@ -294,6 +303,68 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
}
}
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer)
+{
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
/*
* Prune and repair fragmentation and potentially freeze tuples on the
@@ -314,6 +385,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
* that also freeze need that information.
*
+ * blk_known_av is the visibility status of the heap block as of the last call
+ * to find_next_unskippable_block(). vmbuffer is the buffer that may already
+ * contain the required block of the visibility map.
+ *
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
* (see heap_prune_satisfies_vacuum).
*
@@ -349,6 +424,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ bool blk_known_av,
+ Buffer vmbuffer,
GlobalVisState *vistest,
int options,
struct VacuumCutoffs *cutoffs,
@@ -897,6 +974,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
+ /*
+ * Clear any VM corruption. This does not need to be done in a critical
+ * section.
+ */
+ presult->vm_corruption = false;
+ if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+ presult->vm_corruption = identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av,
+ prstate.lpdead_items, vmbuffer);
if (prstate.freeze)
{
if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c1ae3a355c8..322e54c803f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,12 +430,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
-static bool identify_and_fix_vm_corruption(Relation relation,
- BlockNumber heap_blk,
- Buffer heap_buffer, Page heap_page,
- bool heap_blk_known_av,
- int64 nlpdead_items,
- Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1938,70 +1932,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
-/*
- * When updating the visibility map after phase I heap vacuuming, we take the
- * opportunity to identify and fix any VM corruption.
- *
- * heap_blk_known_av is the visibility status of the heap page collected
- * while finding the next unskippable block in heap_vac_scan_next_block().
- */
-static bool
-identify_and_fix_vm_corruption(Relation relation,
- BlockNumber heap_blk,
- Buffer heap_buffer, Page heap_page,
- bool heap_blk_known_av,
- int64 nlpdead_items,
- Buffer vmbuffer)
-{
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
- visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
- {
- ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk)));
-
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
- {
- ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk)));
-
- PageClearAllVisible(heap_page);
- MarkBufferDirty(heap_buffer);
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- return false;
-}
-
-
/* qsort comparator for sorting OffsetNumbers */
static int
cmpOffsetNumbers(const void *a, const void *b)
@@ -2059,11 +1989,14 @@ lazy_scan_prune(LVRelState *vacrel,
* tuples. Pruning will have determined whether or not the page is
* all-visible.
*/
- prune_options = HEAP_PAGE_PRUNE_FREEZE;
+ prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
+ heap_page_prune_and_freeze(rel, buf,
+ all_visible_according_to_vm,
+ vmbuffer,
+ vacrel->vistest, prune_options,
&vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
@@ -2148,10 +2081,9 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables. Start by looking for any VM corruption.
+ * all_frozen variables.
*/
- if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
- all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+ if (presult.vm_corruption)
{
/* Don't update the VM if we just cleared corruption in it */
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e7129a644a1..0c7eb5e46f4 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
struct TupleTableSlot;
@@ -247,6 +248,7 @@ typedef struct PruneFreezeResult
bool all_visible;
bool all_frozen;
TransactionId vm_conflict_horizon;
+ bool vm_corruption;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
@@ -380,6 +382,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ bool blk_known_av,
+ Buffer vmbuffer,
struct GlobalVisState *vistest,
int options,
struct VacuumCutoffs *cutoffs,
--
2.43.0
v10-0008-Combine-vacuum-phase-I-VM-update-cases.patchtext/x-patch; charset=US-ASCII; name=v10-0008-Combine-vacuum-phase-I-VM-update-cases.patchDownload
From 271fa7a624f6fe07ce96dc2a59b3ea5ae8303347 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:35:36 -0400
Subject: [PATCH v10 08/22] Combine vacuum phase I VM update cases
We update the VM after phase I of vacuum -- either setting both the VM
bits when all bits are currently unset or setting just the frozen bit
when the all-visible bit is already set.
Those two cases shared much of the same code -- leading to unnecessary
duplication. This commit combines them, which is simpler and easier to
understand.
The combined case also happens to fix a longstanding bug: if we were
only setting an all-visible page all-frozen and checksums/wal_log_hints
were enabled, we would fail to mark the buffer dirty before
visibilitymap_set() set the page LSN.
---
src/backend/access/heap/vacuumlazy.c | 101 +++++++++------------------
1 file changed, 32 insertions(+), 69 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2cc11e6d55d..c1ae3a355c8 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2155,11 +2155,26 @@ lazy_scan_prune(LVRelState *vacrel,
{
/* Don't update the VM if we just cleared corruption in it */
}
- else if (!all_visible_according_to_vm && presult.all_visible)
+
+ /*
+ * If the page isn't yet marked all-visible in the VM, or it is and needs
+ * to be marked all-frozen, update the VM. Note that all_frozen is only
+ * valid if all_visible is true, so we must check both all_visible and
+ * all_frozen.
+ */
+ else if (presult.all_visible &&
+ (!all_visible_according_to_vm ||
+ (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
+ /*
+ * If the page is all-frozen, we can pass InvalidTransactionId as our
+ * cutoff_xid, since a snapshotConflictHorizon sufficient to make
+ * everything safe for REDO was logged when the page's tuples were
+ * frozen.
+ */
if (presult.all_frozen)
{
Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
@@ -2172,21 +2187,29 @@ lazy_scan_prune(LVRelState *vacrel,
* checksums are not enabled). Regardless, set both bits so that we
* get back in sync.
*
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
+ * If the heap page is all-visible but the VM bit is not set, we don't
+ * need to dirty the heap page. However, if checksums are enabled, we
+ * do need to make sure that the heap page is dirtied before passing
+ * it to visibilitymap_set(), because it may be logged.
*/
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
+ if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+ {
+ PageSetAllVisible(page);
+ MarkBufferDirty(buf);
+ }
+
old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
InvalidXLogRecPtr,
vmbuffer, presult.vm_conflict_horizon,
flags);
/*
+ * Even if we are only setting the all-frozen bit, there is a small
+ * chance that the VM was modified sometime between setting
+ * all_visible_according_to_vm and checking the visibility during
+ * pruning. Check the return value of old_vmbits to ensure the
+ * visibility map counters used for logging are accurate.
+ *
* If the page wasn't already set all-visible and/or all-frozen in the
* VM, count it as newly set for logging.
*/
@@ -2207,66 +2230,6 @@ lazy_scan_prune(LVRelState *vacrel,
}
}
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen. Note that all_frozen is only valid if all_visible is
- * true, so we must check both all_visible and all_frozen.
- */
- else if (all_visible_according_to_vm && presult.all_visible &&
- presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
-
return presult.ndeleted;
}
--
2.43.0
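(Illustration only, with stand-in names rather than the real APIs: the
consolidated rule from 0008 for when the heap buffer must be dirtied before
calling visibilitymap_set(). The point of the bug fix is the first case
below -- an already all-visible page being marked all-frozen while
checksums/wal_log_hints are enabled.)

#include <assert.h>
#include <stdbool.h>

static bool
must_dirty_heap_buffer(bool page_already_all_visible,
                       bool wal_log_hints_or_checksums)
{
    /*
     * Dirty the buffer if we are newly setting PD_ALL_VISIBLE, or if
     * visibilitymap_set() may stamp the heap page with a new LSN.
     */
    return !page_already_all_visible || wal_log_hints_or_checksums;
}

int
main(void)
{
    /* The case the old code missed: page already all-visible, only the
     * all-frozen bit is being set, checksums/wal_log_hints enabled. */
    assert(must_dirty_heap_buffer(true, true));

    /* Already all-visible, no hint-bit WAL-logging: nothing to dirty. */
    assert(!must_dirty_heap_buffer(true, false));

    /* Newly all-visible: always dirty, PD_ALL_VISIBLE is being set. */
    assert(must_dirty_heap_buffer(false, false));

    return 0;
}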
v10-0010-Keep-all_frozen-updated-too-in-heap_page_prune_a.patchtext/x-patch; charset=US-ASCII; name=v10-0010-Keep-all_frozen-updated-too-in-heap_page_prune_a.patchDownload
From 3ac96aa83bad6be7347b5103fb8b31d42c975f2d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 23 Jul 2025 16:01:24 -0400
Subject: [PATCH v10 10/22] Keep all_frozen updated too in
heap_page_prune_and_freeze
We previously relied on all_visible and all_frozen only being used
together and did not always keep all_frozen up to date on its own.
Future commits will separate usage of these fields, so it is best not to
rely on all_visible for all_frozen's validity.
---
src/backend/access/heap/pruneheap.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e0005c2d4f2..10d030fb3e7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -143,10 +143,6 @@ typedef struct
* whether to freeze the page or not. The all_visible and all_frozen
* values returned to the caller are adjusted to include LP_DEAD items at
* the end.
- *
- * all_frozen should only be considered valid if all_visible is also set;
- * we don't bother to clear the all_frozen flag every time we clear the
- * all_visible flag.
*/
bool all_visible;
bool all_frozen;
@@ -828,6 +824,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
}
+ Assert(!prstate.all_frozen || prstate.all_visible);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -1472,7 +1469,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
if (!HeapTupleHeaderXminCommitted(htup))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1494,7 +1491,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
Assert(prstate->cutoffs);
if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1507,7 +1504,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
case HEAPTUPLE_RECENTLY_DEAD:
prstate->recently_dead_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple will soon become DEAD. Update the hint field so
@@ -1526,7 +1523,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* assumption is a bit shaky, but it is what acquire_sample_rows()
* does, so be consistent.
*/
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* If we wanted to optimize for aborts, we might consider marking
@@ -1544,7 +1541,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* will commit and update the counters after we report.
*/
prstate->live_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple may soon become DEAD. Update the hint field so that
--
2.43.0
v10-0007-Combine-lazy_scan_prune-VM-corruption-cases.patchtext/x-patch; charset=US-ASCII; name=v10-0007-Combine-lazy_scan_prune-VM-corruption-cases.patchDownload
From 84457d090e0f176e84f14fadf95b095722d1e767 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:04:03 -0400
Subject: [PATCH v10 07/22] Combine lazy_scan_prune VM corruption cases
lazy_scan_prune() updates the visibility map after phase I of heap
vacuuming. It also checks and fixes corruption in the VM. The corruption
cases were mixed in with the normal visibility map update cases.
Careful study of the ordering of the current logic reveals that the
corruption cases can be reordered and extracted into a separate
function. This should result in no additional overhead compared to the
previous code.
This reordering makes it clear which cases are about corruption and
which cases are normal VM updates. Separating them also makes it
possible to combine the normal cases in a future commit. This will make
the logic easier to understand and pave the way for updating the VM in
the same record as pruning and freezing in phase I.
---
src/backend/access/heap/vacuumlazy.c | 122 +++++++++++++++++----------
1 file changed, 77 insertions(+), 45 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 67c853b586a..2cc11e6d55d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,6 +430,12 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
+static bool identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1932,6 +1938,70 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer)
+{
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
+
+
/* qsort comparator for sorting OffsetNumbers */
static int
cmpOffsetNumbers(const void *a, const void *b)
@@ -2078,9 +2148,14 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
+ * all_frozen variables. Start by looking for any VM corruption.
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
+ all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+ {
+ /* Don't update the VM if we just cleared corruption in it */
+ }
+ else if (!all_visible_according_to_vm && presult.all_visible)
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -2132,49 +2207,6 @@ lazy_scan_prune(LVRelState *vacrel,
}
}
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
- {
- ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
- {
- ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
/*
* If the all-visible page is all-frozen but not marked as such yet, mark
* it as all-frozen. Note that all_frozen is only valid if all_visible is
--
2.43.0
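(Reviewer aid, not code from the patches: a stand-alone sketch, with stand-in
names, of the vmflags decision that 0012 below folds into
heap_page_prune_and_freeze(). The corruption repair from 0007/0009 runs first
and suppresses any VM update; otherwise the consolidated condition from 0008
decides which bits to set in the same critical section and WAL record as
pruning/freezing.)

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ALL_VISIBLE (1 << 0)
#define ALL_FROZEN  (1 << 1)

static uint8_t
decide_vmflags(bool corruption_found,
               bool page_all_visible, bool page_all_frozen,
               bool vm_all_visible, bool vm_all_frozen)
{
    uint8_t     vmflags = 0;

    /* Corruption repair comes first and suppresses any VM update. */
    if (corruption_found)
        return 0;

    if (page_all_visible &&
        (!vm_all_visible || (page_all_frozen && !vm_all_frozen)))
    {
        vmflags |= ALL_VISIBLE;
        if (page_all_frozen)
            vmflags |= ALL_FROZEN;
    }

    return vmflags;
}

int
main(void)
{
    /* Newly all-visible and all-frozen page, VM not yet set: both bits. */
    assert(decide_vmflags(false, true, true, false, false) ==
           (ALL_VISIBLE | ALL_FROZEN));

    /* Already all-visible in the VM but not yet all-frozen: set both bits
     * again so the VM update is self-contained. */
    assert(decide_vmflags(false, true, true, true, false) ==
           (ALL_VISIBLE | ALL_FROZEN));

    /* We just repaired corruption: no VM update in this pass. */
    assert(decide_vmflags(true, true, true, false, false) == 0);

    return 0;
}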
v10-0012-Eliminate-xl_heap_visible-from-vacuum-phase-I-pr.patchtext/x-patch; charset=US-ASCII; name=v10-0012-Eliminate-xl_heap_visible-from-vacuum-phase-I-pr.patchDownload
From a5be384e7806fe72c18a54df1a637cf93d16a0b9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:41:00 -0400
Subject: [PATCH v10 12/22] Eliminate xl_heap_visible from vacuum phase I
prune/freeze
Instead of emitting a separate WAL record for every block rendered
all-visible/frozen by vacuum's phase I, include the changes to the VM in
the xl_heap_prune record already emitted.
This is only enabled for vacuum's prune/freeze work, not for on-access
pruning.
---
src/backend/access/heap/pruneheap.c | 456 ++++++++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 30 --
src/include/access/heapam.h | 15 +-
3 files changed, 279 insertions(+), 222 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9fd25d0d501..a415db2c01e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,13 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool freeze;
+
+ /*
+ * Whether or not to consider updating the VM. There is some bookkeeping
+ * that must be maintained if we would like to update the VM.
+ */
+ bool consider_update_vm;
+
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -108,8 +115,9 @@ typedef struct
*
* These fields are not used by pruning itself for the most part, but are
* used to collect information about what was pruned and what state the
- * page is in after pruning, for the benefit of the caller. They are
- * copied to the caller's PruneFreezeResult at the end.
+ * page is in after pruning to use when updating the visibility map and
+ * for the benefit of the caller. They are copied to the caller's
+ * PruneFreezeResult at the end.
* -------------------------------------------------------
*/
@@ -138,11 +146,10 @@ typedef struct
* bits. It is only valid if we froze some tuples, and all_frozen is
* true.
*
- * NOTE: all_visible and all_frozen don't include LP_DEAD items. That's
- * convenient for heap_page_prune_and_freeze(), to use them to decide
- * whether to freeze the page or not. The all_visible and all_frozen
- * values returned to the caller are adjusted to include LP_DEAD items at
- * the end.
+ * NOTE: all_visible and all_frozen don't include LP_DEAD items until
+ * directly before updating the VM. We ignore LP_DEAD items when deciding
+ * whether or not to opportunistically freeze and when determining the
+ * snapshot conflict horizon required when freezing tuples.
*/
bool all_visible;
bool all_frozen;
@@ -375,12 +382,15 @@ identify_and_fix_vm_corruption(Relation relation,
* If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
* required in order to advance relfrozenxid / relminmxid, or if it's
* considered advantageous for overall system performance to do so now. The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing. When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set. They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
+ * 'cutoffs', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments are required
+ * when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set and the visibility status of the page
+ * has changed, we will update the VM at the same time as pruning and freezing
+ * the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping. Note that new and old_vmbits will be
+ * 0 if HEAP_PAGE_PRUNE_UPDATE_VM is not set.
*
* blk_known_av is the visibility status of the heap block as of the last call
* to find_next_unskippable_block(). vmbuffer is the buffer that may already
@@ -396,6 +406,8 @@ identify_and_fix_vm_corruption(Relation relation,
* FREEZE indicates that we will also freeze tuples, and will return
* 'all_visible', 'all_frozen' flags to the caller.
*
+ * UPDATE_VM indicates that we will set the page's status in the VM.
+ *
* cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
* of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
* cutoffs->OldestXmin is also used to determine if dead tuples are
@@ -440,18 +452,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
HeapTupleData tup;
bool do_freeze;
bool do_prune;
- bool do_hint;
+ bool do_hint_full_or_prunable;
+ bool do_set_vm;
uint8 vmflags = 0;
uint8 old_vmbits = 0;
bool hint_bit_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ bool all_frozen_except_lp_dead = false;
+ bool set_pd_all_visible = false;
/* Copy parameters to prstate */
prstate.vistest = vistest;
prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.consider_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate.cutoffs = cutoffs;
+ Assert(!prstate.consider_update_vm || vmbuffer);
+
/*
* Our strategy is to scan the page and make lists of items to change,
* then apply the changes within a critical section. This keeps as much
@@ -496,50 +514,57 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.deadoffsets = presult->deadoffsets;
/*
- * Caller may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Keep track of whether or not the page will be all-visible and
+ * all-frozen for use in opportunistic freezing and to update the VM if
+ * the caller requests it.
+ *
+ * Currently, only VACUUM attempts freezing and setting the VM bits. But
+ * other callers could do either one. The visibility bookkeeping is
+ * required for opportunistic freezing (in addition to setting the VM
+ * bits) because we only consider opportunistically freezing tuples if the
+ * whole page would become all-frozen or if the whole page will be frozen
+ * except for dead tuples that will be removed by vacuum.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * If only updating the VM, we must initialize all_frozen to false, as
+ * heap_prepare_freeze_tuple() will not be called for each tuple on the
+ * page and we will not end up correctly setting it to false later.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not clear
- * all_visible when we see LP_DEAD items. We fix that at the end of the
- * function, when we return the value to the caller, so that the caller
- * doesn't set the VM bit incorrectly.
+ * Dead tuples which will be removed by the end of vacuuming should not
+ * preclude us from opportunistically freezing, so we do not clear
+ * all_visible when we see LP_DEAD items. We fix that after determining
+ * whether or not to freeze but before deciding whether or not to update
+ * the VM so that we don't set the VM bit incorrectly.
+ *
+ * If not freezing or updating the VM, we otherwise avoid the extra
+ * bookkeeping. Initializing all_visible and all_frozen to false allows
+ * skipping the work to update them in heap_prune_record_unchanged_lp_normal().
*/
if (prstate.freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
}
+ else if (prstate.consider_update_vm)
+ {
+ prstate.all_visible = true;
+ prstate.all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate.all_visible = false;
prstate.all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is unmaintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon;
+ * that is most often needed when updating the VM and/or freezing all
+ * live tuples on the page. It is updated before returning to the caller
+ * because vacuum does assert-build-only validation on the page using this
+ * field.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -737,10 +762,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* Even if we don't prune anything, if we found a new value for the
- * pd_prune_xid field or the page was marked full, we will update the hint
- * bit.
+ * pd_prune_xid field or the page was marked full, we will update those
+ * hint bits.
*/
- do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ do_hint_full_or_prunable =
+ ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
PageIsFull(page);
/*
@@ -788,7 +814,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (XLogCheckBufferNeedsBackup(buffer))
do_freeze = true;
}
- else if (do_hint)
+ else if (do_hint_full_or_prunable)
{
if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
do_freeze = true;
@@ -827,11 +853,84 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
}
+ /*
+ * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+ * make the choice of whether or not to freeze the page unaffected by the
+ * short-term presence of LP_DEAD items. These LP_DEAD items were
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that freezing has been finalized, unset all_visible if there are
+ * any LP_DEAD items on the page. It needs to reflect the present state of
+ * the page when using it to determine whether or not to update the VM.
+ *
+ * Keep track of whether or not the page was all-frozen except LP_DEAD
+ * items for the purposes of calculating the snapshot conflict horizon,
+ * though.
+ */
+ all_frozen_except_lp_dead = prstate.all_frozen;
+ if (prstate.lpdead_items > 0)
+ {
+ prstate.all_visible = false;
+ prstate.all_frozen = false;
+ }
+
Assert(!prstate.all_frozen || prstate.all_visible);
+
+ /*
+ * Handle setting visibility map bit based on information from the VM (as
+ * of last heap_vac_scan_next_block() call), and from all_visible and
+ * all_frozen variables.
+ */
+ if (prstate.consider_update_vm)
+ {
+ /*
+ * Clear any VM corruption. This does not need to be in a critical
+ * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
+ * we may mark the heap page buffer dirty here and could end up doing
+ * so again later. This is not a correctness issue, and it only occurs
+ * in the VM corruption path, so we don't have to worry about the extra
+ * performance overhead.
+ */
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av, prstate.lpdead_items, vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /* Determine if we actually need to set the VM and which bits to set. */
+ else if (prstate.all_visible &&
+ (!blk_known_av ||
+ (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate.all_frozen)
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+ }
+
+ do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
+
+ /*
+ * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+ * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+ * set, we strongly prefer to keep them in sync.
+ *
+ * Prior to Postgres 19, it was possible for the page-level bit to be set
+ * and the VM bit to be clear. This could happen if we crashed after
+ * setting PD_ALL_VISIBLE but before setting bits in the VM.
+ */
+ set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+
+ /* Save these for the caller in case we later zero out vmflags */
+ presult->new_vmbits = vmflags;
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
- if (do_hint)
+ if (do_hint_full_or_prunable)
{
/*
* Update the page's pd_prune_xid field to either zero, or the lowest
@@ -847,15 +946,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
PageClearFull(page);
/*
- * If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * If we are _only_ setting the prune_xid or PD_PAGE_FULL hint, then
+ * this is a non-WAL-logged hint. If we are going to freeze or prune
+ * tuples on the page or set PD_ALL_VISIBLE, we will mark the buffer
+ * dirty and emit WAL below.
*/
- if (!do_freeze && !do_prune)
+ if (!do_prune && !do_freeze && !set_pd_all_visible)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -869,12 +969,48 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
- MarkBufferDirty(buffer);
+ if (set_pd_all_visible)
+ PageSetAllVisible(page);
+
+ /*
+ * We only set PD_ALL_VISIBLE if we also set the VM, and since setting
+ * the VM requires emitting WAL, MarkBufferDirtyHint() isn't
+ * appropriate here.
+ */
+ if (do_prune || do_freeze || set_pd_all_visible)
+ MarkBufferDirty(buffer);
+
+ if (do_set_vm)
+ {
+ Assert(PageIsAllVisible(page));
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ old_vmbits = visibilitymap_set_vmbits(relation, blockno,
+ vmbuffer, vmflags);
+
+ if (old_vmbits == vmflags)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+ do_set_vm = false;
+ /* 0 out vmflags so we don't emit WAL to update the VM */
+ vmflags = 0;
+ }
+ }
+
+ /*
+ * It should never be the case that PD_ALL_VISIBLE is not set and the
+ * VM is set. Or, if it were, we should have caught it earlier when
+ * finding and fixing VM corruption. So, if we found out the VM was
+ * already set above, we should have found PD_ALL_VISIBLE set earlier.
+ */
+ Assert(!set_pd_all_visible || do_set_vm);
/*
- * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
+ * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did. If
+ * we were only updating the VM and it turns out it was already set,
+ * we will have unset do_set_vm earlier. As such, check it again
+ * before emitting the record.
*/
- if (RelationNeedsWAL(relation))
+ if (RelationNeedsWAL(relation) && (do_set_vm || do_prune || do_freeze))
{
/*
* The snapshotConflictHorizon for the whole record should be the
@@ -886,35 +1022,56 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* on the standby with xids older than the youngest tuple this
* record will freeze will conflict.
*/
- TransactionId frz_conflict_horizon = InvalidTransactionId;
- TransactionId conflict_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
+
+ /*
+ * If we are updating the VM, the conflict horizon is almost
+ * always the visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization,
+ * we can use the visibility_cutoff_xid as the conflict horizon if
+ * the page will be all-frozen. This is true even if there are
+ * LP_DEAD line pointers because we ignored those when maintaining
+ * the visibility_cutoff_xid.
+ */
+ if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
+ conflict_xid = prstate.visibility_cutoff_xid;
/*
- * We can use the visibility_cutoff_xid as our cutoff for
- * conflicts when the whole page is eligible to become all-frozen
- * in the VM once we're done with it. Otherwise we generate a
- * conservative cutoff by stepping back from OldestXmin.
+ * Otherwise, if we are freezing but the page would not be
+ * all-frozen, we have to use the more pessimistic horizon of
+ * OldestXmin, which may be newer than the newest tuple we froze.
+ * We currently don't track the newest tuple we froze.
*/
- if (do_freeze)
+ else if (do_freeze)
{
- if (prstate.all_visible && prstate.all_frozen)
- frz_conflict_horizon = prstate.visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = prstate.cutoffs->OldestXmin;
- TransactionIdRetreat(frz_conflict_horizon);
- }
+ conflict_xid = prstate.cutoffs->OldestXmin;
+ TransactionIdRetreat(conflict_xid);
}
- if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
- conflict_xid = frz_conflict_horizon;
- else
+ /*
+ * If we are removing tuples with a younger xmax than the conflict_xid
+ * calculated so far, we must use that xmax as our horizon.
+ */
+ if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
conflict_xid = prstate.latest_xid_removed;
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning
+ * or freezing any tuples and are setting an already all-visible
+ * page all-frozen in the VM. In this case, all of the tuples on
+ * the page must already be visible to all MVCC snapshots on the
+ * standby.
+ */
+ if (!do_prune && !do_freeze && do_set_vm &&
+ blk_known_av && (vmflags & VISIBILITYMAP_ALL_FROZEN))
+ conflict_xid = InvalidTransactionId;
+
log_heap_prune_and_freeze(relation, buffer,
false,
- InvalidBuffer, 0, false,
+ vmbuffer,
+ vmflags,
+ set_pd_all_visible,
conflict_xid,
true, reason,
prstate.frozen, prstate.nfrozen,
@@ -926,124 +1083,55 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
END_CRIT_SECTION();
- /* Copy information back for caller */
- presult->ndeleted = prstate.ndeleted;
- presult->nnewlpdead = prstate.ndead;
- presult->nfrozen = prstate.nfrozen;
- presult->live_tuples = prstate.live_tuples;
- presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected for updating the visibility map.
- */
- if (prstate.all_visible && prstate.lpdead_items == 0)
- {
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
- }
- else
- {
- presult->all_visible = false;
- presult->all_frozen = false;
- }
-
- presult->hastup = prstate.hastup;
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
- /*
- * If updating the visibility map, the conflict horizon for that record
- * must be the newest xmin on the page. However, if the page is
- * completely frozen, there can be no conflict and the vm_conflict_horizon
- * should remain InvalidTransactionId. This includes the case that we
- * just froze all the tuples; the prune-freeze record included the
- * conflict XID already so the VM update record doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * VACUUM will call heap_page_is_all_visible() during the second pass over
+ * the heap to determine all_visible and all_frozen for the page -- this
+ * is a specialized version of the logic from this function. Now that
+ * we've finished pruning and freezing, make sure that we're in total
+ * agreement with heap_page_is_all_visible() using an assertion. We will
+ * have already set the page in the VM, so this assertion will only let
+ * you know that you've already done something wrong.
*/
- if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
{
- if (identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av,
- prstate.lpdead_items, vmbuffer))
- {
- /* If we fix corruption, don't update the VM further */
- }
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
- /*
- * If the page isn't yet marked all-visible in the VM or it is and
- * needs to be marked all-frozen, update the VM. Note that all_frozen
- * is only valid if all_visible is true, so we must check both
- * all_visible and all_frozen.
- */
- else if (presult->all_visible &&
- (!blk_known_av ||
- (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
- {
- Assert(prstate.lpdead_items == 0);
- vmflags = VISIBILITYMAP_ALL_VISIBLE;
+ Assert(cutoffs);
- /*
- * If the page is all-frozen, we can pass InvalidTransactionId as
- * our cutoff_xid, since a snapshotConflictHorizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- if (presult->all_frozen)
- {
- Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
- vmflags |= VISIBILITYMAP_ALL_FROZEN;
- }
+ Assert(prstate.lpdead_items == 0);
- /*
- * It's possible for the VM bit to be clear and the page-level bit
- * to be set if checksums are not enabled.
- *
- * And even if we are just planning to update the frozen bit in
- * the VM, we shouldn't rely on all_visible_according_to_vm as a
- * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
- * might have become stale.
- *
- * If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be
- * logged.
- */
- if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buffer);
- }
+ if (!heap_page_is_all_visible(relation, buffer,
+ cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc))
+ Assert(false);
- old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
- vmbuffer, presult->vm_conflict_horizon,
- vmflags);
- }
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == prstate.visibility_cutoff_xid);
}
+#endif
+ /* Copy information back for caller */
+ presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+ presult->old_vmbits = old_vmbits;
+ /* new_vmbits was set above */
+ presult->hastup = prstate.hastup;
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
- presult->old_vmbits = old_vmbits;
- presult->new_vmbits = vmflags;
-
if (prstate.freeze)
{
if (presult->nfrozen > 0)
@@ -1625,7 +1713,12 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
break;
}
- /* Consider freezing any normal tuples which will not be removed */
+ /*
+ * Consider freezing any normal tuples which will not be removed.
+ * Regardless of whether or not we want to freeze the tuples, if we want
+ * to update the VM, we have to call heap_prepare_freeze_tuple() on every
+ * tuple to know whether or not the page will be totally frozen.
+ */
if (prstate->freeze)
{
bool totally_frozen;
@@ -2188,7 +2281,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* - Reaping: During vacuum phase III, items that are already LP_DEAD are
* marked as unused.
*
- * - VM updates: After vacuum phase III, the heap page may be marked
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
* all-visible and all-frozen.
*
These changes all happen together, so we use a single WAL record for them
@@ -2242,6 +2335,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+ Assert(do_prune || nfrozen > 0 || vmflags & VISIBILITYMAP_VALID_BITS);
Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
xlrec.flags = vmflags;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 05d3d2a3267..75c10ba20c6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2013,34 +2013,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- /* Note that all_frozen value does not matter when !all_visible */
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- if (!heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum))
- Assert(false);
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -2074,8 +2046,6 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
-
/*
* For the purposes of logging, count whether or not the page was newly
* set all-visible and, potentially, all-frozen.
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b85648456e9..0b9bb1c9b13 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,19 +235,12 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate the status of the page as reflected
- * in the visibility map after pruning, freezing, and setting any pages
- * all-visible in the visibility map.
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
*
- * vm_conflict_horizon is the newest xmin of live tuples on the page
- * (older than OldestXmin). It will only be valid if we did not set the
- * page all-frozen in the VM.
- *
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
*/
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
uint8 old_vmbits;
uint8 new_vmbits;
--
2.43.0
Attachment: v10-0011-Update-VM-in-pruneheap.c.patch (text/x-patch)
From d58838ac3ea61a10b07c792610e6fdd23f5ef487 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 2 Jun 2025 11:04:14 -0400
Subject: [PATCH v10 11/22] Update VM in pruneheap.c
As a step toward updating the VM in the same critical section and WAL
record as pruning and freezing (during phase I of vacuuming), first move
the VM update (still in its own critical section and WAL record) into
heap_page_prune_and_freeze(). This makes review easier.
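For orientation, the caller-side pattern after this change looks roughly like the sketch below, condensed from the vacuumlazy.c hunk later in this patch (the names and fields are those introduced by the patch set, not existing API): instead of calling visibilitymap_set() itself, lazy_scan_prune() derives its logging counters from the VM bits returned in the PruneFreezeResult.

    if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
        (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
    {
        /* page went from not-all-visible to all-visible in the VM */
        vacrel->vm_new_visible_pages++;
        if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
            vacrel->vm_new_visible_frozen_pages++;
    }
    else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
             (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
    {
        /* already-all-visible page was additionally marked all-frozen */
        vacrel->vm_new_frozen_pages++;
    }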
---
src/backend/access/heap/pruneheap.c | 99 +++++++++++++++++++++++-----
src/backend/access/heap/vacuumlazy.c | 98 +++++----------------------
src/include/access/heapam.h | 15 +++--
3 files changed, 106 insertions(+), 106 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 10d030fb3e7..9fd25d0d501 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -364,7 +364,8 @@ identify_and_fix_vm_corruption(Relation relation,
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -440,6 +441,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_freeze;
bool do_prune;
bool do_hint;
+ uint8 vmflags = 0;
+ uint8 old_vmbits = 0;
bool hint_bit_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
@@ -940,7 +943,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*
* Now that freezing has been finalized, unset all_visible if there are
* any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected by our caller.
+ * of the page, as expected for updating the visibility map.
*/
if (prstate.all_visible && prstate.lpdead_items == 0)
{
@@ -956,31 +959,91 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->hastup = prstate.hastup;
/*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
+ * If updating the visibility map, the conflict horizon for that record
+ * must be the newest xmin on the page. However, if the page is
+ * completely frozen, there can be no conflict and the vm_conflict_horizon
+ * should remain InvalidTransactionId. This includes the case that we
+ * just froze all the tuples; the prune-freeze record included the
+ * conflict XID already so the VM update record doesn't need it.
*/
if (presult->all_frozen)
presult->vm_conflict_horizon = InvalidTransactionId;
else
presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
- presult->lpdead_items = prstate.lpdead_items;
- /* the presult->deadoffsets array was already filled in */
-
/*
- * Clear any VM corruption. This does not need to be done in a critical
- * section.
+ * Handle setting visibility map bit based on information from the VM (as
+ * of last heap_vac_scan_next_block() call), and from all_visible and
+ * all_frozen variables.
*/
- presult->vm_corruption = false;
if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
- presult->vm_corruption = identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av,
- prstate.lpdead_items, vmbuffer);
+ {
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av,
+ prstate.lpdead_items, vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /*
+ * If the page isn't yet marked all-visible in the VM or it is and
+ * needs to be marked all-frozen, update the VM. Note that all_frozen
+ * is only valid if all_visible is true, so we must check both
+ * all_visible and all_frozen.
+ */
+ else if (presult->all_visible &&
+ (!blk_known_av ||
+ (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+ {
+ Assert(prstate.lpdead_items == 0);
+ vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+ /*
+ * If the page is all-frozen, we can pass InvalidTransactionId as
+ * our cutoff_xid, since a snapshotConflictHorizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ if (presult->all_frozen)
+ {
+ Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
+ /*
+ * It's possible for the VM bit to be clear and the page-level bit
+ * to be set if checksums are not enabled.
+ *
+ * And even if we are just planning to update the frozen bit in
+ * the VM, we shouldn't rely on all_visible_according_to_vm as a
+ * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
+ * might have become stale.
+ *
+ * If the heap page is all-visible but the VM bit is not set, we
+ * don't need to dirty the heap page. However, if checksums are
+ * enabled, we do need to make sure that the heap page is dirtied
+ * before passing it to visibilitymap_set(), because it may be
+ * logged.
+ */
+ if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+ {
+ PageSetAllVisible(page);
+ MarkBufferDirty(buffer);
+ }
+
+ old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
+ vmbuffer, presult->vm_conflict_horizon,
+ vmflags);
+ }
+ }
+
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
+
+ presult->old_vmbits = old_vmbits;
+ presult->new_vmbits = vmflags;
+
if (prstate.freeze)
{
if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 322e54c803f..05d3d2a3267 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1947,7 +1947,8 @@ cmpOffsetNumbers(const void *a, const void *b)
* vmbuffer is the buffer containing the VM block with visibility information
* for the heap block, blkno. all_visible_according_to_vm is the saved
* visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * won't rely entirely on this status, as it may be out of date. These will be
+ * passed on to heap_page_prune_and_freeze() to use while setting the VM.
*
* *has_lpdead_items is set to true or false depending on whether, upon return
* from this function, any LP_DEAD items are still present on the page.
@@ -1976,6 +1977,7 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
+ * Then, if the page's visibility status has changed, update the VM.
*
* If the relation has no indexes, we can immediately mark would-be dead
* items LP_UNUSED.
@@ -1984,10 +1986,6 @@ lazy_scan_prune(LVRelState *vacrel,
* presult.ndeleted. It should not be confused with presult.lpdead_items;
* presult.lpdead_items's final value can be thought of as the number of
* tuples that were deleted from indexes.
- *
- * We will update the VM after collecting LP_DEAD items and freezing
- * tuples. Pruning will have determined whether or not the page is
- * all-visible.
*/
prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
if (vacrel->nindexes == 0)
@@ -2079,88 +2077,26 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(!presult.all_visible || !(*has_lpdead_items));
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * For the purposes of logging, count whether or not the page was newly
+ * set all-visible and, potentially, all-frozen.
*/
- if (presult.vm_corruption)
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- /* Don't update the VM if we just cleared corruption in it */
- }
-
- /*
- * If the page isn't yet marked all-visible in the VM or it is and needs
- * to be marked all-frozen, update the VM. Note that all_frozen is only
- * valid if all_visible is true, so we must check both all_visible and
- * all_frozen.
- */
- else if (presult.all_visible &&
- (!all_visible_according_to_vm ||
- (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
- {
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- /*
- * If the page is all-frozen, we can pass InvalidTransactionId as our
- * cutoff_xid, since a snapshotConflictHorizon sufficient to make
- * everything safe for REDO was logged when the page's tuples were
- * frozen.
- */
- if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * If the heap page is all-visible but the VM bit is not set, we don't
- * need to dirty the heap page. However, if checksums are enabled, we
- * do need to make sure that the heap page is dirtied before passing
- * it to visibilitymap_set(), because it may be logged.
- */
- if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * Even if we are only setting the all-frozen bit, there is a small
- * chance that the VM was modified sometime between setting
- * all_visible_according_to_vm and checking the visibility during
- * pruning. Check the return value of old_vmbits to ensure the
- * visibility map counters used for logging are accurate.
- *
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ vacrel->vm_new_visible_pages++;
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- vacrel->vm_new_frozen_pages++;
+ vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ {
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
+ }
return presult.ndeleted;
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0c7eb5e46f4..b85648456e9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,20 +235,21 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
+ * all_visible and all_frozen indicate the status of the page as reflected
+ * in the visibility map after pruning, freezing, and setting any pages
+ * all-visible in the visibility map.
*
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
+ * vm_conflict_horizon is the newest xmin of live tuples on the page
+ * (older than OldestXmin). It will only be valid if we did not set the
+ * page all-frozen in the VM.
*
* These are only set if the HEAP_PRUNE_FREEZE option is set.
*/
bool all_visible;
bool all_frozen;
TransactionId vm_conflict_horizon;
- bool vm_corruption;
+ uint8 old_vmbits;
+ uint8 new_vmbits;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
--
2.43.0
Attachment: v10-0014-Remove-xl_heap_visible-entirely.patch (text/x-patch)
From edae8164cc6caf87dfeeb620ac0214ddad4e1b83 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:30:42 -0400
Subject: [PATCH v10 14/22] Remove xl_heap_visible entirely
There are now no users of this, so eliminate it entirely.
---
src/backend/access/common/bufmask.c | 3 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 152 ++---------------------
src/backend/access/heap/pruneheap.c | 4 +-
src/backend/access/heap/vacuumlazy.c | 10 +-
src/backend/access/heap/visibilitymap.c | 109 +---------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +-
src/include/access/heapam_xlog.h | 20 ---
src/include/access/visibilitymap.h | 11 +-
src/include/access/visibilitymapdefs.h | 9 --
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 32 insertions(+), 364 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..1fff01383b3 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,7 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 893a739009a..cb16bb0cbbd 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -40,6 +40,7 @@
#include "access/valid.h"
#include "access/visibilitymap.h"
#include "access/xloginsert.h"
+#include "access/xlogutils.h"
#include "catalog/pg_database.h"
#include "catalog/pg_database_d.h"
#include "commands/vacuum.h"
@@ -2523,11 +2524,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
{
PageSetAllVisible(page);
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
- visibilitymap_set_vmbits(relation,
- BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set(relation,
+ BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
}
/*
@@ -8798,49 +8799,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
/*
* Perform XLogInsert for a heap-update operation. Caller must already
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 11c11929ed9..ff3ad8b4cd2 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -53,6 +53,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
(xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
vmflags = xlrec.flags & VISIBILITYMAP_VALID_BITS;
+ /* Must never set all_frozen bit without also setting all_visible bit */
+ Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
/*
* After xl_heap_prune is the optional snapshot conflict horizon.
@@ -250,7 +252,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- old_vmbits = visibilitymap_set_vmbits(reln, blkno, vmbuffer, vmflags);
+ old_vmbits = visibilitymap_set(reln, blkno, vmbuffer, vmflags);
/* Only set VM page LSN if we modified the page */
if (old_vmbits != vmflags)
@@ -269,142 +271,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -785,15 +651,14 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(reln, blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
/*
* It is not possible that the VM was already set for this heap page,
* so the vmbuffer must have been modified and marked dirty.
*/
+ visibilitymap_set(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(BufferGetPage(vmbuffer), lsn);
FreeFakeRelcacheEntry(reln);
@@ -1374,9 +1239,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 96b2d58e40c..3fe9db99c0d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -984,8 +984,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
{
Assert(PageIsAllVisible(page));
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
- old_vmbits = visibilitymap_set_vmbits(relation, blockno,
- vmbuffer, vmflags);
+ old_vmbits = visibilitymap_set(relation, blockno,
+ vmbuffer, vmflags);
if (old_vmbits == vmflags)
{
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 75c10ba20c6..2ff67d77cb4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1886,8 +1886,8 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
MarkBufferDirty(buf);
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
- visibilitymap_set_vmbits(vacrel->rel, blkno,
- vmbuffer, new_vmbits);
+ visibilitymap_set(vacrel->rel, blkno,
+ vmbuffer, new_vmbits);
if (RelationNeedsWAL(vacrel->rel))
{
@@ -2753,9 +2753,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
set_pd_all_vis = true;
PageSetAllVisible(page);
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
- visibilitymap_set_vmbits(vacrel->rel,
- blkno,
- vmbuffer, vmflags);
+ visibilitymap_set(vacrel->rel,
+ blkno,
+ vmbuffer, vmflags);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index bb8dfd8910a..f7bad68ffc5 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set bit(s) in a previously pinned page and log
- * visibilitymap_set_vmbits - set bit(s) in a pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,108 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
- flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
- Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- Assert(!BufferIsValid(heapBuf) || BufferIsExclusiveLocked(heapBuf));
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (XLogRecPtrIsInvalid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
-}
-
/*
* Set flags in the VM block contained in the passed in vmBuf.
*
@@ -343,8 +240,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* is pinned and exclusive locked.
*/
uint8
-visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags)
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index c95d30dfe8d..47998f1df15 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -343,13 +343,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -454,9 +447,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d8508593e7c..3672f372aa8 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -446,20 +445,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -503,11 +488,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index fc7056a91ea..c5b1e1f7adb 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "utils/relcache.h"
@@ -31,14 +30,8 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags);
+extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a13e8162890..b4c880c083f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4273,7 +4273,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
Attachment: v10-0013-Rename-PruneState.freeze-to-attempt_freeze.patch (text/x-patch)
From 273f80c9d6ba29b9f689584a48c0e28e65280287 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 31 Jul 2025 14:07:51 -0400
Subject: [PATCH v10 13/22] Rename PruneState.freeze to attempt_freeze
This makes it clearer that the flag indicates that the caller would like
heap_page_prune_and_freeze() to consider freezing tuples -- not that we
will ultimately end up freezing them.
Also rename the local variable hint_bit_fpi to did_tuple_hint_fpi. This
makes it clear that it refers to tuple hints rather than page hints, and
that it records something that happened rather than something that could
happen.
---
src/backend/access/heap/pruneheap.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a415db2c01e..96b2d58e40c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -43,7 +43,7 @@ typedef struct
/* whether or not dead items can be set LP_UNUSED during pruning */
bool mark_unused_now;
/* whether to attempt freezing tuples */
- bool freeze;
+ bool attempt_freeze;
/*
* Whether or not to consider updating the VM. There is some bookkeeping
@@ -456,7 +456,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_set_vm;
uint8 vmflags = 0;
uint8 old_vmbits = 0;
- bool hint_bit_fpi;
+ bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
bool all_frozen_except_lp_dead = false;
bool set_pd_all_visible = false;
@@ -464,7 +464,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Copy parameters to prstate */
prstate.vistest = vistest;
prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
- prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
prstate.consider_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate.cutoffs = cutoffs;
@@ -489,7 +489,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* initialize page freezing working state */
prstate.pagefrz.freeze_required = false;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
Assert(new_relfrozen_xid && new_relmin_mxid);
prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
@@ -539,7 +539,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* bookkeeping. Initializing all_visible to false allows skipping the work
* to update them in heap_prune_record_unchanged_lp_normal().
*/
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
@@ -657,7 +657,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
* an FPI to be emitted.
*/
- hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+ did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
/*
* Process HOT chains.
@@ -774,7 +774,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* plans we prepared, or not.
*/
do_freeze = false;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
if (prstate.pagefrz.freeze_required)
{
@@ -807,7 +807,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (RelationNeedsWAL(relation))
{
- if (hint_bit_fpi)
+ if (did_tuple_hint_fpi)
do_freeze = true;
else if (do_prune)
{
@@ -1132,7 +1132,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
if (presult->nfrozen > 0)
{
@@ -1719,7 +1719,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* to update the VM, we have to call heap_prepare_freeze_tuple() on every
* tuple to know whether or not the page will be totally frozen.
*/
- if (prstate->freeze)
+ if (prstate->attempt_freeze)
{
bool totally_frozen;
--
2.43.0
Attachment: v10-0016-Use-GlobalVisState-to-determine-page-level-visib.patch (text/x-patch)
From c3053237e5ee43c8c2a023b1f6e1a018fe55de2f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v10 16/22] Use GlobalVisState to determine page level
visibility
During pruning and during vacuum's third phase, we try to determine if
the whole page can be set all-visible in the visibility map. Instead of
using OldestXmin to determine if all the tuples on a page are visible to
everyone, use the GlobalVisState. This allows us to start setting the VM
during on-access pruning in a future commit.
It is possible for the GlobalVisState to change during the course of a
vacuum. In all but extraordinary cases, it moves forward, meaning more
pages could potentially be set in the VM.
Because comparing a transaction ID to the GlobalVisState requires more
operations than comparing it to another single transaction ID, we now
wait until after examining all the tuples on the page and, if we have
maintained the visibility_cutoff_xid, compare it to the GlobalVisState
just once per page. This works because if the page is
all-visible and has live, committed tuples on it, the
visibility_cutoff_xid will contain the newest xmin on the page. If
everyone can see it, the page is truly all-visible.
Doing this may mean we examine more tuples' xmins than before, as we may
have set all_visible to false sooner when encountering a live tuple
newer than OldestXmin. However, these extra comparisons were found not
to be significant in a profile.
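For orientation, the deferred per-page check described above amounts to roughly the following, condensed from the pruneheap.c hunk below; it runs once, after every tuple on the page has been examined:

    /*
     * If the newest committed xmin among the page's live tuples (the
     * maintained visibility_cutoff_xid) is not yet visible to everyone
     * according to the GlobalVisState, the page cannot be set all-visible
     * (and therefore not all-frozen either).
     */
    if (prstate.all_visible &&
        TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
        !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
        prstate.all_visible = prstate.all_frozen = false;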
---
src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++
src/backend/access/heap/pruneheap.c | 48 +++++++++------------
src/backend/access/heap/vacuumlazy.c | 19 ++++----
src/include/access/heapam.h | 4 +-
4 files changed, 60 insertions(+), 39 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+ Buffer buffer)
+{
+ TransactionId dead_after = InvalidTransactionId;
+ HTSV_Result res;
+
+ res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+ if (res == HEAPTUPLE_RECENTLY_DEAD)
+ {
+ Assert(TransactionIdIsValid(dead_after));
+
+ if (GlobalVisXidVisibleToAll(vistest, dead_after))
+ res = HEAPTUPLE_DEAD;
+ }
+ else
+ Assert(!TransactionIdIsValid(dead_after));
+
+ return res;
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 30736519191..5e8748b15ef 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -141,10 +141,9 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon, when setting the VM or when
+ * freezing all the live tuples on the page.
*
* NOTE: all_visible and all_frozen don't include LP_DEAD items until
* directly before updating the VM. We ignore LP_DEAD items when deciding
@@ -557,14 +556,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon.
- * This is most likely to happen when updating the VM and/or freezing all
- * live tuples on the page. It is updated before returning to the caller
- * because vacuum does assert-build only validation on the page using this
- * field.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState.
+ *
+ * If we encounter an uncommitted tuple, this field is unmaintained. If
+ * the page is being set all-visible or when freezing all live tuples on
+ * the page, it is used to calculate the snapshot conflict horizon.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -760,6 +757,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them is not visible to everyone, the page cannot be
+ * all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update those
@@ -1103,12 +1110,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(cutoffs);
-
Assert(prstate.lpdead_items == 0);
if (!heap_page_is_all_visible(relation, buffer,
- cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc))
Assert(false);
@@ -1633,19 +1638,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' when freezing is requested. We
- * could use GlobalVisTestIsRemovableXid instead, if a
- * non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2ff67d77cb4..7558ac697f1 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,7 +464,7 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -2715,7 +2715,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
InvalidOffsetNumber);
if (heap_page_would_be_all_visible(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3458,13 +3458,13 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
- return heap_page_would_be_all_visible(rel, buf, OldestXmin,
+ return heap_page_would_be_all_visible(rel, buf, vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -3483,7 +3483,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
*
* *all_frozen is an output parameter indicating to the caller if every tuple
* on the page is frozen.
@@ -3504,7 +3504,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
static bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -3576,8 +3576,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
tuple.t_len = ItemIdGetLength(itemid);
tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
- buf))
+ switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest,
+ buf))
{
case HEAPTUPLE_LIVE:
{
@@ -3596,8 +3596,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin,
- OldestXmin))
+ if (!GlobalVisXidVisibleToAll(vistest, xmin))
{
all_visible = false;
*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0b9bb1c9b13..4278f351bdf 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -342,7 +342,7 @@ extern void heap_inplace_unlock(Relation relation,
HeapTuple oldtup, Buffer buffer);
extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum);
@@ -415,6 +415,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+ GlobalVisState *vistest, Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
v10-0018-Unset-all-visible-sooner-if-not-freezing.patchtext/x-patch; charset=US-ASCII; name=v10-0018-Unset-all-visible-sooner-if-not-freezing.patchDownload
From 09df07e1794c06cb435c641ad57250672eb16215 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:35:13 -0400
Subject: [PATCH v10 18/22] Unset all-visible sooner if not freezing
In prune/freeze code, we delay unsetting all-visible/all-frozen in the
presence of dead items to allow opportunistically freezing tuples if the
whole page would be frozen except for those dead items -- which are
removed later in vacuum's third phase.
Future commits will allow on-access pruning to set the VM, which means
all-visible will be initialized to true instead of false and we will do
extra bookkeeping in heap_prune_record_unchanged_lp_normal() to keep track of
whether or not the page is all-visible.
Because on-access pruning will not freeze tuples, it makes sense to
unset all-visible as soon as we encounter an LP_DEAD item and
avoid continued bookkeeping since we know the page is not all-visible
and we won't be able to remove those dead items.
---
src/backend/access/heap/pruneheap.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5e8748b15ef..e6cf4df17c5 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1498,8 +1498,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible until later during pruning.
- * Removable dead tuples shouldn't preclude freezing the page.
+ * Removable dead tuples shouldn't preclude freezing the page. If we won't
+ * attempt freezing, just unset all-visible now, though.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1757,8 +1760,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible until later, at the end of
* heap_page_prune_and_freeze(). This will allow us to attempt to freeze
* the page after pruning. As long as we unset it before updating the
- * visibility map, this will be correct.
+ * visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all-visible now.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
--
2.43.0
v10-0015-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patchtext/x-patch; charset=US-ASCII; name=v10-0015-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patchDownload
From 69ff6eecfcab1cedabea8a11f61d2c688f700a61 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v10 15/22] Rename GlobalVisTestIsRemovableXid() to
GlobalVisXidVisibleToAll()
Currently, we only use GlobalVisTestIsRemovableXid() to check if a
tuple's xmax is visible to all, meaning we can remove it. But future
commits will use GlobalVisTestIsRemovableXid() to test if a tuple's xmin
is visible to all, for the purpose of determining whether the page can be set
all-visible in the VM. In that case, it makes more sense to call the
function GlobalVisXidVisibleToAll().
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
---
src/backend/access/heap/heapam_visibility.c | 6 +++---
src/backend/access/heap/pruneheap.c | 14 +++++++-------
src/backend/access/spgist/spgvacuum.c | 2 +-
src/backend/storage/ipc/procarray.c | 13 ++++++-------
src/include/utils/snapmgr.h | 4 ++--
5 files changed, 19 insertions(+), 20 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
{
Assert(TransactionIdIsValid(dead_after));
- if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
res = HEAPTUPLE_DEAD;
}
else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
return false;
/* Deleter committed, so tuple is dead if the XID is old enough. */
- return GlobalVisTestIsRemovableXid(vistest,
- HeapTupleHeaderGetRawXmax(tuple));
+ return GlobalVisXidVisibleToAll(vistest,
+ HeapTupleHeaderGetRawXmax(tuple));
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3fe9db99c0d..30736519191 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -231,7 +231,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
vistest = GlobalVisTestFor(relation);
- if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+ if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
return;
/*
@@ -578,9 +578,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* Determining HTSV only once for each tuple is required for correctness,
* to deal with cases where running HTSV twice could result in different
* results. For example, RECENTLY_DEAD can turn to DEAD if another
- * checked item causes GlobalVisTestIsRemovableFullXid() to update the
- * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
- * transaction aborts.
+ * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+ * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+ * aborts.
*
* It's also good for performance. Most commonly tuples within a page are
* stored at decreasing offsets (while the items are stored at increasing
@@ -1177,11 +1177,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* Determine whether or not the tuple is considered dead when compared
* with the provided GlobalVisState. On-access pruning does not provide
* VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
- * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
- * if the GlobalVisState has been updated since the beginning of vacuuming
+ * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+ * the GlobalVisState has been updated since the beginning of vacuuming
* the relation.
*/
- if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
return HEAPTUPLE_DEAD;
return res;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 8f8a1ad7796..496cca69410 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
*/
if (dt->tupstate == SPGIST_REDIRECT &&
(!TransactionIdIsValid(dt->xid) ||
- GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+ GlobalVisXidVisibleToAll(vistest, dt->xid)))
{
dt->tupstate = SPGIST_PLACEHOLDER;
Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..f67f01c17c2 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
* See comment for GlobalVisState for details.
*/
bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
- FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
{
/*
* If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4223,7 +4222,7 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
* relfrozenxid).
*/
bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
{
FullTransactionId fxid;
@@ -4237,7 +4236,7 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
*/
fxid = FullXidRelativeTo(state->definitely_needed, xid);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableXid(state, xid);
+ return GlobalVisXidVisibleToAll(state, xid);
}
/*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
*/
typedef struct GlobalVisState GlobalVisState;
extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
--
2.43.0
v10-0017-Inline-TransactionIdFollows-Precedes.patchtext/x-patch; charset=US-ASCII; name=v10-0017-Inline-TransactionIdFollows-Precedes.patchDownload
From 6d261819f8b35946698c540b87260e3c49883c0d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 09:57:13 -0400
Subject: [PATCH v10 17/22] Inline TransactionIdFollows/Precedes()
Calling these from on-access pruning code had noticeable overhead in a
profile. There does not seem to be a reason not to inline them.
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
---
src/backend/access/transam/transam.c | 64 -------------------------
src/include/access/transam.h | 70 ++++++++++++++++++++++++++--
2 files changed, 66 insertions(+), 68 deletions(-)
diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index 9a39451a29a..553d6756cb3 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -273,70 +273,6 @@ TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
}
-/*
- * TransactionIdPrecedes --- is id1 logically < id2?
- */
-bool
-TransactionIdPrecedes(TransactionId id1, TransactionId id2)
-{
- /*
- * If either ID is a permanent XID then we can just do unsigned
- * comparison. If both are normal, do a modulo-2^32 comparison.
- */
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 < id2);
-
- diff = (int32) (id1 - id2);
- return (diff < 0);
-}
-
-/*
- * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
- */
-bool
-TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 <= id2);
-
- diff = (int32) (id1 - id2);
- return (diff <= 0);
-}
-
-/*
- * TransactionIdFollows --- is id1 logically > id2?
- */
-bool
-TransactionIdFollows(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 > id2);
-
- diff = (int32) (id1 - id2);
- return (diff > 0);
-}
-
-/*
- * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
- */
-bool
-TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 >= id2);
-
- diff = (int32) (id1 - id2);
- return (diff >= 0);
-}
-
/*
* TransactionIdLatest --- get latest XID among a main xact and its children
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 7d82cd2eb56..c9e20418275 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -255,6 +255,72 @@ typedef struct TransamVariablesData
} TransamVariablesData;
+
+/*
+ * TransactionIdPrecedes --- is id1 logically < id2?
+ */
+static inline bool
+TransactionIdPrecedes(TransactionId id1, TransactionId id2)
+{
+ /*
+ * If either ID is a permanent XID then we can just do unsigned
+ * comparison. If both are normal, do a modulo-2^32 comparison.
+ */
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 < id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff < 0);
+}
+
+/*
+ * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
+ */
+static inline bool
+TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 <= id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff <= 0);
+}
+
+/*
+ * TransactionIdFollows --- is id1 logically > id2?
+ */
+static inline bool
+TransactionIdFollows(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 > id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff > 0);
+}
+
+/*
+ * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
+ */
+static inline bool
+TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 >= id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff >= 0);
+}
+
+
/* ----------------
* extern declarations
* ----------------
@@ -274,10 +340,6 @@ extern bool TransactionIdDidAbort(TransactionId transactionId);
extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
-extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
-extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2);
extern TransactionId TransactionIdLatest(TransactionId mainxid,
int nxids, const TransactionId *xids);
extern XLogRecPtr TransactionIdGetCommitLSN(TransactionId xid);
--
2.43.0
On Fri, Sep 5, 2025 at 6:20 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
yikes, you are right about the "reason" member. Attached 0002 removes
it, and I'll go ahead and fix it in the back branches too.
I think changing this in the back-branches is a super-bad idea. If you
want, you can add a comment in the back-branches saying "oops, we
shipped a field that isn't used for anything", but changing the struct
definition is very likely to make 0 people happy and >0 people
unhappy. On the other hand, changing this in master is a good idea and
you should go ahead and do that before this creates any more
confusion.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Sep 8, 2025 at 12:41 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Sep 5, 2025 at 6:20 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
yikes, you are right about the "reason" member. Attached 0002 removes
it, and I'll go ahead and fix it in the back branches too.

I think changing this in the back-branches is a super-bad idea. If you
want, you can add a comment in the back-branches saying "oops, we
shipped a field that isn't used for anything", but changing the struct
definition is very likely to make 0 people happy and >0 people
unhappy. On the other hand, changing this in master is a good idea and
you should go ahead and do that before this creates any more
confusion.
Yes, that makes 100% sense. It should have occurred to me. I've pushed
the commit to master. I didn't put an updated set of patches here in
case someone was already reviewing them, as nothing else has changed.
- Melanie
On Mon, Sep 8, 2025 at 11:44 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
I pushed this, so rebased v10 is attached. I've added one new patch:
0002 adds ERRCODE_DATA_CORRUPTED to the existing log messages about
VM/data corruption in vacuum. Andrey Borodin earlier suggested this,
and I had neglected to include it.
Writing "ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED)" is very
much a minority position. Generally the call to errcode() is on the
following line. I think the commit message could use a bit of work,
too. The first sentence heavily duplicates the second and the fourth,
and the third sentence isn't sufficiently well-connected to the rest
to make it clear why you're restating this general principle in this
commit message.
Perhaps something like:
Add error codes when VACUUM discovers VM corruption
Commit fd6ec93bf890314ac694dc8a7f3c45702ecc1bbd and other previous
work has established the principle that when an error is potentially
reachable in case of on-disk corruption, but is not expected to be
reached otherwise, ERRCODE_DATA_CORRUPTED should be used. This allows
log monitoring software to search for evidence of corruption by
filtering on the error code.
That kibitzing aside, I think this is pretty clearly the right thing to do.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Sep 8, 2025 at 2:54 PM Robert Haas <robertmhaas@gmail.com> wrote:
Commit fd6ec93bf890314ac694dc8a7f3c45702ecc1bbd and other previous
work has established the principle that when an error is potentially
reachable in case of on-disk corruption, but is not expected to be
reached otherwise, ERRCODE_DATA_CORRUPTED should be used. This allows
log monitoring software to search for evidence of corruption by
filtering on the error code.That kibitzing aside, I think this is pretty clearly the right thing to do.
Thanks for the suggested wording and the pointer to that thread.
I noticed that in that thread they decided to use errmsg_internal()
instead of errmsg() for a few different reasons -- one of which was
that the situation is not supposed to happen/cannot happen -- which I
don't really understand. It is a reachable code path. Another is that
it is extra work for translators, which I'm also not sure how to apply
to my situation. Are the VM corruption cases worth extra work to the
translators?
I think the most compelling reason is that people will want to search
for the error message in English online. So, for that reason, my
instinct is to use errmsg_internal() in my case as well.
- Melanie
On Mon, Sep 8, 2025 at 3:14 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
I noticed that in that thread they decided to use errmsg_internal()
instead of errmsg() for a few different reasons -- one of which was
that the situation is not supposed to happen/cannot happen -- which I
don't really understand. It is a reachable code path. Another is that
it is extra work for translators, which I'm also not sure how to apply
to my situation. Are the VM corruption cases worth extra work to the
translators?

I think the most compelling reason is that people will want to search
for the error message in English online. So, for that reason, my
instinct is to use errmsg_internal() in my case as well.
I don't find that reason particularly compelling -- people could want
to search for any error message, or they could equally want to be able
to read it without Google translate. Guessing which messages are
obscure enough that we need not translate them exceeds my powers. If I
were doing it, I'd make it errmsg() rather than errmsg_internal() and
let the translations team change it if they don't think it's worth
bothering with, because if you make it errmsg_internal() then they
won't see it until somebody complains about it not getting translated.
However, I suspect different committers would pursue different
strategies here.
--
Robert Haas
EDB: http://www.enterprisedb.com
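As a point of reference, the layout alluded to above -- errcode() on the
line after the ereport() severity level, here combined with plain errmsg()
as suggested -- would look roughly like the following. The message text and
the variables blkno and rel are only illustrative, not the patch's actual
wording:

    ereport(WARNING,
            errcode(ERRCODE_DATA_CORRUPTED),
            errmsg("page %u of relation \"%s\" is marked all-visible in the visibility map, but the page itself is not",
                   blkno, RelationGetRelationName(rel)));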
Reviewing 0003:
+            /*
+             * If we're only adding already frozen rows to a previously empty
+             * page, mark it as all-frozen and update the visibility map. We're
+             * already holding a pin on the vmbuffer.
+             */
             else if (all_frozen_set)
+            {
                 PageSetAllVisible(page);
+                LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+                visibilitymap_set_vmbits(relation,
+                                         BufferGetBlockNumber(buffer),
+                                         vmbuffer,
+                                         VISIBILITYMAP_ALL_VISIBLE |
+                                         VISIBILITYMAP_ALL_FROZEN);
Locking a buffer in a critical section violates the order of
operations proposed in the 'Write-Ahead Log Coding' section of
src/backend/access/transam/README.
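For context, the ordering that README section lays out is roughly the
following. This is a generic, simplified sketch rather than the patch's
code; buffer acquisition, the WAL record payload, and error handling are
elided, and relation, buffer, vmbuffer, and info stand in for whatever the
caller has at hand:

    /* 1. Pin and exclusive-lock every buffer the operation will modify. */
    LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
    LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);

    /* 2. Only then enter the critical section. */
    START_CRIT_SECTION();

    /* 3. Apply the changes to the pages... */
    PageSetAllVisible(BufferGetPage(buffer));

    /* 4. ...and mark the buffers dirty before writing WAL. */
    MarkBufferDirty(buffer);
    MarkBufferDirty(vmbuffer);

    /* 5. Emit the WAL record and stamp the modified pages with its LSN. */
    if (RelationNeedsWAL(relation))
    {
        XLogRecPtr  recptr;

        XLogBeginInsert();
        XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
        XLogRegisterBuffer(1, vmbuffer, 0);
        recptr = XLogInsert(RM_HEAP2_ID, info);

        PageSetLSN(BufferGetPage(buffer), recptr);
        PageSetLSN(BufferGetPage(vmbuffer), recptr);
    }

    /* 6. Exit the critical section, then release the locks. */
    END_CRIT_SECTION();

    LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
    UnlockReleaseBuffer(buffer);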
+ * Now read and update the VM block. Even if we skipped updating the heap
+ * page due to the file being dropped or truncated later in recovery, it's
+ * still safe to update the visibility map. Any WAL record that clears
+ * the visibility map bit does so before checking the page LSN, so any
+ * bits that need to be cleared will still be cleared.
+ *
+ * It is only okay to set the VM bits without holding the heap page lock
+ * because we can expect no other writers of this page.
The first paragraph of this paraphrases similar content in
heap_xlog_visible(), but I don't see the variation in phrasing as an
improvement.
The second paragraph does not convince me at all. I see no reason to
believe that this is safe, or that it is a good idea. The code in
heap_xlog_visible() thinks it's OK to unlock and relock the page to
make visibilitymap_set() happy, which is cringy but probably safe for
lack of concurrent writers, but skipping locking altogether seems
deeply unwise.
- * visibilitymap_set - set a bit in a previously pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page and log
+ * visibilitymap_set_vmbits - set bit(s) in a pinned page
I suspect the indentation was done with a different mix of spaces and
tabs here, because this doesn't align for me.
In general, this idea makes some sense to me -- there doesn't seem to
be any particularly good reason why the visibility-map update should
be handled by a different WAL record than the all-visible flag on the
page itself. It's a little hard for me to make that statement too
conclusively without studying more of the patches than I've had time
to do today, but off the top of my head it seems to make sense.
However, I'm not sure you've taken enough care with the details here.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Sep 8, 2025 at 4:15 PM Robert Haas <robertmhaas@gmail.com> wrote:
Reviewing 0003:
Locking a buffer in a critical section violates the order of
operations proposed in the 'Write-Ahead Log Coding' section of
src/backend/access/transam/README.
Right, I noticed some other callers of visibilitymap_set() (like
lazy_scan_new_or_empty()) did call it in a critical section (and it
exclusive locks the VM page), so I thought perhaps it was better to
keep this operation as close as possible to where we update the VM
(similar to how it is in master in visibilitymap_set()).
But, I think you're right that maintaining the order of operations
proposed in transam/README is more important. As such, in attached
v11, I've modified this patch and the other patches where I replace
visibilitymap_set() with visibilitymap_set_vmbits() to exclusively
lock the vmbuffer before the critical section.
visibilitymap_set_vmbits() asserts that we have the vmbuffer
exclusively locked, so we should be good.
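Condensed from the v11-0001 patch attached below (tuple insertion, WAL
payload construction, and error paths elided), the resulting order in
heap_multi_insert() is:

    if (starting_with_empty_page && (options & HEAP_INSERT_FROZEN))
    {
        all_frozen_set = true;
        /* lock the vmbuffer before entering the critical section */
        LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
    }

    START_CRIT_SECTION();

    /* ... insert the tuples, then set PD_ALL_VISIBLE and the VM bits ... */
    if (all_frozen_set)
    {
        PageSetAllVisible(page);
        visibilitymap_set_vmbits(relation, BufferGetBlockNumber(buffer),
                                 vmbuffer,
                                 VISIBILITYMAP_ALL_VISIBLE |
                                 VISIBILITYMAP_ALL_FROZEN);
    }

    /* ... build and emit the xl_heap_multi_insert record ... */
    recptr = XLogInsert(RM_HEAP2_ID, info);
    PageSetLSN(page, recptr);
    if (all_frozen_set)
        PageSetLSN(BufferGetPage(vmbuffer), recptr);

    END_CRIT_SECTION();

    if (all_frozen_set)
        LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
    UnlockReleaseBuffer(buffer);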
+ * Now read and update the VM block. Even if we skipped updating the heap
+ * page due to the file being dropped or truncated later in recovery, it's
+ * still safe to update the visibility map. Any WAL record that clears
+ * the visibility map bit does so before checking the page LSN, so any
+ * bits that need to be cleared will still be cleared.
+ *
+ * It is only okay to set the VM bits without holding the heap page lock
+ * because we can expect no other writers of this page.

The first paragraph of this paraphrases similar content in
heap_xlog_visible(), but I don't see the variation in phrasing as an
improvement.
The only difference is I replaced the phrase "LSN interlock" with
"being dropped or truncated later in recovery" -- which is more
specific and, I thought, more clear. Without this comment, it took me
some time to understand the scenarios that might lead us to skip
updating the heap block. heap_xlog_visible() has cause to describe
this situation in an earlier comment -- which is why I think the LSN
interlock comment is less confusing there.
Anyway, I'm open to changing the comment. I could:
1) copy-paste the same comment as heap_xlog_visible()
2) refer to the comment in heap_xlog_visible() (comment seemed a bit
short for that)
3) diverge the comments further by improving the new comment in
heap_xlog_multi_insert() in some way
4) something else?
The second paragraph does not convince me at all. I see no reason to
believe that this is safe, or that it is a good idea. The code in
heap_xlog_visible() thinks it's OK to unlock and relock the page to
make visibilitymap_set() happy, which is cringy but probably safe for
lack of concurrent writers, but skipping locking altogether seems
deeply unwise.
Actually in master, heap_xlog_visible() has no lock on the heap page
when it calls visibilitymap_set(). It releases that lock before
recording the freespace in the FSM and doesn't take it again.
It does unlock and relock the VM page -- because visibilitymap_set()
expects to take the lock on the VM.
I agree that not holding the heap lock while updating the VM is
unsatisfying. We can't hold it while doing the IO to read in the VM
block in XLogReadBufferForRedoExtended(). So, we could take it again
before calling visibilitymap_set(). But we don't always have the heap
buffer, though. I suspect this is partially why heap_xlog_visible()
unconditionally passes InvalidBuffer to visibilitymap_set() as the
heap buffer and has special case handling for recovery when we don't
have the heap buffer.
In any case, it isn't an active bug, and I don't think future-proofing
VM replay (i.e. against parallel recovery) is a prerequisite for
committing this patch since it is also that way on master.
- * visibilitymap_set - set a bit in a previously pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page and log
+ * visibilitymap_set_vmbits - set bit(s) in a pinned page

I suspect the indentation was done with a different mix of spaces and
tabs here, because this doesn't align for me.
oops, fixed.
I pushed the ERRCODE_DATA_CORRUPTED patch, so attached v11 is rebased
and also has the changes mentioned above.
Since you've started reviewing the set, I'll note that patches
0005-0011 are split up for ease of review and it may not necessarily
make sense to keep that separation for eventual commit. They are a
series of steps to move VM updates from lazy_scan_prune() into
pruneheap.c.
- Melanie
Attachments:
v11-0001-Eliminate-xl_heap_visible-in-COPY-FREEZE.patchtext/x-patch; charset=US-ASCII; name=v11-0001-Eliminate-xl_heap_visible-in-COPY-FREEZE.patchDownload
From 1ce37296b97bb40e717b3dc1f2052da0b022fa78 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v11 01/20] Eliminate xl_heap_visible in COPY FREEZE
Instead of emitting a separate WAL record for setting the VM bits in
xl_heap_visible, specify the changes to make to the VM block in the
xl_heap_multi_insert record instead.
This halves the number of WAL records emitted by COPY FREEZE.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam.c | 50 +++++++++++-------
src/backend/access/heap/heapam_xlog.c | 43 +++++++++++++++-
src/backend/access/heap/visibilitymap.c | 67 ++++++++++++++++++++++++-
src/backend/access/rmgrdesc/heapdesc.c | 5 ++
src/include/access/visibilitymap.h | 2 +
5 files changed, 147 insertions(+), 20 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4c5ae205a7a..cff531a4801 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2466,7 +2466,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
starting_with_empty_page = PageGetMaxOffsetNumber(page) == 0;
if (starting_with_empty_page && (options & HEAP_INSERT_FROZEN))
+ {
all_frozen_set = true;
+ /* Lock the vmbuffer before entering the critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ }
/* NO EREPORT(ERROR) from here till changes are logged */
START_CRIT_SECTION();
@@ -2504,9 +2508,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/*
* If the page is all visible, need to clear that, unless we're only
* going to add further frozen rows to it.
- *
- * If we're only adding already frozen rows to a previously empty
- * page, mark it as all-visible.
*/
if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
@@ -2516,8 +2517,21 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
BufferGetBlockNumber(buffer),
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
+
+ /*
+ * If we're only adding already frozen rows to a previously empty
+ * page, mark it as all-frozen and update the visibility map. We're
+ * already holding a pin on the vmbuffer.
+ */
else if (all_frozen_set)
+ {
PageSetAllVisible(page);
+ visibilitymap_set_vmbits(relation,
+ BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+ }
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2565,6 +2579,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
xlrec->flags = 0;
if (all_visible_cleared)
xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+ /*
+ * We don't have to worry about including a conflict xid in the
+ * WAL record as HEAP_INSERT_FROZEN intentionally violates
+ * visibility rules.
+ */
if (all_frozen_set)
xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
@@ -2627,7 +2647,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
XLogBeginInsert();
XLogRegisterData(xlrec, tupledata - scratch.data);
+
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+ if (all_frozen_set)
+ XLogRegisterBuffer(1, vmbuffer, 0);
XLogRegisterBufData(0, tupledata, totaldatalen);
@@ -2637,26 +2660,17 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
recptr = XLogInsert(RM_HEAP2_ID, info);
PageSetLSN(page, recptr);
+ if (all_frozen_set)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
+ }
}
END_CRIT_SECTION();
- /*
- * If we've frozen everything on the page, update the visibilitymap.
- * We're already holding pin on the vmbuffer.
- */
if (all_frozen_set)
- {
- /*
- * It's fine to use InvalidTransactionId here - this is only used
- * when HEAP_INSERT_FROZEN is specified, which intentionally
- * violates visibility rules.
- */
- visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
- InvalidXLogRecPtr, vmbuffer,
- InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
- }
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
UnlockReleaseBuffer(buffer);
ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index cf843277938..0820f7d052d 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -551,6 +551,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
int i;
bool isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
/*
* Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -571,11 +572,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(rlocator);
- Buffer vmbuffer = InvalidBuffer;
visibilitymap_pin(reln, blkno, &vmbuffer);
visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
ReleaseBuffer(vmbuffer);
+ vmbuffer = InvalidBuffer;
FreeFakeRelcacheEntry(reln);
}
@@ -662,6 +663,46 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (BufferIsValid(buffer))
UnlockReleaseBuffer(buffer);
+ buffer = InvalidBuffer;
+
+ /*
+ * Now read and update the VM block. Even if we skipped updating the heap
+ * page due to the file being dropped or truncated later in recovery, it's
+ * still safe to update the visibility map. Any WAL record that clears
+ * the visibility map bit does so before checking the page LSN, so any
+ * bits that need to be cleared will still be cleared.
+ *
+ * It is only okay to set the VM bits without holding the heap page lock
+ * because we can expect no other writers of this page.
+ */
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+ XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ Page vmpage = BufferGetPage(vmbuffer);
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
+
+ /* initialize the page if it was read as zeros */
+ if (PageIsNew(vmpage))
+ PageInit(vmpage, BLCKSZ, 0);
+
+ visibilitymap_set_vmbits(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+
+ /*
+ * It is not possible that the VM was already set for this heap page,
+ * so the vmbuffer must have been modified and marked dirty.
+ */
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
+ FreeFakeRelcacheEntry(reln);
+ }
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
+
/*
* If the page is running low on free space, update the FSM as well.
* Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 7306c16f05c..aa48a436108 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,7 +14,8 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set a bit in a previously pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page and log
+ * visibilitymap_set_vmbits - set bit(s) in a pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -321,6 +322,70 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
return status;
}
+/*
+ * Set flags in the VM block contained in the passed in vmBuf.
+ *
+ * This function is for callers which include the VM changes in the same WAL
+ * record as the modifications of the heap page which rendered it all-visible.
+ * Callers separately logging the VM changes should invoke visibilitymap_set()
+ * instead.
+ *
+ * Caller must have pinned and exclusive locked the correct block of the VM in
+ * vmBuf. This block should contain the VM bits for the given heapBlk.
+ *
+ * During normal operation (i.e. not recovery), this should be called in a
+ * critical section which also makes any necessary changes to the heap page
+ * and, if relevant, emits WAL.
+ *
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page. This includes
+ * maintaining any invariants such as ensuring the buffer containing heapBlk
+ * is pinned and exclusive locked.
+ */
+uint8
+visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags)
+{
+ BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+ uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+ uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+ Page page;
+ uint8 *map;
+ uint8 status;
+
+#ifdef TRACE_VISIBILITYMAP
+ elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
+ flags, RelationGetRelationName(rel), heapBlk);
+#endif
+
+ /* Call in same critical section where WAL is emitted. */
+ Assert(InRecovery || CritSectionCount > 0);
+
+ /* Flags should be valid. Also never clear bits with this function */
+ Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+ /* Must never set all_frozen bit without also setting all_visible bit */
+ Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+ /* Check that we have the right VM page pinned */
+ if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+ elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
+
+ Assert(BufferIsExclusiveLocked(vmBuf));
+
+ page = BufferGetPage(vmBuf);
+ map = (uint8 *) PageGetContents(page);
+
+ status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+ if (flags != status)
+ {
+ map[mapByte] |= (flags << mapOffset);
+ MarkBufferDirty(vmBuf);
+ }
+
+ return status;
+}
+
/*
* visibilitymap_get_status - get status of bits
*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
#include "access/heapam_xlog.h"
#include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
#include "storage/standbydefs.h"
/*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
xlrec->flags);
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+ appendStringInfo(buf, ", vm_flags: 0x%02X",
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
if (XLogRecHasBlockData(record, 0) && !isinit)
{
appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..fc7056a91ea 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,8 @@ extern uint8 visibilitymap_set(Relation rel,
Buffer vmBuf,
TransactionId cutoff_xid,
uint8 flags);
+extern uint8 visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
--
2.43.0
v11-0003-Eliminate-xl_heap_visible-from-vacuum-phase-III.patchtext/x-patch; charset=US-ASCII; name=v11-0003-Eliminate-xl_heap_visible-from-vacuum-phase-III.patchDownload
From fff425a8f480f66dc61c61ffb2b15f679901331d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:42:13 -0400
Subject: [PATCH v11 03/20] Eliminate xl_heap_visible from vacuum phase III
Instead of emitting a separate xl_heap_visible record for each page that
is rendered all-visible by vacuum's third phase, include the updates to
the VM in the already emitted xl_heap_prune record.
The visibilitymap bits are stored in the flags member of the
xl_heap_prune record.
This can decrease the number of WAL records vacuum phase III emits by
as much as half.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam_xlog.c | 145 +++++++++++++++++----
src/backend/access/heap/pruneheap.c | 66 ++++++++--
src/backend/access/heap/vacuumlazy.c | 166 +++++++++++++++++--------
src/backend/access/rmgrdesc/heapdesc.c | 6 +-
src/include/access/heapam.h | 9 ++
src/include/access/heapam_xlog.h | 36 ++++--
6 files changed, 332 insertions(+), 96 deletions(-)
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 0820f7d052d..11c11929ed9 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Buffer buffer;
RelFileLocator rlocator;
BlockNumber blkno;
- XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
+ uint8 vmflags = 0;
+ Size freespace = 0;
XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -50,11 +52,17 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Assert((xlrec.flags & XLHP_CLEANUP_LOCK) != 0 ||
(xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
+ vmflags = xlrec.flags & VISIBILITYMAP_VALID_BITS;
+
/*
- * We are about to remove and/or freeze tuples. In Hot Standby mode,
- * ensure that there are no queries running for which the removed tuples
- * are still visible or which still consider the frozen xids as running.
- * The conflict horizon XID comes after xl_heap_prune.
+ * After xl_heap_prune is the optional snapshot conflict horizon.
+ *
+ * In Hot Standby mode, we must ensure that there are no running queries
+ * which would conflict with the changes in this record. That means we
+ * can't replay this record if it removes tuples that are still visible to
+ * transactions on the standby, freeze tuples with xids that are still
+ * considered running on the standby, or set a page as all-visible in the
+ * VM if it isn't all-visible to all transactions on the standby.
*/
if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
{
@@ -71,12 +79,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
}
/*
- * If we have a full-page image, restore it and we're done.
+ * If we have a full-page image of the heap block, restore it and we're
+ * done with the heap block.
*/
- action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
- (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
- &buffer);
- if (action == BLK_NEEDS_REDO)
+ if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+ (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+ &buffer) == BLK_NEEDS_REDO)
{
Page page = BufferGetPage(buffer);
OffsetNumber *redirected;
@@ -89,6 +97,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Size datalen;
xlhp_freeze_plan *plans;
OffsetNumber *frz_offsets;
+ bool do_prune;
+ bool mark_buffer_dirty;
+ bool set_heap_lsn;
char *dataptr = XLogRecGetBlockData(record, 0, &datalen);
heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
@@ -97,11 +108,18 @@ heap_xlog_prune_freeze(XLogReaderState *record)
&ndead, &nowdead,
&nunused, &nowunused);
+ do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+ set_heap_lsn = mark_buffer_dirty = do_prune || nplans > 0;
+
+ /* Ensure the record does something */
+ Assert(do_prune || nplans > 0 ||
+ vmflags & VISIBILITYMAP_VALID_BITS);
+
/*
* Update all line pointers per the record, and repair fragmentation
* if needed.
*/
- if (nredirected > 0 || ndead > 0 || nunused > 0)
+ if (do_prune)
heap_page_prune_execute(buffer,
(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
redirected, nredirected,
@@ -138,36 +156,117 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* There should be no more data */
Assert((char *) frz_offsets == dataptr + datalen);
+ /*
+ * Now set PD_ALL_VISIBLE, if required. We'll only do this if we are
+ * also going to set bits in the VM later.
+ *
+ * We must never end up with the VM bit set and the page-level
+ * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+ * modification would fail to clear the VM bit.
+ */
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+ {
+ PageSetAllVisible(page);
+
+ /*
+ * If the only change to the heap page is setting PD_ALL_VISIBLE,
+ * we can avoid setting the page LSN unless checksums or
+ * wal_log_hints are enabled.
+ */
+ set_heap_lsn = XLogHintBitIsNeeded() ? true : set_heap_lsn;
+ mark_buffer_dirty = true;
+ }
+
/*
* Note: we don't worry about updating the page's prunability hints.
* At worst this will cause an extra prune cycle to occur soon.
*/
- PageSetLSN(page, lsn);
- MarkBufferDirty(buffer);
+ if (mark_buffer_dirty)
+ MarkBufferDirty(buffer);
+ if (set_heap_lsn)
+ PageSetLSN(page, lsn);
}
/*
- * If we released any space or line pointers, update the free space map.
+ * If we released any space or line pointers or will be setting a page in
+ * the visibility map, measure the page's freespace to later update the
+ * freespace map.
+ *
+ * Even if we are just updating the VM (and thus not freeing up any
+ * space), we'll still update the FSM for this page. Since FSM is not
+ * WAL-logged and only updated heuristically, it easily becomes stale in
+ * standbys. If the standby is later promoted and runs VACUUM, it will
+ * skip updating individual free space figures for pages that became
+ * all-visible (or all-frozen, depending on the vacuum mode,) which is
+ * troublesome when FreeSpaceMapVacuum propagates too optimistic free
+ * space values to upper FSM layers; later inserters try to use such pages
+ * only to find out that they are unusable. This can cause long stalls
+ * when there are many such pages.
+ *
+ * Forestall those problems by updating FSM's idea about a page that is
+ * becoming all-visible or all-frozen.
*
* Do this regardless of a full-page image being applied, since the FSM
* data is not in the page anyway.
+ *
+ * We want to avoid holding an exclusive lock on the heap buffer while
+ * doing IO (either of the FSM or the VM), so we'll release the lock on
+ * the heap buffer before doing either.
*/
if (BufferIsValid(buffer))
{
- if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
- XLHP_HAS_DEAD_ITEMS |
- XLHP_HAS_NOW_UNUSED_ITEMS))
- {
- Size freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+ if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
+ XLHP_HAS_DEAD_ITEMS |
+ XLHP_HAS_NOW_UNUSED_ITEMS)) ||
+ vmflags & VISIBILITYMAP_VALID_BITS)
+ freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+
+ UnlockReleaseBuffer(buffer);
+ }
+
+ /*
+ * Read and update the VM block. Even if we skipped updating the heap page
+ * due to the file being dropped or truncated later in recovery, it's
+ * still safe to update the visibility map. Any WAL record that clears
+ * the visibility map bit does so before checking the page LSN, so any
+ * bits that need to be cleared will still be cleared.
+ *
+ * Note that it is _only_ okay that we do not hold a lock on the heap page
+ * because we are in recovery and can expect no other writers to clear
+ * PD_ALL_VISIBLE before we are able to update the VM.
+ */
+ if (vmflags & VISIBILITYMAP_VALID_BITS &&
+ XLogReadBufferForRedoExtended(record, 1,
+ RBM_ZERO_ON_ERROR,
+ false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ Page vmpage = BufferGetPage(vmbuffer);
+ uint8 old_vmbits = 0;
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
- UnlockReleaseBuffer(buffer);
+ /* initialize the page if it was read as zeros */
+ if (PageIsNew(vmpage))
+ PageInit(vmpage, BLCKSZ, 0);
+
+ old_vmbits = visibilitymap_set_vmbits(reln, blkno, vmbuffer, vmflags);
- XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
+ /* Only set VM page LSN if we modified the page */
+ if (old_vmbits != vmflags)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
}
- else
- UnlockReleaseBuffer(buffer);
+
+ FreeFakeRelcacheEntry(reln);
}
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
+
+ if (freespace > 0)
+ XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7ebd22f00a3..f0b33d1b696 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,6 +21,7 @@
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
+#include "access/visibilitymapdefs.h"
#include "commands/vacuum.h"
#include "executor/instrument.h"
#include "miscadmin.h"
@@ -835,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
+ InvalidBuffer, 0, false,
conflict_xid,
true, reason,
prstate.frozen, prstate.nfrozen,
@@ -2030,14 +2032,18 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
*
* This is used for several different page maintenance operations:
*
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
* redirected, some marked dead, and some removed altogether.
*
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'
*
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ * marked as unused.
*
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phase III, the heap page may be marked
+ * all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
* all.
*
* If replaying the record requires a cleanup lock, pass cleanup_lock = true.
@@ -2045,12 +2051,23 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* replaying 'unused' items depends on whether they were all previously marked
* as dead.
*
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
* Note: This function scribbles on the 'frozen' array.
*
* Note: This is called in a critical section, so careful what you do here.
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer,
+ uint8 vmflags,
+ bool set_pd_all_vis,
TransactionId conflict_xid,
bool cleanup_lock,
PruneReason reason,
@@ -2062,6 +2079,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xl_heap_prune xlrec;
XLogRecPtr recptr;
uint8 info;
+ uint8 regbuf_flags;
/* The following local variables hold data registered in the WAL record: */
xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2070,8 +2088,21 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xlhp_prune_items dead_items;
xlhp_prune_items unused_items;
OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+ bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+
+ Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+ xlrec.flags = vmflags;
- xlrec.flags = 0;
+ regbuf_flags = REGBUF_STANDARD;
+
+ /*
+ * We can avoid an FPI if the only modification we are making to the heap
+ * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+ */
+ if (!do_prune &&
+ nfrozen == 0 &&
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
+ regbuf_flags |= REGBUF_NO_IMAGE;
/*
* Prepare data for the buffer. The arrays are not actually in the
@@ -2079,7 +2110,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* page image, the arrays can be omitted.
*/
XLogBeginInsert();
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+ XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ XLogRegisterBuffer(1, vmbuffer, 0);
+
if (nfrozen > 0)
{
int nplans;
@@ -2168,5 +2203,22 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
}
recptr = XLogInsert(RM_HEAP2_ID, info);
- PageSetLSN(BufferGetPage(buffer), recptr);
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
+ }
+
+ /*
+ * If pruning or freezing tuples or setting the page all-visible when
+ * checksums or wal_log_hints are enabled, we must bump the LSN. Torn
+ * pages are possible if we update PD_ALL_VISIBLE without bumping the LSN,
+ * but this is deemed okay for page hint updates.
+ */
+ if (do_prune || nfrozen > 0 ||
+ (set_pd_all_vis && XLogHintBitIsNeeded()))
+ {
+ Assert(BufferIsDirty(buffer));
+ PageSetLSN(BufferGetPage(buffer), recptr);
+ }
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8a84bdfe0a9..51067264004 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,11 +463,13 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
+static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2852,8 +2854,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
OffsetNumber unused[MaxHeapTuplesPerPage];
int nunused = 0;
TransactionId visibility_cutoff_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
bool all_frozen;
LVSavedErrInfo saved_err_info;
+ uint8 vmflags = 0;
+ bool set_pd_all_vis = false;
Assert(vacrel->do_index_vacuuming);
@@ -2864,6 +2869,23 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
InvalidOffsetNumber);
+ if (heap_page_would_be_all_visible(vacrel->rel, buffer,
+ vacrel->cutoffs.OldestXmin,
+ deadoffsets, num_offsets,
+ &all_frozen, &visibility_cutoff_xid,
+ &vacrel->offnum))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (all_frozen)
+ {
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ }
+
+ /* Take the lock on the vmbuffer before entering a critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ }
+
START_CRIT_SECTION();
for (int i = 0; i < num_offsets; i++)
@@ -2883,6 +2905,17 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
/* Attempt to truncate line pointer array now */
PageTruncateLinePointerArray(page);
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+ {
+ Assert(!PageIsAllVisible(page));
+ set_pd_all_vis = true;
+ PageSetAllVisible(page);
+ visibilitymap_set_vmbits(vacrel->rel,
+ blkno,
+ vmbuffer, vmflags);
+ conflict_xid = visibility_cutoff_xid;
+ }
+
/*
* Mark buffer dirty before we write WAL.
*/
@@ -2892,7 +2925,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
- InvalidTransactionId,
+ vmbuffer,
+ vmflags,
+ set_pd_all_vis,
+ conflict_xid,
false, /* no cleanup lock required */
PRUNE_VACUUM_CLEANUP,
NULL, 0, /* frozen */
@@ -2901,39 +2937,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
unused, nunused);
}
- /*
- * End critical section, so we safely can do visibility tests (which
- * possibly need to perform IO and allocate memory!). If we crash now the
- * page (including the corresponding vm bit) might not be marked all
- * visible, but that's fine. A later vacuum will fix that.
- */
END_CRIT_SECTION();
- /*
- * Now that we have removed the LP_DEAD items from the page, once again
- * check if the page has become all-visible. The page is already marked
- * dirty, exclusively locked, and, if needed, a full page image has been
- * emitted.
- */
- Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
- &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
+ if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (all_frozen)
- {
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buffer,
- InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
- flags);
-
/* Count the newly set VM page for logging */
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
vacrel->vm_new_visible_pages++;
if (all_frozen)
vacrel->vm_new_visible_frozen_pages++;
@@ -3600,40 +3609,85 @@ dead_items_cleanup(LVRelState *vacrel)
}
/*
- * Check if every tuple in the given page in buf is visible to all current and
- * future transactions.
+ * Wrapper around heap_page_would_be_all_visible() for callers that expect
+ * no LP_DEAD items on the page.
+ */
+bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
+{
+
+ return heap_page_would_be_all_visible(rel, buf, OldestXmin,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid,
+ logging_offnum);
+}
+
+/*
+ * Determines whether or not the heap page in buf is all-visible other than
+ * the dead line pointers referred to by the provided deadoffsets array.
*
- * OldestXmin is used to determine visibility.
+ * deadoffsets are the offsets of the LP_DEAD items the caller already knows
+ * about and whose associated index entries have already been removed. Vacuum
+ * calls this before setting those line pointers LP_UNUSED, so if there are no
+ * other LP_DEAD items, the caller may set the page all-visible in the VM.
+ *
+ * Returns true if the page is all-visible other than the provided
+ * deadoffsets and false otherwise.
*
- * Sets *all_frozen to true if every tuple on this page is frozen.
+ * OldestXmin is used to determine visibility.
*
- * Sets *visibility_cutoff_xid to the highest xmin amongst the visible tuples.
- * It is only valid if the page is all-visible.
+ * *all_frozen is an output parameter indicating to the caller if every tuple
+ * on the page is frozen.
*
* *logging_offnum will have the OffsetNumber of the current tuple being
* processed for vacuum's error callback system.
*
- * This is a stripped down version of lazy_scan_prune(). If you change
- * anything here, make sure that everything stays in sync. Note that an
- * assertion calls us to verify that everybody still agrees. Be sure to avoid
- * introducing new side-effects here.
+ * *visibility_cutoff_xid is an output parameter with the highest xmin amongst the
+ * visible tuples. It is only valid if the page is all-visible.
+ *
+ * Callers looking to verify that the page is already all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This logic is similar to that in heap_prune_record_unchanged_lp_normal(). If
+ * you change anything here, make sure that everything stays in sync. Note
+ * that an assertion calls us to verify that everybody still agrees. Be sure
+ * to avoid introducing new side-effects here.
*/
static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum)
+heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
OffsetNumber offnum,
maxoff;
bool all_visible = true;
+ int matched_dead_count = 0;
*visibility_cutoff_xid = InvalidTransactionId;
*all_frozen = true;
+ Assert(ndeadoffsets == 0 || deadoffsets);
+
+#ifdef USE_ASSERT_CHECKING
+ /* Confirm input deadoffsets[] is strictly sorted */
+ if (ndeadoffsets > 1)
+ {
+ for (int i = 1; i < ndeadoffsets; i++)
+ Assert(deadoffsets[i - 1] < deadoffsets[i]);
+ }
+#endif
+
maxoff = PageGetMaxOffsetNumber(page);
for (offnum = FirstOffsetNumber;
offnum <= maxoff && all_visible;
@@ -3661,9 +3715,15 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
if (ItemIdIsDead(itemid))
{
- all_visible = false;
- *all_frozen = false;
- break;
+ if (!deadoffsets ||
+ matched_dead_count >= ndeadoffsets ||
+ deadoffsets[matched_dead_count] != offnum)
+ {
+ *all_frozen = all_visible = false;
+ break;
+ }
+ matched_dead_count++;
+ continue;
}
Assert(ItemIdIsNormal(itemid));
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..439f33b8061 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -103,7 +103,7 @@ plan_elem_desc(StringInfo buf, void *plan, void *data)
* code, the latter of which is used in frontend (pg_waldump) code.
*/
void
-heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
OffsetNumber **frz_offsets,
int *nredirected, OffsetNumber **redirected,
@@ -287,6 +287,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, ", isCatalogRel: %c",
xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
+ if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
+ appendStringInfo(buf, ", vm_flags: 0x%02X",
+ xlrec->flags & VISIBILITYMAP_VALID_BITS);
+
if (XLogRecHasBlockData(record, 0))
{
Size datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..8b47295efa2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -344,6 +344,12 @@ extern void heap_inplace_update_and_unlock(Relation relation,
Buffer buffer);
extern void heap_inplace_unlock(Relation relation,
HeapTuple oldtup, Buffer buffer);
+
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
@@ -388,6 +394,9 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer,
+ uint8 vmflags,
+ bool vm_modified_heap_page,
TransactionId conflict_xid,
bool cleanup_lock,
PruneReason reason,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d4c0625b632..d8508593e7c 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -249,7 +249,7 @@ typedef struct xl_heap_update
* Main data section:
*
* xl_heap_prune
- * uint8 flags
+ * uint16 flags
* TransactionId snapshot_conflict_horizon
*
* Block 0 data section:
@@ -284,7 +284,7 @@ typedef struct xl_heap_update
*/
typedef struct xl_heap_prune
{
- uint8 flags;
+ uint16 flags;
/*
* If XLHP_HAS_CONFLICT_HORIZON is set, the conflict horizon XID follows,
@@ -292,10 +292,22 @@ typedef struct xl_heap_prune
*/
} xl_heap_prune;
-#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
+
+/*
+ * The xl_heap_prune record's flags may also contain which VM bits to set. As
+ * such, (1 << 0) and (1 << 1) are reserved for VISIBILITYMAP_ALL_VISIBLE and
+ * VISIBILITYMAP_ALL_FROZEN.
+ */
-/* to handle recovery conflict during logical decoding on standby */
-#define XLHP_IS_CATALOG_REL (1 << 1)
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBLITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILIYTMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
+#define XLHP_IS_CATALOG_REL (1 << 2)
/*
* Does replaying the record require a cleanup-lock?
@@ -305,7 +317,7 @@ typedef struct xl_heap_prune
* marks LP_DEAD line pointers as unused without moving any tuple data, an
* ordinary exclusive lock is sufficient.
*/
-#define XLHP_CLEANUP_LOCK (1 << 2)
+#define XLHP_CLEANUP_LOCK (1 << 3)
/*
* If we remove or freeze any entries that contain xids, we need to include a
@@ -313,22 +325,22 @@ typedef struct xl_heap_prune
* there are no queries running for which the removed tuples are still
* visible, or which still consider the frozen XIDs as running.
*/
-#define XLHP_HAS_CONFLICT_HORIZON (1 << 3)
+#define XLHP_HAS_CONFLICT_HORIZON (1 << 4)
/*
* Indicates that an xlhp_freeze_plans sub-record and one or more
* xlhp_freeze_plan sub-records are present.
*/
-#define XLHP_HAS_FREEZE_PLANS (1 << 4)
+#define XLHP_HAS_FREEZE_PLANS (1 << 5)
/*
* XLHP_HAS_REDIRECTIONS, XLHP_HAS_DEAD_ITEMS, and XLHP_HAS_NOW_UNUSED_ITEMS
* indicate that xlhp_prune_items sub-records with redirected, dead, and
* unused item offsets are present.
*/
-#define XLHP_HAS_REDIRECTIONS (1 << 5)
-#define XLHP_HAS_DEAD_ITEMS (1 << 6)
-#define XLHP_HAS_NOW_UNUSED_ITEMS (1 << 7)
+#define XLHP_HAS_REDIRECTIONS (1 << 6)
+#define XLHP_HAS_DEAD_ITEMS (1 << 7)
+#define XLHP_HAS_NOW_UNUSED_ITEMS (1 << 8)
/*
* xlhp_freeze_plan describes how to freeze a group of one or more heap tuples
@@ -497,7 +509,7 @@ extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
uint8 vmflags);
/* in heapdesc.c, so it can be shared between frontend/backend code */
-extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
OffsetNumber **frz_offsets,
int *nredirected, OffsetNumber **redirected,
--
2.43.0
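To make the new interface concrete, here is a condensed sketch of the call
sequence lazy_vacuum_heap_page() follows after this patch, distilled from the
diff above rather than additional code in the series. It assumes the page also
turned out to be all-frozen, omits the all-visible recheck, error callback
setup, and logging counters, and uses generic names (rel, page, buffer, blkno,
unused, nunused); the signature shown is the one before the later patch that
adds a force_heap_fpi argument.

    /* vmbuffer is already pinned; lock it before the critical section */
    LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
    START_CRIT_SECTION();

    PageSetAllVisible(page);
    visibilitymap_set_vmbits(rel, blkno, vmbuffer,
                             VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
    MarkBufferDirty(buffer);        /* dirty the heap buffer before WAL */

    if (RelationNeedsWAL(rel))
        log_heap_prune_and_freeze(rel, buffer,
                                  vmbuffer,
                                  VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN,
                                  true,                 /* set_pd_all_vis */
                                  InvalidTransactionId, /* conflict_xid: page is all-frozen */
                                  false,                /* no cleanup lock required */
                                  PRUNE_VACUUM_CLEANUP,
                                  NULL, 0,              /* frozen */
                                  NULL, 0,              /* redirected */
                                  NULL, 0,              /* dead */
                                  unused, nunused);

    END_CRIT_SECTION();
    LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);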
Attachment: v11-0002-Make-heap_page_is_all_visible-independent-of-LVR.patch (text/x-patch)
From 94de583f0c7786ed49d27685055a1e3bd0cecb61 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 15:48:51 -0400
Subject: [PATCH v11 02/20] Make heap_page_is_all_visible independent of
LVRelState
Future commits will use this function inside pruneheap.c, where we do
not have access to the LVRelState. We only need a few parameters from
the LVRelState, so just pass those in explicitly.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 48 ++++++++++++++++++----------
1 file changed, 31 insertions(+), 17 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 981d9380a92..8a84bdfe0a9 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,8 +463,11 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
- TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2009,8 +2012,9 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(presult.lpdead_items == 0);
- if (!heap_page_is_all_visible(vacrel, buf,
- &debug_cutoff, &debug_all_frozen))
+ if (!heap_page_is_all_visible(vacrel->rel, buf,
+ vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+ &debug_cutoff, &vacrel->offnum))
Assert(false);
Assert(presult.all_frozen == debug_all_frozen);
@@ -2912,8 +2916,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* emitted.
*/
Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
- &all_frozen))
+ if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
+ &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -3596,10 +3600,18 @@ dead_items_cleanup(LVRelState *vacrel)
}
/*
- * Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples. Set *all_frozen to true if every tuple
- * on this page is frozen.
+ * Check if every tuple in the given page in buf is visible to all current and
+ * future transactions.
+ *
+ * OldestXmin is used to determine visibility.
+ *
+ * Sets *all_frozen to true if every tuple on this page is frozen.
+ *
+ * Sets *visibility_cutoff_xid to the highest xmin amongst the visible tuples.
+ * It is only valid if the page is all-visible.
+ *
+ * *logging_offnum will have the OffsetNumber of the current tuple being
+ * processed for vacuum's error callback system.
*
* This is a stripped down version of lazy_scan_prune(). If you change
* anything here, make sure that everything stays in sync. Note that an
@@ -3607,9 +3619,11 @@ dead_items_cleanup(LVRelState *vacrel)
* introducing new side-effects here.
*/
static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
TransactionId *visibility_cutoff_xid,
- bool *all_frozen)
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3632,7 +3646,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* Set the offset number so that we can display it along with any
* error that occurred while processing this tuple.
*/
- vacrel->offnum = offnum;
+ *logging_offnum = offnum;
itemid = PageGetItemId(page, offnum);
/* Unused or redirect line pointers are of no interest */
@@ -3656,9 +3670,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
tuple.t_len = ItemIdGetLength(itemid);
- tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+ tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
+ switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
buf))
{
case HEAPTUPLE_LIVE:
@@ -3679,7 +3693,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
+ OldestXmin))
{
all_visible = false;
*all_frozen = false;
@@ -3714,7 +3728,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
} /* scan along page */
/* Clear the offset information once we have processed the given page. */
- vacrel->offnum = InvalidOffsetNumber;
+ *logging_offnum = InvalidOffsetNumber;
return all_visible;
}
--
2.43.0
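A minimal usage sketch of the decoupled function, assuming only a Relation, a
buffer, and an OldestXmin cutoff are at hand (the function is still static to
vacuumlazy.c at this point in the series; a later patch exposes it through
heapam.h). The variable names here are illustrative, not from the patch:

    bool          all_frozen;
    TransactionId cutoff_xid;
    OffsetNumber  errinfo_offnum = InvalidOffsetNumber;

    if (heap_page_is_all_visible(rel, buf, OldestXmin,
                                 &all_frozen, &cutoff_xid, &errinfo_offnum))
    {
        /* page may be set all-visible in the VM, and all-frozen if all_frozen */
    }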
Attachment: v11-0005-Combine-lazy_scan_prune-VM-corruption-cases.patch (text/x-patch)
From 6ed211a90a2e3b384ba06d85ae183f513ca3ffc3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:04:03 -0400
Subject: [PATCH v11 05/20] Combine lazy_scan_prune VM corruption cases
lazy_scan_prune() updates the visibility map after phase I of heap
vacuuming. It also checks and fixes corruption in the VM. The corruption
cases were mixed in with the normal visibility map update cases.
Careful study of the ordering of the current logic reveals that the
corruption cases can be reordered and extracted into a separate
function. This should result in no additional overhead compared to the
previous code.
This reordering makes it clear which cases are about corruption and
which cases are normal VM updates. Separating them also makes it
possible to combine the normal cases in a future commit. This will make
the logic easier to understand and allow for further separation of the
logic to allow updating the VM in the same record as pruning and
freezing in phase I.
---
src/backend/access/heap/vacuumlazy.c | 126 +++++++++++++++++----------
1 file changed, 79 insertions(+), 47 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a1cdaaebb57..e9b4e924d22 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,6 +430,12 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
+static bool identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1934,6 +1940,72 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer)
+{
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
+
+
/* qsort comparator for sorting OffsetNumbers */
static int
cmpOffsetNumbers(const void *a, const void *b)
@@ -2080,9 +2152,14 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
+ * all_frozen variables. Start by looking for any VM corruption.
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
+ all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+ {
+ /* Don't update the VM if we just cleared corruption in it */
+ }
+ else if (!all_visible_according_to_vm && presult.all_visible)
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -2134,51 +2211,6 @@ lazy_scan_prune(LVRelState *vacrel,
}
}
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
/*
* If the all-visible page is all-frozen but not marked as such yet, mark
* it as all-frozen. Note that all_frozen is only valid if all_visible is
--
2.43.0
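The resulting control flow in lazy_scan_prune() is easier to see condensed; a
sketch of the shape only, not new code beyond what the diff above adds:

    if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
                                       all_visible_according_to_vm,
                                       presult.lpdead_items, vmbuffer))
    {
        /* corruption was cleared; skip any further VM update for this page */
    }
    else if (!all_visible_according_to_vm && presult.all_visible)
    {
        /* normal case: set all-visible, and all-frozen when possible */
    }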
Attachment: v11-0004-Use-xl_heap_prune-record-for-setting-empty-pages.patch (text/x-patch)
From 372ba8cb2a0f1b234db1b2dca929ae025d43a034 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:42:19 -0400
Subject: [PATCH v11 04/20] Use xl_heap_prune record for setting empty pages
all-visible
As part of a project to eliminate xl_heap_visible records, eliminate
their usage in phase I vacuum of empty pages.
---
src/backend/access/heap/pruneheap.c | 14 +++++--
src/backend/access/heap/vacuumlazy.c | 57 ++++++++++++++++++----------
src/include/access/heapam.h | 1 +
3 files changed, 49 insertions(+), 23 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f0b33d1b696..373986b204a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -836,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
+ false,
InvalidBuffer, 0, false,
conflict_xid,
true, reason,
@@ -2055,6 +2056,9 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* case, vmbuffer should already have been updated and marked dirty and should
* still be pinned and locked.
*
+ * force_heap_fpi indicates that a full page image of the heap block should be
+ * forced.
+ *
* set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
* the page LSN when checksums/wal_log_hints are enabled even if we did not
* prune or freeze tuples on the page.
@@ -2065,6 +2069,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ bool force_heap_fpi,
Buffer vmbuffer,
uint8 vmflags,
bool set_pd_all_vis,
@@ -2095,13 +2100,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
regbuf_flags = REGBUF_STANDARD;
+ if (force_heap_fpi)
+ regbuf_flags |= REGBUF_FORCE_IMAGE;
+
/*
* We can avoid an FPI if the only modification we are making to the heap
* page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
*/
- if (!do_prune &&
- nfrozen == 0 &&
- (!set_pd_all_vis || !XLogHintBitIsNeeded()))
+ else if (!do_prune &&
+ nfrozen == 0 &&
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
regbuf_flags |= REGBUF_NO_IMAGE;
/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 51067264004..a1cdaaebb57 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1877,33 +1877,49 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ uint8 new_vmbits = VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN;
+
+ /* Lock vmbuffer before entering critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ PageSetAllVisible(page);
MarkBufferDirty(buf);
- /*
- * It's possible that another backend has extended the heap,
- * initialized the page, and then failed to WAL-log the page due
- * to an ERROR. Since heap extension is not WAL-logged, recovery
- * might try to replay our record setting the page all-visible and
- * find that the page isn't initialized, which will cause a PANIC.
- * To prevent that, check whether the page has been previously
- * WAL-logged, and if not, do that now.
- */
- if (RelationNeedsWAL(vacrel->rel) &&
- PageGetLSN(page) == InvalidXLogRecPtr)
- log_newpage_buffer(buf, true);
+ visibilitymap_set_vmbits(vacrel->rel, blkno,
+ vmbuffer, new_vmbits);
+
+ if (RelationNeedsWAL(vacrel->rel))
+ {
+ /*
+ * It's possible that another backend has extended the heap,
+ * initialized the page, and then failed to WAL-log the page
+ * due to an ERROR. Since heap extension is not WAL-logged,
+ * recovery might try to replay our record setting the page
+ * all-visible and find that the page isn't initialized, which
+ * will cause a PANIC. To prevent that, if the page hasn't
+ * been previously WAL-logged, force a heap FPI.
+ */
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ PageGetLSN(page) == InvalidXLogRecPtr,
+ vmbuffer,
+ new_vmbits,
+ true,
+ InvalidTransactionId,
+ false, PRUNE_VACUUM_SCAN,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+ }
- PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
END_CRIT_SECTION();
- /* Count the newly all-frozen pages for logging */
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+ /* Count the newly all-frozen pages for logging. */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
}
@@ -2925,6 +2941,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
+ false,
vmbuffer,
vmflags,
set_pd_all_vis,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8b47295efa2..e7129a644a1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,6 +394,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ bool force_heap_fpi,
Buffer vmbuffer,
uint8 vmflags,
bool vm_modified_heap_page,
--
2.43.0
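The full-page-image decision in log_heap_prune_and_freeze() after this patch
can be summarized as follows (condensed from the diff above, not additional
code):

    regbuf_flags = REGBUF_STANDARD;
    if (force_heap_fpi)
        regbuf_flags |= REGBUF_FORCE_IMAGE;     /* e.g. a never-WAL-logged empty page */
    else if (!do_prune && nfrozen == 0 &&
             (!set_pd_all_vis || !XLogHintBitIsNeeded()))
        regbuf_flags |= REGBUF_NO_IMAGE;        /* PD_ALL_VISIBLE-only change */

So an empty page that was never WAL-logged gets a forced image in place of the
old log_newpage_buffer() call, while a page whose only change is the hint-style
PD_ALL_VISIBLE bit avoids an image entirely when checksums/wal_log_hints are
disabled.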
Attachment: v11-0008-Keep-all_frozen-updated-too-in-heap_page_prune_a.patch (text/x-patch)
From 76c45ff7622f5c7859ea09eb65c1c552ab6b3ec1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 23 Jul 2025 16:01:24 -0400
Subject: [PATCH v11 08/20] Keep all_frozen updated too in
heap_page_prune_and_freeze
We previously relied on all_visible and all_frozen only ever being used
together, so all_frozen was not always kept up to date on its own.
Future commits will use these fields separately, so it is best not to
rely on all_visible for all_frozen's validity.
---
src/backend/access/heap/pruneheap.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 54af3296b91..bbd83e4fcc7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -143,10 +143,6 @@ typedef struct
* whether to freeze the page or not. The all_visible and all_frozen
* values returned to the caller are adjusted to include LP_DEAD items at
* the end.
- *
- * all_frozen should only be considered valid if all_visible is also set;
- * we don't bother to clear the all_frozen flag every time we clear the
- * all_visible flag.
*/
bool all_visible;
bool all_frozen;
@@ -830,6 +826,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
}
+ Assert(!prstate.all_frozen || prstate.all_visible);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -1474,7 +1471,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
if (!HeapTupleHeaderXminCommitted(htup))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1496,7 +1493,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
Assert(prstate->cutoffs);
if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1509,7 +1506,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
case HEAPTUPLE_RECENTLY_DEAD:
prstate->recently_dead_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple will soon become DEAD. Update the hint field so
@@ -1528,7 +1525,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* assumption is a bit shaky, but it is what acquire_sample_rows()
* does, so be consistent.
*/
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* If we wanted to optimize for aborts, we might consider marking
@@ -1546,7 +1543,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* will commit and update the counters after we report.
*/
prstate->live_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple may soon become DEAD. Update the hint field so that
--
2.43.0
Attachment: v11-0007-Find-and-fix-VM-corruption-in-heap_page_prune_an.patch (text/x-patch)
From 34188dade43706d764392ef68af82c1e6deb663a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:45:59 -0400
Subject: [PATCH v11 07/20] Find and fix VM corruption in
heap_page_prune_and_freeze
Future commits will update the VM in the same critical section and WAL
record as pruning and freezing. For ease of review, this commit makes
one step toward doing this. It moves the VM corruption handling case to
heap_page_prune_and_freeze().
This commit is only really meant for review, as it adds a member to
PruneFreezeResult (vm_corruption) that is removed in later commits.
---
src/backend/access/heap/pruneheap.c | 93 +++++++++++++++++++++++++++-
src/backend/access/heap/vacuumlazy.c | 84 +++----------------------
src/include/access/heapam.h | 4 ++
3 files changed, 102 insertions(+), 79 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 373986b204a..54af3296b91 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,7 +21,7 @@
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "commands/vacuum.h"
#include "executor/instrument.h"
#include "miscadmin.h"
@@ -177,6 +177,13 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
static void page_verify_redirects(Page page);
+static bool identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer);
+
/*
* Optionally prune and repair fragmentation in the specified page.
@@ -261,7 +268,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, vistest, 0,
+ heap_page_prune_and_freeze(relation, buffer, false,
+ InvalidBuffer,
+ vistest, 0,
NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
/*
@@ -294,6 +303,70 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
}
}
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer)
+{
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
/*
* Prune and repair fragmentation and potentially freeze tuples on the
@@ -314,6 +387,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
* that also freeze need that information.
*
+ * blk_known_av is the visibility status of the heap block as of the last call
+ * to find_next_unskippable_block(). vmbuffer is the buffer that may already
+ * contain the required block of the visibility map.
+ *
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
* (see heap_prune_satisfies_vacuum).
*
@@ -349,6 +426,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ bool blk_known_av,
+ Buffer vmbuffer,
GlobalVisState *vistest,
int options,
struct VacuumCutoffs *cutoffs,
@@ -897,6 +976,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
+ /*
+ * Clear any VM corruption. This does not need to be done in a critical
+ * section.
+ */
+ presult->vm_corruption = false;
+ if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+ presult->vm_corruption = identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av,
+ prstate.lpdead_items, vmbuffer);
if (prstate.freeze)
{
if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1febb524d41..574e415b0e0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,12 +430,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
-static bool identify_and_fix_vm_corruption(Relation relation,
- BlockNumber heap_blk,
- Buffer heap_buffer, Page heap_page,
- bool heap_blk_known_av,
- int64 nlpdead_items,
- Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1940,72 +1934,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
-/*
- * When updating the visibility map after phase I heap vacuuming, we take the
- * opportunity to identify and fix any VM corruption.
- *
- * heap_blk_known_av is the visibility status of the heap page collected
- * while finding the next unskippable block in heap_vac_scan_next_block().
- */
-static bool
-identify_and_fix_vm_corruption(Relation relation,
- BlockNumber heap_blk,
- Buffer heap_buffer, Page heap_page,
- bool heap_blk_known_av,
- int64 nlpdead_items,
- Buffer vmbuffer)
-{
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
- visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk)));
-
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk)));
-
- PageClearAllVisible(heap_page);
- MarkBufferDirty(heap_buffer);
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- return false;
-}
-
-
/* qsort comparator for sorting OffsetNumbers */
static int
cmpOffsetNumbers(const void *a, const void *b)
@@ -2063,11 +1991,14 @@ lazy_scan_prune(LVRelState *vacrel,
* tuples. Pruning will have determined whether or not the page is
* all-visible.
*/
- prune_options = HEAP_PAGE_PRUNE_FREEZE;
+ prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
+ heap_page_prune_and_freeze(rel, buf,
+ all_visible_according_to_vm,
+ vmbuffer,
+ vacrel->vistest, prune_options,
&vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
@@ -2152,10 +2083,9 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables. Start by looking for any VM corruption.
+ * all_frozen variables.
*/
- if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
- all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+ if (presult.vm_corruption)
{
/* Don't update the VM if we just cleared corruption in it */
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e7129a644a1..0c7eb5e46f4 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
struct TupleTableSlot;
@@ -247,6 +248,7 @@ typedef struct PruneFreezeResult
bool all_visible;
bool all_frozen;
TransactionId vm_conflict_horizon;
+ bool vm_corruption;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
@@ -380,6 +382,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ bool blk_known_av,
+ Buffer vmbuffer,
struct GlobalVisState *vistest,
int options,
struct VacuumCutoffs *cutoffs,
--
2.43.0
Attachment: v11-0006-Combine-vacuum-phase-I-VM-update-cases.patch (text/x-patch)
From 8791f69380c4b30c85a590bf697440efa064ac7c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:35:36 -0400
Subject: [PATCH v11 06/20] Combine vacuum phase I VM update cases
We update the VM after phase I of vacuum -- either setting both the VM
bits when all bits are currently unset or setting just the frozen bit
when the all-visible bit is already set.
Those two cases shared much of the same code -- leading to unnecessary
duplication. This commit combines them, which is simpler and easier to
understand.
The combined case also happens to fix a longstanding bug: when only setting
an already all-visible page all-frozen while checksums/wal_log_hints are
enabled, we failed to mark the buffer dirty before setting the page LSN in
visibilitymap_set().
---
src/backend/access/heap/vacuumlazy.c | 101 +++++++++------------------
1 file changed, 32 insertions(+), 69 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e9b4e924d22..1febb524d41 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2159,11 +2159,26 @@ lazy_scan_prune(LVRelState *vacrel,
{
/* Don't update the VM if we just cleared corruption in it */
}
- else if (!all_visible_according_to_vm && presult.all_visible)
+
+ /*
+ * If the page isn't yet marked all-visible in the VM, or it is but still
+ * needs to be marked all-frozen, update the VM. Note that all_frozen is only
+ * valid if all_visible is true, so we must check both all_visible and
+ * all_frozen.
+ */
+ else if (presult.all_visible &&
+ (!all_visible_according_to_vm ||
+ (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
+ /*
+ * If the page is all-frozen, we can pass InvalidTransactionId as our
+ * cutoff_xid, since a snapshotConflictHorizon sufficient to make
+ * everything safe for REDO was logged when the page's tuples were
+ * frozen.
+ */
if (presult.all_frozen)
{
Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
@@ -2176,21 +2191,29 @@ lazy_scan_prune(LVRelState *vacrel,
* checksums are not enabled). Regardless, set both bits so that we
* get back in sync.
*
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
+ * If the heap page is all-visible but the VM bit is not set, we don't
+ * need to dirty the heap page. However, if checksums are enabled, we
+ * do need to make sure that the heap page is dirtied before passing
+ * it to visibilitymap_set(), because it may be logged.
*/
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
+ if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+ {
+ PageSetAllVisible(page);
+ MarkBufferDirty(buf);
+ }
+
old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
InvalidXLogRecPtr,
vmbuffer, presult.vm_conflict_horizon,
flags);
/*
+ * Even if we are only setting the all-frozen bit, there is a small
+ * chance that the VM was modified sometime between setting
+ * all_visible_according_to_vm and checking the visibility during
+ * pruning. Check the return value of old_vmbits to ensure the
+ * visibility map counters used for logging are accurate.
+ *
* If the page wasn't already set all-visible and/or all-frozen in the
* VM, count it as newly set for logging.
*/
@@ -2211,66 +2234,6 @@ lazy_scan_prune(LVRelState *vacrel,
}
}
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen. Note that all_frozen is only valid if all_visible is
- * true, so we must check both all_visible and all_frozen.
- */
- else if (all_visible_according_to_vm && presult.all_visible &&
- presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
-
return presult.ndeleted;
}
--
2.43.0
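The ordering that the bug fix enforces when checksums or wal_log_hints are
enabled looks like this (a sketch distilled from the diff above, using the
same variable names):

    if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
    {
        PageSetAllVisible(page);
        MarkBufferDirty(buf);   /* must happen before visibilitymap_set() may
                                 * stamp an LSN on this page */
    }
    old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
                                   InvalidXLogRecPtr,
                                   vmbuffer, presult.vm_conflict_horizon,
                                   flags);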
Attachment: v11-0009-Update-VM-in-pruneheap.c.patch (text/x-patch)
From 04abbbd76bded07b80a48fb7af9e30bc8cca93a2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 2 Jun 2025 11:04:14 -0400
Subject: [PATCH v11 09/20] Update VM in pruneheap.c
As a step toward updating the VM in the same critical section and WAL
record as pruning and freezing (during phase I of vacuuming), first move
the VM update (still in its own critical section and WAL record) into
heap_page_prune_and_freeze(). This makes review easier.
---
src/backend/access/heap/pruneheap.c | 99 +++++++++++++++++++++++-----
src/backend/access/heap/vacuumlazy.c | 98 +++++----------------------
src/include/access/heapam.h | 15 +++--
3 files changed, 106 insertions(+), 106 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index bbd83e4fcc7..398962ed1cb 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -366,7 +366,8 @@ identify_and_fix_vm_corruption(Relation relation,
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -442,6 +443,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_freeze;
bool do_prune;
bool do_hint;
+ uint8 vmflags = 0;
+ uint8 old_vmbits = 0;
bool hint_bit_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
@@ -942,7 +945,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*
* Now that freezing has been finalized, unset all_visible if there are
* any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected by our caller.
+ * of the page, as expected for updating the visibility map.
*/
if (prstate.all_visible && prstate.lpdead_items == 0)
{
@@ -958,31 +961,91 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->hastup = prstate.hastup;
/*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
+ * If updating the visibility map, the conflict horizon for that record
+ * must be the newest xmin on the page. However, if the page is
+ * completely frozen, there can be no conflict and the vm_conflict_horizon
+ * should remain InvalidTransactionId. This includes the case that we
+ * just froze all the tuples; the prune-freeze record included the
+ * conflict XID already so the VM update record doesn't need it.
*/
if (presult->all_frozen)
presult->vm_conflict_horizon = InvalidTransactionId;
else
presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
- presult->lpdead_items = prstate.lpdead_items;
- /* the presult->deadoffsets array was already filled in */
-
/*
- * Clear any VM corruption. This does not need to be done in a critical
- * section.
+ * Handle setting visibility map bit based on information from the VM (as
+ * of last heap_vac_scan_next_block() call), and from all_visible and
+ * all_frozen variables.
*/
- presult->vm_corruption = false;
if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
- presult->vm_corruption = identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av,
- prstate.lpdead_items, vmbuffer);
+ {
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av,
+ prstate.lpdead_items, vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /*
+ * If the page isn't yet marked all-visible in the VM, or it is but
+ * still needs to be marked all-frozen, update the VM. Note that all_frozen
+ * is only valid if all_visible is true, so we must check both
+ * all_visible and all_frozen.
+ */
+ else if (presult->all_visible &&
+ (!blk_known_av ||
+ (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+ {
+ Assert(prstate.lpdead_items == 0);
+ vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+ /*
+ * If the page is all-frozen, we can pass InvalidTransactionId as
+ * our cutoff_xid, since a snapshotConflictHorizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ if (presult->all_frozen)
+ {
+ Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
+ /*
+ * It's possible for the VM bit to be clear and the page-level bit
+ * to be set if checksums are not enabled.
+ *
+ * And even if we are just planning to update the frozen bit in
+ * the VM, we shouldn't rely on all_visible_according_to_vm as a
+ * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
+ * might have become stale.
+ *
+ * If the heap page is all-visible but the VM bit is not set, we
+ * don't need to dirty the heap page. However, if checksums are
+ * enabled, we do need to make sure that the heap page is dirtied
+ * before passing it to visibilitymap_set(), because it may be
+ * logged.
+ */
+ if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+ {
+ PageSetAllVisible(page);
+ MarkBufferDirty(buffer);
+ }
+
+ old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
+ vmbuffer, presult->vm_conflict_horizon,
+ vmflags);
+ }
+ }
+
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
+
+ presult->old_vmbits = old_vmbits;
+ presult->new_vmbits = vmflags;
+
if (prstate.freeze)
{
if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 574e415b0e0..9492423141e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1949,7 +1949,8 @@ cmpOffsetNumbers(const void *a, const void *b)
* vmbuffer is the buffer containing the VM block with visibility information
* for the heap block, blkno. all_visible_according_to_vm is the saved
* visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * won't rely entirely on this status, as it may be out of date. These will be
+ * passed on to heap_page_prune_and_freeze() to use while setting the VM.
*
* *has_lpdead_items is set to true or false depending on whether, upon return
* from this function, any LP_DEAD items are still present on the page.
@@ -1978,6 +1979,7 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
+ * Then, if the page's visibility status has changed, update the VM.
*
* If the relation has no indexes, we can immediately mark would-be dead
* items LP_UNUSED.
@@ -1986,10 +1988,6 @@ lazy_scan_prune(LVRelState *vacrel,
* presult.ndeleted. It should not be confused with presult.lpdead_items;
* presult.lpdead_items's final value can be thought of as the number of
* tuples that were deleted from indexes.
- *
- * We will update the VM after collecting LP_DEAD items and freezing
- * tuples. Pruning will have determined whether or not the page is
- * all-visible.
*/
prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
if (vacrel->nindexes == 0)
@@ -2081,88 +2079,26 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(!presult.all_visible || !(*has_lpdead_items));
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * For the purposes of logging, count whether or not the page was newly
+ * set all-visible and, potentially, all-frozen.
*/
- if (presult.vm_corruption)
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- /* Don't update the VM if we just cleared corruption in it */
- }
-
- /*
- * If the page isn't yet marked all-visible in the VM or it is and needs
- * to me marked all-frozen, update the VM. Note that all_frozen is only
- * valid if all_visible is true, so we must check both all_visible and
- * all_frozen.
- */
- else if (presult.all_visible &&
- (!all_visible_according_to_vm ||
- (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
- {
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- /*
- * If the page is all-frozen, we can pass InvalidTransactionId as our
- * cutoff_xid, since a snapshotConflictHorizon sufficient to make
- * everything safe for REDO was logged when the page's tuples were
- * frozen.
- */
- if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * If the heap page is all-visible but the VM bit is not set, we don't
- * need to dirty the heap page. However, if checksums are enabled, we
- * do need to make sure that the heap page is dirtied before passing
- * it to visibilitymap_set(), because it may be logged.
- */
- if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * Even if we are only setting the all-frozen bit, there is a small
- * chance that the VM was modified sometime between setting
- * all_visible_according_to_vm and checking the visibility during
- * pruning. Check the return value of old_vmbits to ensure the
- * visibility map counters used for logging are accurate.
- *
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ vacrel->vm_new_visible_pages++;
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- vacrel->vm_new_frozen_pages++;
+ vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ {
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
+ }
return presult.ndeleted;
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0c7eb5e46f4..b85648456e9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,20 +235,21 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
+ * all_visible and all_frozen indicate the status of the page as reflected
+ * in the visibility map after pruning, freezing, and setting any pages
+ * all-visible in the visibility map.
*
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
+ * vm_conflict_horizon is the newest xmin of live tuples on the page
+ * (older than OldestXmin). It will only be valid if we did not set the
+ * page all-frozen in the VM.
*
* These are only set if the HEAP_PRUNE_FREEZE option is set.
*/
bool all_visible;
bool all_frozen;
TransactionId vm_conflict_horizon;
- bool vm_corruption;
+ uint8 old_vmbits;
+ uint8 new_vmbits;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
--
2.43.0
v11-0010-Rename-PruneState.freeze-to-attempt_freeze.patch
From 133c61abb24a832033d973fd2509230a68cb9b9d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 31 Jul 2025 14:07:51 -0400
Subject: [PATCH v11 10/20] Rename PruneState.freeze to attempt_freeze
This makes it clearer that the flag indicates the caller would like
heap_page_prune_and_freeze() to consider freezing tuples -- not that we
will ultimately end up freezing them.
Also rename local variable hint_bit_fpi to did_tuple_hint_fpi. This
makes it clear that it refers to tuple hints rather than page hints, and
that it records something that happened, not something that could happen.
---
src/backend/access/heap/pruneheap.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 398962ed1cb..df3e6439176 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -43,7 +43,7 @@ typedef struct
/* whether or not dead items can be set LP_UNUSED during pruning */
bool mark_unused_now;
/* whether to attempt freezing tuples */
- bool freeze;
+ bool attempt_freeze;
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -445,13 +445,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_hint;
uint8 vmflags = 0;
uint8 old_vmbits = 0;
- bool hint_bit_fpi;
+ bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
/* Copy parameters to prstate */
prstate.vistest = vistest;
prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
- prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
prstate.cutoffs = cutoffs;
/*
@@ -473,7 +473,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* initialize page freezing working state */
prstate.pagefrz.freeze_required = false;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
Assert(new_relfrozen_xid && new_relmin_mxid);
prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
@@ -520,7 +520,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* function, when we return the value to the caller, so that the caller
* doesn't set the VM bit incorrectly.
*/
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
@@ -634,7 +634,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
* an FPI to be emitted.
*/
- hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+ did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
/*
* Process HOT chains.
@@ -750,7 +750,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* plans we prepared, or not.
*/
do_freeze = false;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
if (prstate.pagefrz.freeze_required)
{
@@ -783,7 +783,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (RelationNeedsWAL(relation))
{
- if (hint_bit_fpi)
+ if (did_tuple_hint_fpi)
do_freeze = true;
else if (do_prune)
{
@@ -1046,7 +1046,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->old_vmbits = old_vmbits;
presult->new_vmbits = vmflags;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
if (presult->nfrozen > 0)
{
@@ -1628,7 +1628,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
}
/* Consider freezing any normal tuples which will not be removed */
- if (prstate->freeze)
+ if (prstate->attempt_freeze)
{
bool totally_frozen;
--
2.43.0
v11-0013-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch
From 74e196584204f9554c2425bc7be9ed9e1a9821fc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v11 13/20] Rename GlobalVisTestIsRemovableXid() to
GlobalVisXidVisibleToAll()
Currently, we only use GlobalVisTestIsRemovableXid() to check if a
tuple's xmax is visible to all, meaning we can remove it. But future
commits will use GlobalVisTestIsRemovableXid() to test if a tuple's xmin
is visible to all, for the purposes of determining whether the page can be
set all-visible in the VM. In that case, it makes more sense to call the
function GlobalVisXidVisibleToAll().
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
---
src/backend/access/heap/heapam_visibility.c | 6 +++---
src/backend/access/heap/pruneheap.c | 14 +++++++-------
src/backend/access/spgist/spgvacuum.c | 2 +-
src/backend/storage/ipc/procarray.c | 13 ++++++-------
src/include/utils/snapmgr.h | 4 ++--
5 files changed, 19 insertions(+), 20 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
{
Assert(TransactionIdIsValid(dead_after));
- if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
res = HEAPTUPLE_DEAD;
}
else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
return false;
/* Deleter committed, so tuple is dead if the XID is old enough. */
- return GlobalVisTestIsRemovableXid(vistest,
- HeapTupleHeaderGetRawXmax(tuple));
+ return GlobalVisXidVisibleToAll(vistest,
+ HeapTupleHeaderGetRawXmax(tuple));
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6637966e927..0211effeec7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -231,7 +231,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
vistest = GlobalVisTestFor(relation);
- if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+ if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
return;
/*
@@ -580,9 +580,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* Determining HTSV only once for each tuple is required for correctness,
* to deal with cases where running HTSV twice could result in different
* results. For example, RECENTLY_DEAD can turn to DEAD if another
- * checked item causes GlobalVisTestIsRemovableFullXid() to update the
- * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
- * transaction aborts.
+ * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+ * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+ * aborts.
*
* It's also good for performance. Most commonly tuples within a page are
* stored at decreasing offsets (while the items are stored at increasing
@@ -1182,11 +1182,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* Determine whether or not the tuple is considered dead when compared
* with the provided GlobalVisState. On-access pruning does not provide
* VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
- * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
- * if the GlobalVisState has been updated since the beginning of vacuuming
+ * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+ * the GlobalVisState has been updated since the beginning of vacuuming
* the relation.
*/
- if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
return HEAPTUPLE_DEAD;
return res;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 8f8a1ad7796..496cca69410 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
*/
if (dt->tupstate == SPGIST_REDIRECT &&
(!TransactionIdIsValid(dt->xid) ||
- GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+ GlobalVisXidVisibleToAll(vistest, dt->xid)))
{
dt->tupstate = SPGIST_PLACEHOLDER;
Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..f67f01c17c2 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
* See comment for GlobalVisState for details.
*/
bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
- FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
{
/*
* If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4223,7 +4222,7 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
* relfrozenxid).
*/
bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
{
FullTransactionId fxid;
@@ -4237,7 +4236,7 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
*/
fxid = FullXidRelativeTo(state->definitely_needed, xid);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableXid(state, xid);
+ return GlobalVisXidVisibleToAll(state, xid);
}
/*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
*/
typedef struct GlobalVisState GlobalVisState;
extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
--
2.43.0
v11-0014-Use-GlobalVisState-to-determine-page-level-visib.patch
From 1e836a4bca61d6ab748a9d1c43dfcef6e0b06f81 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v11 14/20] Use GlobalVisState to determine page level
visibility
During pruning and during vacuum's third phase, we try to determine if
the whole page can be set all-visible in the visibility map. Instead of
using OldestXmin to determine if all the tuples on a page are visible to
everyone, use the GlobalVisState. This allows us to start setting the VM
during on-access pruning in a future commit.
It is possible for the GlobalVisState to change during the course of a
vacuum. In all but extraordinary cases, it moves forward, meaning more
pages could potentially be set in the VM.
Because comparing a transaction ID to the GlobalVisState requires more
operations than comparing it to another single transaction ID, we now
wait until after examining all the tuples on the page and, if we have
maintained the visibility_cutoff_xid, compare it to the
GlobalVisState just once per page. This works because if the page is
all-visible and has live, committed tuples on it, the
visibility_cutoff_xid will contain the newest xmin on the page. If
everyone can see it, the page is truly all-visible.
Doing this may mean we examine more tuples' xmins than before, since we
previously set all_visible to false as soon as we encountered a live
tuple newer than OldestXmin. However, these extra comparisons did not
show up as significant in profiling.
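For readers skimming the diff below, here is a minimal standalone sketch of
the once-per-page check described above. It is not code from the patch: the
TransactionId comparison ignores XID wraparound, is_visible_to_all() stands in
for GlobalVisState/GlobalVisXidVisibleToAll(), and it assumes every tuple on
the page is live and committed. The point is only the shape of the logic:
track the newest xmin while scanning, then do a single visible-to-all check
after the scan.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t TransactionId;

    /*
     * Stand-in for GlobalVisXidVisibleToAll(): xids below the horizon are
     * treated as visible to everyone (real code handles wraparound).
     */
    static bool
    is_visible_to_all(TransactionId horizon, TransactionId xid)
    {
        return xid < horizon;
    }

    int
    main(void)
    {
        TransactionId horizon = 1000;            /* assumed visibility horizon */
        TransactionId xmins[] = {120, 340, 998}; /* xmins of live tuples on the page */
        TransactionId visibility_cutoff_xid = 0;
        bool          all_visible = true;
        int           i;

        /* Scan: only maintain the newest xmin; no per-tuple horizon check. */
        for (i = 0; i < 3; i++)
        {
            if (xmins[i] > visibility_cutoff_xid)
                visibility_cutoff_xid = xmins[i];
        }

        /* One horizon check per page, after examining all the tuples. */
        if (all_visible && !is_visible_to_all(horizon, visibility_cutoff_xid))
            all_visible = false;

        printf("all_visible=%d cutoff=%u\n", all_visible, visibility_cutoff_xid);
        return 0;
    }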
---
src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++
src/backend/access/heap/pruneheap.c | 48 +++++++++------------
src/backend/access/heap/vacuumlazy.c | 19 ++++----
src/include/access/heapam.h | 4 +-
4 files changed, 60 insertions(+), 39 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+ Buffer buffer)
+{
+ TransactionId dead_after = InvalidTransactionId;
+ HTSV_Result res;
+
+ res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+ if (res == HEAPTUPLE_RECENTLY_DEAD)
+ {
+ Assert(TransactionIdIsValid(dead_after));
+
+ if (GlobalVisXidVisibleToAll(vistest, dead_after))
+ res = HEAPTUPLE_DEAD;
+ }
+ else
+ Assert(!TransactionIdIsValid(dead_after));
+
+ return res;
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 0211effeec7..c6935e45cec 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -141,10 +141,9 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon, when setting the VM or when
+ * freezing all the live tuples on the page.
*
* NOTE: all_visible and all_frozen don't include LP_DEAD items until
* directly before updating the VM. We ignore LP_DEAD items when deciding
@@ -559,14 +558,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon.
- * This is most likely to happen when updating the VM and/or freezing all
- * live tuples on the page. It is updated before returning to the caller
- * because vacuum does assert-build only validation on the page using this
- * field.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState.
+ *
+ * If we encounter an uncommitted tuple, this field is unmaintained. If
+ * the page is being set all-visible or when freezing all live tuples on
+ * the page, it is used to calculate the snapshot conflict horizon.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -762,6 +759,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them is not visible to everyone, the page cannot be
+ * all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update those
@@ -1108,12 +1115,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(cutoffs);
-
Assert(prstate.lpdead_items == 0);
if (!heap_page_is_all_visible(relation, buffer,
- cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc))
Assert(false);
@@ -1638,19 +1643,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' when freezing is requested. We
- * could use GlobalVisTestIsRemovableXid instead, if a
- * non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2dcca071a45..4ad05ba4db6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,7 +464,7 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -2717,7 +2717,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
InvalidOffsetNumber);
if (heap_page_would_be_all_visible(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3462,13 +3462,13 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
- return heap_page_would_be_all_visible(rel, buf, OldestXmin,
+ return heap_page_would_be_all_visible(rel, buf, vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -3487,7 +3487,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
*
* *all_frozen is an output parameter indicating to the caller if every tuple
* on the page is frozen.
@@ -3508,7 +3508,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
static bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -3580,8 +3580,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
tuple.t_len = ItemIdGetLength(itemid);
tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
- buf))
+ switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest,
+ buf))
{
case HEAPTUPLE_LIVE:
{
@@ -3600,8 +3600,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin,
- OldestXmin))
+ if (!GlobalVisXidVisibleToAll(vistest, xmin))
{
all_visible = false;
*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0b9bb1c9b13..4278f351bdf 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -342,7 +342,7 @@ extern void heap_inplace_unlock(Relation relation,
HeapTuple oldtup, Buffer buffer);
extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum);
@@ -415,6 +415,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+ GlobalVisState *vistest, Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
v11-0011-Eliminate-xl_heap_visible-from-vacuum-phase-I-pr.patch
From 029312b2d8a64782179df1bced1545bec1675211 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:41:00 -0400
Subject: [PATCH v11 11/20] Eliminate xl_heap_visible from vacuum phase I
prune/freeze
Instead of emitting a separate WAL record for every block rendered
all-visible/frozen by vacuum's phase I, include the changes to the VM in
the xl_heap_prune record already emitted.
This is only enabled for vacuum's prune/freeze work, not for on-access
pruning.
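As a rough illustration of the flag handling added below, here is a
standalone sketch (not the patch code itself; the vm_has_* booleans stand in
for the VM lookups, and the constants mirror the values in
visibilitymapdefs.h): the new vmflags value is computed before the critical
section and stays 0 when the page is not all-visible or the VM already has
every bit we would set.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define VISIBILITYMAP_ALL_VISIBLE 0x01
    #define VISIBILITYMAP_ALL_FROZEN  0x02

    static uint8_t
    choose_vmflags(bool all_visible, bool all_frozen,
                   bool vm_has_all_visible, bool vm_has_all_frozen)
    {
        uint8_t vmflags = 0;

        /* Only set bits if the page is all-visible and the VM is stale. */
        if (all_visible &&
            (!vm_has_all_visible || (all_frozen && !vm_has_all_frozen)))
        {
            vmflags |= VISIBILITYMAP_ALL_VISIBLE;
            if (all_frozen)
                vmflags |= VISIBILITYMAP_ALL_FROZEN;
        }
        return vmflags;
    }

    int
    main(void)
    {
        /* Page found all-visible and all-frozen; VM only had all-visible set. */
        printf("vmflags = 0x%02x\n", choose_vmflags(true, true, true, false));
        return 0;
    }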
---
src/backend/access/heap/pruneheap.c | 459 ++++++++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 30 --
src/include/access/heapam.h | 15 +-
3 files changed, 282 insertions(+), 222 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index df3e6439176..dce9025d268 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,13 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool attempt_freeze;
+
+ /*
+ * Whether or not to consider updating the VM. There is some bookkeeping
+ * that must be maintained if we would like to update the VM.
+ */
+ bool consider_update_vm;
+
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -108,8 +115,9 @@ typedef struct
*
* These fields are not used by pruning itself for the most part, but are
* used to collect information about what was pruned and what state the
- * page is in after pruning, for the benefit of the caller. They are
- * copied to the caller's PruneFreezeResult at the end.
+ * page is in after pruning to use when updating the visibility map and
+ * for the benefit of the caller. They are copied to the caller's
+ * PruneFreezeResult at the end.
* -------------------------------------------------------
*/
@@ -138,11 +146,10 @@ typedef struct
* bits. It is only valid if we froze some tuples, and all_frozen is
* true.
*
- * NOTE: all_visible and all_frozen don't include LP_DEAD items. That's
- * convenient for heap_page_prune_and_freeze(), to use them to decide
- * whether to freeze the page or not. The all_visible and all_frozen
- * values returned to the caller are adjusted to include LP_DEAD items at
- * the end.
+ * NOTE: all_visible and all_frozen don't include LP_DEAD items until
+ * directly before updating the VM. We ignore LP_DEAD items when deciding
+ * whether or not to opportunistically freeze and when determining the
+ * snapshot conflict horizon required when freezing tuples.
*/
bool all_visible;
bool all_frozen;
@@ -377,12 +384,15 @@ identify_and_fix_vm_corruption(Relation relation,
* If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
* required in order to advance relfrozenxid / relminmxid, or if it's
* considered advantageous for overall system performance to do so now. The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing. When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set. They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
+ * 'cutoffs', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments are required
+ * when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set and the visibility status of the page
+ * has changed, we will update the VM at the same time as pruning and freezing
+ * the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping. Note that new_vmbits and old_vmbits
+ * will be 0 if HEAP_PAGE_PRUNE_UPDATE_VM is not set.
*
* blk_known_av is the visibility status of the heap block as of the last call
* to find_next_unskippable_block(). vmbuffer is the buffer that may already
@@ -398,6 +408,8 @@ identify_and_fix_vm_corruption(Relation relation,
* FREEZE indicates that we will also freeze tuples, and will return
* 'all_visible', 'all_frozen' flags to the caller.
*
+ * UPDATE_VM indicates that we will set the page's status in the VM.
+ *
* cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
* of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
* cutoffs->OldestXmin is also used to determine if dead tuples are
@@ -442,18 +454,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
HeapTupleData tup;
bool do_freeze;
bool do_prune;
- bool do_hint;
+ bool do_hint_full_or_prunable;
+ bool do_set_vm;
uint8 vmflags = 0;
uint8 old_vmbits = 0;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ bool all_frozen_except_lp_dead = false;
+ bool set_pd_all_visible = false;
/* Copy parameters to prstate */
prstate.vistest = vistest;
prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.consider_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate.cutoffs = cutoffs;
+ Assert(!prstate.consider_update_vm || vmbuffer);
+
/*
* Our strategy is to scan the page and make lists of items to change,
* then apply the changes within a critical section. This keeps as much
@@ -498,50 +516,57 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.deadoffsets = presult->deadoffsets;
/*
- * Caller may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Keep track of whether or not the page will be all-visible and
+ * all-frozen for use in opportunistic freezing and to update the VM if
+ * the caller requests it.
+ *
+ * Currently, only VACUUM attempts freezing and setting the VM bits. But
+ * other callers could do either one. The visibility bookkeeping is
+ * required for opportunistic freezing (in addition to setting the VM
+ * bits) because we only consider opportunistically freezing tuples if the
+ * whole page would become all-frozen or if the whole page will be frozen
+ * except for dead tuples that will be removed by vacuum.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * If only updating the VM, we must initialize all_frozen to false, as
+ * heap_prepare_freeze_tuple() will not be called for each tuple on the
+ * page and we will not end up correctly setting it to false later.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not clear
- * all_visible when we see LP_DEAD items. We fix that at the end of the
- * function, when we return the value to the caller, so that the caller
- * doesn't set the VM bit incorrectly.
+ * Dead tuples which will be removed by the end of vacuuming should not
+ * preclude us from opportunistically freezing, so we do not clear
+ * all_visible when we see LP_DEAD items. We fix that after determining
+ * whether or not to freeze but before deciding whether or not to update
+ * the VM so that we don't set the VM bit incorrectly.
+ *
+ * If not freezing or updating the VM, we otherwise avoid the extra
+ * bookkeeping. Initializing all_visible to false allows skipping the work
+ * to update them in heap_prune_record_unchanged_lp_normal().
*/
if (prstate.attempt_freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
}
+ else if (prstate.consider_update_vm)
+ {
+ prstate.all_visible = true;
+ prstate.all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate.all_visible = false;
prstate.all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is unmaintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon.
+ * This is most likely to happen when updating the VM and/or freezing all
+ * live tuples on the page. It is updated before returning to the caller
+ * because vacuum does assert-build only validation on the page using this
+ * field.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -739,10 +764,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* Even if we don't prune anything, if we found a new value for the
- * pd_prune_xid field or the page was marked full, we will update the hint
- * bit.
+ * pd_prune_xid field or the page was marked full, we will update those
+ * hint bits.
*/
- do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ do_hint_full_or_prunable =
+ ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
PageIsFull(page);
/*
@@ -790,7 +816,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (XLogCheckBufferNeedsBackup(buffer))
do_freeze = true;
}
- else if (do_hint)
+ else if (do_hint_full_or_prunable)
{
if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
do_freeze = true;
@@ -829,11 +855,88 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
}
+ /*
+ * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+ * make the choice of whether or not to freeze the page unaffected by the
+ * short-term presence of LP_DEAD items. These LP_DEAD items were
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that freezing has been finalized, unset all_visible if there are
+ * any LP_DEAD items on the page. It needs to reflect the present state of
+ * the page when using it to determine whether or not to update the VM.
+ *
+ * Keep track of whether or not the page was all-frozen except LP_DEAD
+ * items for the purposes of calculating the snapshot conflict horizon,
+ * though.
+ */
+ all_frozen_except_lp_dead = prstate.all_frozen;
+ if (prstate.lpdead_items > 0)
+ {
+ prstate.all_visible = false;
+ prstate.all_frozen = false;
+ }
+
Assert(!prstate.all_frozen || prstate.all_visible);
+
+ /*
+ * Handle setting visibility map bit based on information from the VM (as
+ * of last heap_vac_scan_next_block() call), and from all_visible and
+ * all_frozen variables.
+ */
+ if (prstate.consider_update_vm)
+ {
+ /*
+ * Clear any VM corruption. This does not need to be in a critical
+ * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
+ * we may mark the heap page buffer dirty here and could end up doing
+ * so again later. This is not a correctness issue, and it only happens
+ * when VM corruption was found, so we don't have to worry about the
+ * extra performance overhead.
+ */
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av, prstate.lpdead_items, vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /* Determine if we actually need to set the VM and which bits to set. */
+ else if (prstate.all_visible &&
+ (!blk_known_av ||
+ (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate.all_frozen)
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+ }
+
+ do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
+
+ /* Lock vmbuffer before entering a critical section */
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
+ /*
+ * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+ * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+ * set, we strongly prefer to keep them in sync.
+ *
+ * Prior to Postgres 19, it was possible for the page-level bit to be set
+ * and the VM bit to be clear. This could happen if we crashed after
+ * setting PD_ALL_VISIBLE but before setting bits in the VM.
+ */
+ set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+
+ /* Save these for the caller in case we later zero out vmflags */
+ presult->new_vmbits = vmflags;
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
- if (do_hint)
+ if (do_hint_full_or_prunable)
{
/*
* Update the page's pd_prune_xid field to either zero, or the lowest
@@ -849,15 +952,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
PageClearFull(page);
/*
- * If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * If we are _only_ setting the prune_xid or PD_PAGE_FULL hint, then
+ * this is a non-WAL-logged hint. If we are going to freeze or prune
+ * tuples on the page or set PD_ALL_VISIBLE, we will mark the buffer
+ * dirty and emit WAL below.
*/
- if (!do_freeze && !do_prune)
+ if (!do_prune && !do_freeze && !set_pd_all_visible)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -871,12 +975,47 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
- MarkBufferDirty(buffer);
+ if (set_pd_all_visible)
+ PageSetAllVisible(page);
/*
- * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
+ * We only set PD_ALL_VISIBLE if we also set the VM, and since setting
+ * the VM requires emitting WAL, MarkBufferDirtyHint() isn't
+ * appropriate here.
*/
- if (RelationNeedsWAL(relation))
+ if (do_prune || do_freeze || set_pd_all_visible)
+ MarkBufferDirty(buffer);
+
+ if (do_set_vm)
+ {
+ Assert(PageIsAllVisible(page));
+ old_vmbits = visibilitymap_set_vmbits(relation, blockno,
+ vmbuffer, vmflags);
+
+ if (old_vmbits == vmflags)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+ do_set_vm = false;
+ /* 0 out vmflags so we don't emit WAL to update the VM */
+ vmflags = 0;
+ }
+ }
+
+ /*
+ * It should never be the case that PD_ALL_VISIBLE is not set and the
+ * VM is set. Or, if it were, we should have caught it earlier when
+ * finding and fixing VM corruption. So, if we found out the VM was
+ * already set above, we should have found PD_ALL_VISIBLE set earlier.
+ */
+ Assert(!set_pd_all_visible || do_set_vm);
+
+ /*
+ * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did. If
+ * we were only updating the VM and it turns out it was already set,
+ * we will have unset do_set_vm earlier. As such, check it again
+ * before emitting the record.
+ */
+ if (RelationNeedsWAL(relation) && (do_set_vm || do_prune || do_freeze))
{
/*
* The snapshotConflictHorizon for the whole record should be the
@@ -888,35 +1027,56 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* on the standby with xids older than the youngest tuple this
* record will freeze will conflict.
*/
- TransactionId frz_conflict_horizon = InvalidTransactionId;
- TransactionId conflict_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
+
+ /*
+ * If we are updating the VM, the conflict horizon is almost
+ * always the visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization,
+ * we can use the visibility_cutoff_xid as the conflict horizon if
+ * the page will be all-frozen. This is true even if there are
+ * LP_DEAD line pointers because we ignored those when maintaining
+ * the visibility_cutoff_xid.
+ */
+ if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
+ conflict_xid = prstate.visibility_cutoff_xid;
/*
- * We can use the visibility_cutoff_xid as our cutoff for
- * conflicts when the whole page is eligible to become all-frozen
- * in the VM once we're done with it. Otherwise we generate a
- * conservative cutoff by stepping back from OldestXmin.
+ * Otherwise, if we are freezing but the page would not be
+ * all-frozen, we have to use the more pessimistic horizon of
+ * OldestXmin, which may be newer than the newest tuple we froze.
+ * We currently don't track the newest tuple we froze.
*/
- if (do_freeze)
+ else if (do_freeze)
{
- if (prstate.all_visible && prstate.all_frozen)
- frz_conflict_horizon = prstate.visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = prstate.cutoffs->OldestXmin;
- TransactionIdRetreat(frz_conflict_horizon);
- }
+ conflict_xid = prstate.cutoffs->OldestXmin;
+ TransactionIdRetreat(conflict_xid);
}
- if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
- conflict_xid = frz_conflict_horizon;
- else
+ /*
+ * If we are removing tuples with a younger xmax than the
+ * conflict_xid calculated so far, we must use that as our horizon.
+ */
+ if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
conflict_xid = prstate.latest_xid_removed;
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning
+ * or freezing any tuples and are setting an already all-visible
+ * page all-frozen in the VM. In this case, all of the tuples on
+ * the page must already be visible to all MVCC snapshots on the
+ * standby.
+ */
+ if (!do_prune && !do_freeze && do_set_vm &&
+ blk_known_av && (vmflags & VISIBILITYMAP_ALL_FROZEN))
+ conflict_xid = InvalidTransactionId;
+
log_heap_prune_and_freeze(relation, buffer,
false,
- InvalidBuffer, 0, false,
+ vmbuffer,
+ vmflags,
+ set_pd_all_visible,
conflict_xid,
true, reason,
prstate.frozen, prstate.nfrozen,
@@ -928,124 +1088,55 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
END_CRIT_SECTION();
- /* Copy information back for caller */
- presult->ndeleted = prstate.ndeleted;
- presult->nnewlpdead = prstate.ndead;
- presult->nfrozen = prstate.nfrozen;
- presult->live_tuples = prstate.live_tuples;
- presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected for updating the visibility map.
- */
- if (prstate.all_visible && prstate.lpdead_items == 0)
- {
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
- }
- else
- {
- presult->all_visible = false;
- presult->all_frozen = false;
- }
-
- presult->hastup = prstate.hastup;
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
- /*
- * If updating the visibility map, the conflict horizon for that record
- * must be the newest xmin on the page. However, if the page is
- * completely frozen, there can be no conflict and the vm_conflict_horizon
- * should remain InvalidTransactionId. This includes the case that we
- * just froze all the tuples; the prune-freeze record included the
- * conflict XID already so the VM update record doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * VACUUM will call heap_page_is_all_visible() during the second pass over
+ * the heap to determine all_visible and all_frozen for the page -- this
+ * is a specialized version of the logic from this function. Now that
+ * we've finished pruning and freezing, make sure that we're in total
+ * agreement with heap_page_is_all_visible() using an assertion. We will
+ * have already set the page in the VM, so this assertion will only let
+ * you know that you've already done something wrong.
*/
- if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
{
- if (identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av,
- prstate.lpdead_items, vmbuffer))
- {
- /* If we fix corruption, don't update the VM further */
- }
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
- /*
- * If the page isn't yet marked all-visible in the VM or it is and
- * needs to me marked all-frozen, update the VM. Note that all_frozen
- * is only valid if all_visible is true, so we must check both
- * all_visible and all_frozen.
- */
- else if (presult->all_visible &&
- (!blk_known_av ||
- (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
- {
- Assert(prstate.lpdead_items == 0);
- vmflags = VISIBILITYMAP_ALL_VISIBLE;
+ Assert(cutoffs);
- /*
- * If the page is all-frozen, we can pass InvalidTransactionId as
- * our cutoff_xid, since a snapshotConflictHorizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- if (presult->all_frozen)
- {
- Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
- vmflags |= VISIBILITYMAP_ALL_FROZEN;
- }
+ Assert(prstate.lpdead_items == 0);
- /*
- * It's possible for the VM bit to be clear and the page-level bit
- * to be set if checksums are not enabled.
- *
- * And even if we are just planning to update the frozen bit in
- * the VM, we shouldn't rely on all_visible_according_to_vm as a
- * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
- * might have become stale.
- *
- * If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be
- * logged.
- */
- if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buffer);
- }
+ if (!heap_page_is_all_visible(relation, buffer,
+ cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc))
+ Assert(false);
- old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
- vmbuffer, presult->vm_conflict_horizon,
- vmflags);
- }
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == prstate.visibility_cutoff_xid);
}
+#endif
+ /* Copy information back for caller */
+ presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+ presult->old_vmbits = old_vmbits;
+ /* new_vmbits was set above */
+ presult->hastup = prstate.hastup;
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
- presult->old_vmbits = old_vmbits;
- presult->new_vmbits = vmflags;
-
if (prstate.attempt_freeze)
{
if (presult->nfrozen > 0)
@@ -1627,7 +1718,12 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
break;
}
- /* Consider freezing any normal tuples which will not be removed */
+ /*
+ * Consider freezing any normal tuples which will not be removed.
+ * Regardless of whether or not we want to freeze the tuples, if we want
+ * to update the VM, we have to call heap_prepare_freeze_tuple() on every
+ * tuple to know whether or not the page will be totally frozen.
+ */
if (prstate->attempt_freeze)
{
bool totally_frozen;
@@ -2190,7 +2286,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* - Reaping: During vacuum phase III, items that are already LP_DEAD are
* marked as unused.
*
- * - VM updates: After vacuum phase III, the heap page may be marked
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
* all-visible and all-frozen.
*
* These changes all happen together, so we use a single WAL record for them
@@ -2244,6 +2340,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+ Assert(do_prune || nfrozen > 0 || vmflags & VISIBILITYMAP_VALID_BITS);
Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
xlrec.flags = vmflags;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9492423141e..75205179b83 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2015,34 +2015,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- /* Note that all_frozen value does not matter when !all_visible */
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- if (!heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum))
- Assert(false);
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -2076,8 +2048,6 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
-
/*
* For the purposes of logging, count whether or not the page was newly
* set all-visible and, potentially, all-frozen.
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b85648456e9..0b9bb1c9b13 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,19 +235,12 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate the status of the page as reflected
- * in the visibility map after pruning, freezing, and setting any pages
- * all-visible in the visibility map.
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
*
- * vm_conflict_horizon is the newest xmin of live tuples on the page
- * (older than OldestXmin). It will only be valid if we did not set the
- * page all-frozen in the VM.
- *
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
*/
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
uint8 old_vmbits;
uint8 new_vmbits;
--
2.43.0
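To make the old_vmbits/new_vmbits bookkeeping above concrete, here is a small
standalone sketch (plain C, not PostgreSQL code) of how a caller such as
lazy_scan_prune() can tell whether a page was newly set all-visible or
all-frozen. The flag values match visibilitymapdefs.h as quoted later in this
series; everything else is illustrative.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/* True if 'flag' is set in the new VM bits but was not set before. */
static bool
newly_set(uint8_t old_vmbits, uint8_t new_vmbits, uint8_t flag)
{
    return (new_vmbits & flag) != 0 && (old_vmbits & flag) == 0;
}

int
main(void)
{
    uint8_t old_vmbits = 0;                          /* page had no VM bits set */
    uint8_t new_vmbits = VISIBILITYMAP_ALL_VISIBLE |
                         VISIBILITYMAP_ALL_FROZEN;   /* pruning set both */

    printf("newly all-visible: %d\n",
           newly_set(old_vmbits, new_vmbits, VISIBILITYMAP_ALL_VISIBLE));
    printf("newly all-frozen:  %d\n",
           newly_set(old_vmbits, new_vmbits, VISIBILITYMAP_ALL_FROZEN));
    return 0;
}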
Attachment: v11-0012-Remove-xl_heap_visible-entirely.patch (text/x-patch)
From 061b3cfcc586895787a1d682156f73dc6a9705a4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:30:42 -0400
Subject: [PATCH v11 12/20] Remove xl_heap_visible entirely
There are now no users of this, so eliminate it entirely.
---
src/backend/access/common/bufmask.c | 3 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 152 ++---------------------
src/backend/access/heap/pruneheap.c | 4 +-
src/backend/access/heap/vacuumlazy.c | 10 +-
src/backend/access/heap/visibilitymap.c | 109 +---------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +-
src/include/access/heapam_xlog.h | 20 ---
src/include/access/visibilitymap.h | 11 +-
src/include/access/visibilitymapdefs.h | 9 --
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 32 insertions(+), 364 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..1fff01383b3 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,7 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index cff531a4801..6f161a6eab2 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -40,6 +40,7 @@
#include "access/valid.h"
#include "access/visibilitymap.h"
#include "access/xloginsert.h"
+#include "access/xlogutils.h"
#include "catalog/pg_database.h"
#include "catalog/pg_database_d.h"
#include "commands/vacuum.h"
@@ -2526,11 +2527,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
else if (all_frozen_set)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(relation,
- BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set(relation,
+ BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
}
/*
@@ -8801,49 +8802,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
/*
* Perform XLogInsert for a heap-update operation. Caller must already
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 11c11929ed9..ff3ad8b4cd2 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -53,6 +53,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
(xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
vmflags = xlrec.flags & VISIBILITYMAP_VALID_BITS;
+ /* Must never set all_frozen bit without also setting all_visible bit */
+ Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
/*
* After xl_heap_prune is the optional snapshot conflict horizon.
@@ -250,7 +252,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- old_vmbits = visibilitymap_set_vmbits(reln, blkno, vmbuffer, vmflags);
+ old_vmbits = visibilitymap_set(reln, blkno, vmbuffer, vmflags);
/* Only set VM page LSN if we modified the page */
if (old_vmbits != vmflags)
@@ -269,142 +271,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -785,15 +651,14 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(reln, blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
/*
* It is not possible that the VM was already set for this heap page,
* so the vmbuffer must have been modified and marked dirty.
*/
+ visibilitymap_set(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(BufferGetPage(vmbuffer), lsn);
FreeFakeRelcacheEntry(reln);
@@ -1374,9 +1239,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index dce9025d268..6637966e927 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -989,8 +989,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_set_vm)
{
Assert(PageIsAllVisible(page));
- old_vmbits = visibilitymap_set_vmbits(relation, blockno,
- vmbuffer, vmflags);
+ old_vmbits = visibilitymap_set(relation, blockno,
+ vmbuffer, vmflags);
if (old_vmbits == vmflags)
{
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 75205179b83..2dcca071a45 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1888,8 +1888,8 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
PageSetAllVisible(page);
MarkBufferDirty(buf);
- visibilitymap_set_vmbits(vacrel->rel, blkno,
- vmbuffer, new_vmbits);
+ visibilitymap_set(vacrel->rel, blkno,
+ vmbuffer, new_vmbits);
if (RelationNeedsWAL(vacrel->rel))
{
@@ -2757,9 +2757,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
Assert(!PageIsAllVisible(page));
set_pd_all_vis = true;
PageSetAllVisible(page);
- visibilitymap_set_vmbits(vacrel->rel,
- blkno,
- vmbuffer, vmflags);
+ visibilitymap_set(vacrel->rel,
+ blkno,
+ vmbuffer, vmflags);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index aa48a436108..f7bad68ffc5 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set bit(s) in a previously pinned page and log
- * visibilitymap_set_vmbits - set bit(s) in a pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,108 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
- flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
- Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- Assert(!BufferIsValid(heapBuf) || BufferIsExclusiveLocked(heapBuf));
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (XLogRecPtrIsInvalid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
-}
-
/*
* Set flags in the VM block contained in the passed in vmBuf.
*
@@ -343,8 +240,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* is pinned and exclusive locked.
*/
uint8
-visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags)
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 439f33b8061..3342af02c75 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -344,13 +344,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -455,9 +448,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d8508593e7c..3672f372aa8 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -446,20 +445,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -503,11 +488,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index fc7056a91ea..c5b1e1f7adb 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "utils/relcache.h"
@@ -31,14 +30,8 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags);
+extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a13e8162890..b4c880c083f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4273,7 +4273,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
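Since this patch leaves visibilitymap_set() as the only bit-setting entry
point, a toy model of the two-bits-per-heap-block layout may help. This is a
standalone sketch, not the real visibilitymap.c: it models one VM page as a
byte array, uses the same shift arithmetic as the removed function above, and
returns the old bits so the caller can skip the LSN bump when nothing changed
(mirroring the "Only set VM page LSN if we modified the page" hunk in
heap_xlog_prune_freeze). The constants are assumptions for the sketch.

#include <stdint.h>
#include <stdio.h>

#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02
#define VISIBILITYMAP_VALID_BITS  0x03

#define BITS_PER_HEAPBLOCK   2
#define HEAPBLOCKS_PER_BYTE  4      /* 8 bits / 2 bits per heap block */

static uint8_t vm_page[8192];       /* stand-in for one VM block's contents */

/* Set flags for one heap block; return the bits that were set before. */
static uint8_t
vm_set_bits(uint32_t heap_blk, uint8_t flags)
{
    uint32_t map_byte = heap_blk / HEAPBLOCKS_PER_BYTE;
    uint32_t map_off  = (heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;
    uint8_t  old = (vm_page[map_byte] >> map_off) & VISIBILITYMAP_VALID_BITS;

    vm_page[map_byte] |= (uint8_t) (flags << map_off);
    return old;
}

int
main(void)
{
    uint8_t old = vm_set_bits(5, VISIBILITYMAP_ALL_VISIBLE |
                                 VISIBILITYMAP_ALL_FROZEN);

    /* A caller can skip bumping the page LSN when nothing changed. */
    printf("old bits 0x%02X, changed: %d\n", old,
           old != (VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN));
    return 0;
}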
Attachment: v11-0015-Inline-TransactionIdFollows-Precedes.patch (text/x-patch)
From 67cd0164f81fc9612875edd024b917ed79707b83 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 09:57:13 -0400
Subject: [PATCH v11 15/20] Inline TransactionIdFollows/Precedes()
Calling these from on-access pruning code had noticeable overhead in a
profile. There does not seem to be a reason not to inline them.
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
---
src/backend/access/transam/transam.c | 64 -------------------------
src/include/access/transam.h | 70 ++++++++++++++++++++++++++--
2 files changed, 66 insertions(+), 68 deletions(-)
diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index 9a39451a29a..553d6756cb3 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -273,70 +273,6 @@ TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
}
-/*
- * TransactionIdPrecedes --- is id1 logically < id2?
- */
-bool
-TransactionIdPrecedes(TransactionId id1, TransactionId id2)
-{
- /*
- * If either ID is a permanent XID then we can just do unsigned
- * comparison. If both are normal, do a modulo-2^32 comparison.
- */
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 < id2);
-
- diff = (int32) (id1 - id2);
- return (diff < 0);
-}
-
-/*
- * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
- */
-bool
-TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 <= id2);
-
- diff = (int32) (id1 - id2);
- return (diff <= 0);
-}
-
-/*
- * TransactionIdFollows --- is id1 logically > id2?
- */
-bool
-TransactionIdFollows(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 > id2);
-
- diff = (int32) (id1 - id2);
- return (diff > 0);
-}
-
-/*
- * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
- */
-bool
-TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 >= id2);
-
- diff = (int32) (id1 - id2);
- return (diff >= 0);
-}
-
/*
* TransactionIdLatest --- get latest XID among a main xact and its children
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 7d82cd2eb56..c9e20418275 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -255,6 +255,72 @@ typedef struct TransamVariablesData
} TransamVariablesData;
+
+/*
+ * TransactionIdPrecedes --- is id1 logically < id2?
+ */
+static inline bool
+TransactionIdPrecedes(TransactionId id1, TransactionId id2)
+{
+ /*
+ * If either ID is a permanent XID then we can just do unsigned
+ * comparison. If both are normal, do a modulo-2^32 comparison.
+ */
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 < id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff < 0);
+}
+
+/*
+ * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
+ */
+static inline bool
+TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 <= id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff <= 0);
+}
+
+/*
+ * TransactionIdFollows --- is id1 logically > id2?
+ */
+static inline bool
+TransactionIdFollows(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 > id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff > 0);
+}
+
+/*
+ * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
+ */
+static inline bool
+TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 >= id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff >= 0);
+}
+
+
/* ----------------
* extern declarations
* ----------------
@@ -274,10 +340,6 @@ extern bool TransactionIdDidAbort(TransactionId transactionId);
extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
-extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
-extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2);
extern TransactionId TransactionIdLatest(TransactionId mainxid,
int nxids, const TransactionId *xids);
extern XLogRecPtr TransactionIdGetCommitLSN(TransactionId xid);
--
2.43.0
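As a sanity check on the comparison logic inlined by 0015, here is a
standalone program (not PostgreSQL code) exercising the modulo-2^32
comparison, including the wraparound case that makes the signed-difference
trick necessary. FirstNormalTransactionId = 3 follows the usual convention;
the xid values themselves are made up.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(x) ((x) >= FirstNormalTransactionId)

/* Same shape as the inlined TransactionIdPrecedes() above. */
static inline bool
TransactionIdPrecedes(TransactionId id1, TransactionId id2)
{
    int32_t diff;

    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return (id1 < id2);

    diff = (int32_t) (id1 - id2);
    return (diff < 0);
}

int
main(void)
{
    /* Plain case: 100 logically precedes 200. */
    printf("%d\n", TransactionIdPrecedes(100, 200));         /* prints 1 */

    /*
     * Wraparound case: an xid near UINT32_MAX logically precedes a small
     * post-wraparound xid, because the signed 32-bit difference is negative.
     */
    printf("%d\n", TransactionIdPrecedes(4294967000u, 100));  /* prints 1 */
    return 0;
}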
Attachment: v11-0016-Unset-all-visible-sooner-if-not-freezing.patch (text/x-patch)
From 6a4f009579e3067371d50bb85080243a26fd333f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:35:13 -0400
Subject: [PATCH v11 16/20] Unset all-visible sooner if not freezing
In prune/freeze code, we delay unsetting all-visible/all-frozen in the
presence of dead items to allow opportunistically freezing tuples if the
whole page would be frozen except for those dead items -- which are
removed later in vacuum's third phase.
Future commits will allow on-access pruning to set the VM, which means
all-visible will be initialized to true instead of false and we will do
extra bookkeeping in heap_prune_record_unchanged_lp_normal() to keep track of
whether or not the page is all-visible.
Because on-access pruning will not freeze tuples, it makes sense to
unset all-visible as soon as we encounter an LP_DEAD item: we already
know the page is not all-visible, and we won't be able to remove those
dead items, so there is no point in continuing the bookkeeping.
---
src/backend/access/heap/pruneheap.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c6935e45cec..ba8ddc7fa35 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1503,8 +1503,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible until later during pruning.
- * Removable dead tuples shouldn't preclude freezing the page.
+ * Removable dead tuples shouldn't preclude freezing the page. If we won't
+ * attempt freezing, just unset all-visible now, though.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1762,8 +1765,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible until later, at the end of
* heap_page_prune_and_freeze(). This will allow us to attempt to freeze
* the page after pruning. As long as we unset it before updating the
- * visibility map, this will be correct.
+ * visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all-visible now.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
--
2.43.0
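The behavioral change in 0016 is small enough to model standalone. The sketch
below (toy types, not the real PruneState) shows the new short-circuit: when
the caller will not attempt freezing, the first LP_DEAD item immediately
clears all_visible/all_frozen instead of deferring that to the end of pruning.

#include <stdbool.h>
#include <stdio.h>

typedef struct
{
    bool attempt_freeze;    /* vacuum: true, on-access pruning: false */
    bool all_visible;
    bool all_frozen;
    int  lpdead_items;
} ToyPruneState;

static void
record_dead_item(ToyPruneState *prstate)
{
    /*
     * Vacuum keeps all_visible set for now so the page can still be
     * considered for opportunistic freezing; on-access pruning gives up
     * immediately.
     */
    if (!prstate->attempt_freeze)
        prstate->all_visible = prstate->all_frozen = false;
    prstate->lpdead_items++;
}

int
main(void)
{
    ToyPruneState onaccess = {.attempt_freeze = false,
                              .all_visible = true, .all_frozen = false};

    record_dead_item(&onaccess);
    printf("all_visible after dead item: %d\n", onaccess.all_visible); /* 0 */
    return 0;
}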
Attachment: v11-0017-Allow-on-access-pruning-to-set-pages-all-visible.patch (text/x-patch)
From 903038c95c425ffaf35925483ae3f3a4c010f5a9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v11 17/20] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum marked pages as all-visible or all-frozen.
Supporting this requires passing information from the executor down to
the scan descriptor about whether the query modifies the relation.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
---
src/backend/access/heap/heapam.c | 15 ++++-
src/backend/access/heap/heapam_handler.c | 15 ++++-
src/backend/access/heap/pruneheap.c | 67 ++++++++++++++-----
src/backend/access/index/indexam.c | 46 +++++++++++++
src/backend/access/table/tableam.c | 39 +++++++++--
src/backend/executor/execMain.c | 4 ++
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeBitmapHeapscan.c | 7 +-
src/backend/executor/nodeIndexscan.c | 18 +++--
src/backend/executor/nodeSeqscan.c | 24 +++++--
src/include/access/genam.h | 11 +++
src/include/access/heapam.h | 24 ++++++-
src/include/access/relscan.h | 6 ++
src/include/access/tableam.h | 30 ++++++++-
src/include/nodes/execnodes.h | 6 ++
.../t/035_standby_logical_decoding.pl | 3 +-
16 files changed, 277 insertions(+), 40 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 6f161a6eab2..f9e50d47aee 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -556,6 +556,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -570,7 +571,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1247,6 +1250,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1285,6 +1289,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1317,6 +1327,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..f05b9e4968d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
return &hscan->xs_base;
}
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ba8ddc7fa35..69d8e42bdc8 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -198,9 +198,13 @@ static bool identify_and_fix_vm_corruption(Relation relation,
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -264,6 +268,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
{
OffsetNumber dummy_off_loc;
PruneFreezeResult presult;
+ int options = 0;
+
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ options = HEAP_PAGE_PRUNE_UPDATE_VM;
+ }
/*
 * For now, pass mark_unused_now as false regardless of whether or
 * not the relation has indexes, since we cannot safely determine
 * that during on-access pruning with the current implementation.
* that during on-access pruning with the current implementation.
*/
heap_page_prune_and_freeze(relation, buffer, false,
- InvalidBuffer,
- vistest, 0,
- NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+ vmbuffer ? *vmbuffer : InvalidBuffer,
+ vistest, options,
+ NULL, &presult, PRUNE_ON_ACCESS,
+ &dummy_off_loc, NULL, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -519,12 +531,17 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* all-frozen for use in opportunistic freezing and to update the VM if
* the caller requests it.
*
- * Currently, only VACUUM attempts freezing and setting the VM bits. But
- * other callers could do either one. The visibility bookkeeping is
- * required for opportunistic freezing (in addition to setting the VM
- * bits) because we only consider opportunistically freezing tuples if the
- * whole page would become all-frozen or if the whole page will be frozen
- * except for dead tuples that will be removed by vacuum.
+ * Currently, only VACUUM attempts freezing. But other callers could. The
+ * visibility bookkeeping is required for opportunistic freezing (in
+ * addition to setting the VM bits) because we only consider
+ * opportunistically freezing tuples if the whole page would become
+ * all-frozen or if the whole page will be frozen except for dead tuples
+ * that will be removed by vacuum. But if consider_update_vm is false,
+ * we'll not set the VM even if the page is discovered to be all-visible.
+ *
+ * If only HEAP_PAGE_PRUNE_UPDATE_VM is passed and not
+ * HEAP_PAGE_PRUNE_FREEZE, prstate.all_frozen must be initialized to false
+ * because we will not call heap_prepare_freeze_tuple() on each tuple.
*
* If only updating the VM, we must initialize all_frozen to false, as
* heap_prepare_freeze_tuple() will not be called for each tuple on the
@@ -536,7 +553,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* whether or not to freeze but before deciding whether or not to update
* the VM so that we don't set the VM bit incorrectly.
*
- * If not freezing or updating the VM, we otherwise avoid the extra
+ * If not freezing and not updating the VM, we avoid the extra
* bookkeeping. Initializing all_visible to false allows skipping the work
* to update them in heap_prune_record_unchanged_lp_normal().
*/
@@ -885,12 +902,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.all_frozen = false;
}
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate.consider_update_vm &&
+ prstate.all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+ {
+ prstate.consider_update_vm = false;
+ prstate.all_visible = prstate.all_frozen = false;
+ }
+
Assert(!prstate.all_frozen || prstate.all_visible);
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * Handle setting visibility map bit based on information from the VM (if
+ * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
+ * call), and from all_visible and all_frozen variables.
*/
if (prstate.consider_update_vm)
{
@@ -2284,8 +2319,8 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* - Reaping: During vacuum phase III, items that are already LP_DEAD are
* marked as unused.
*
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- * all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III and on-access, the heap page
+ * may be marked all-visible and all-frozen.
*
 * These changes all happen together, so we use a single WAL record for them
* all.
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 86d11f4ec79..4603ece09bd 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -289,6 +289,32 @@ index_beginscan(Relation heapRelation,
return scan;
}
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan(heapRelation,
+ indexRelation,
+ snapshot,
+ instrument,
+ nkeys, norderbys);
+
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+ return scan;
+}
+
/*
* index_beginscan_bitmap - start a scan of an index with amgetbitmap
*
@@ -620,6 +646,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
return scan;
}
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan_parallel(heaprel, indexrel,
+ instrument,
+ nkeys, norderbys,
+ pscan);
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+ return scan;
+}
+
/* ----------------
* index_getnext_tid - get the next TID from a scan
*
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index a56c5eceb14..67dbf99f5b5 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
char *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
bool synchronize_seqscans = true;
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags);
+
/* ----------------------------------------------------------------------------
* Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
}
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
- SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
pscan, flags);
}
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+ bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
/* ----------------------------------------------------------------------------
* Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ff12e2e1364..2e0474c948a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation is modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ modifies_rel);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..c2ffbd3b08e 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+
+ bool modifies_base_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
*/
- scandesc = index_beginscan(node->ss.ss_currentRelation,
- node->iss_RelationDesc,
- estate->es_snapshot,
- &node->iss_Instrument,
- node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+ node->iss_RelationDesc,
+ estate->es_snapshot,
+ &node->iss_Instrument,
+ node->iss_NumScanKeys,
+ node->iss_NumOrderByKeys,
+ modifies_base_rel);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..fd69275c181 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
- scandesc = table_beginscan(node->ss.ss_currentRelation,
- estate->es_snapshot,
- 0, NULL);
+ scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+ estate->es_snapshot,
+ 0, NULL, modifies_rel);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -366,6 +371,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
ParallelContext *pcxt)
{
EState *estate = node->ss.ps.state;
+ bool modifies_rel;
ParallelTableScanDesc pscan;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -373,8 +379,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+ modifies_rel);
}
/* ----------------------------------------------------------------
@@ -404,8 +413,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+ pscan,
+ modifies_rel);
}
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 5b2ab181b5f..bf272c2c37f 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_heap_rel);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
@@ -206,6 +211,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys,
ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_rel);
extern ItemPointer index_getnext_tid(IndexScanDesc scan,
ScanDirection direction);
struct TupleTableSlot;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4278f351bdf..16f7904a21e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+	 * being modified, on-access pruning may read the current heap page's
+	 * corresponding VM block into this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -116,8 +123,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+	 * pruning may read the current heap page's corresponding VM block into
+	 * this buffer.
+ */
+ Buffer xs_vmbuffer;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
@@ -374,7 +391,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool blk_known_av,
Buffer vmbuffer,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
typedef struct IndexFetchTableData
{
Relation rel;
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchTableData;
struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index b2ce35e2a34..e31c21cf8eb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -62,6 +62,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* whether or not scan should attempt to set the VM */
+ SO_ALLOW_VM_SET = 1 << 10,
} ScanOptions;
/*
@@ -881,6 +883,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
}
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
/*
* Like table_beginscan(), but for scanning catalog. It'll automatically use a
* snapshot appropriate for scanning catalog relations.
@@ -918,10 +939,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, struct ScanKeyData *key)
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
{
uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
}
@@ -1130,6 +1154,10 @@ extern void table_parallelscan_initialize(Relation rel,
extern TableScanDesc table_beginscan_parallel(Relation relation,
ParallelTableScanDesc pscan);
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+ ParallelTableScanDesc pscan,
+ bool modifies_rel);
+
/*
* Restart a parallel scan. Call this in the leader process. Caller is
* responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index de782014b2d..839c1be1d7c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -678,6 +678,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query either through
+ * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index c9c182892cf..f5c0c65b260 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -745,7 +746,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
--
2.43.0
v11-0020-Set-pd_prune_xid-on-insert.patch
From 1f1a222f13f55c0c8e4c66fe5075b0bd3f7f1949 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v11 20/20] Set pd_prune_xid on insert
Now that we can set the VM during read-only queries, it makes sense to
start setting the page prunable hint on insert. This will allow
heap_page_prune_and_freeze() to be called when the page is full or
mostly full.
For years there has been a note in heap_insert() and heap_multi_insert()
pointing out that setting pd_prune_xid would help clean up aborted
inserted tuples that would otherwise not be cleaned up until vacuum.
So, that's another benefit of setting it.
Setting pd_prune_xid on insert causes a page to be pruned and then
written out which then affects the reported number of hits in the
index-killtuples isolation test. This is a quirk of how hits are tracked
which sometimes leads them to be double counted. This should probably be
fixed or changed independently.
---
src/backend/access/heap/heapam.c | 25 +++++++++++++------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++-
.../isolation/expected/index-killtuples.out | 6 ++---
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f9e50d47aee..09d97896c66 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2105,6 +2105,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2164,15 +2165,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2182,7 +2187,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2548,8 +2552,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index ff3ad8b4cd2..e7d7804871b 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -470,6 +470,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -619,9 +625,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
--
2.43.0
v11-0018-Add-helper-functions-to-heap_page_prune_and_free.patch
From fbf27c16f0add32135836ea843cdbc1b8fc4aa44 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 30 Jul 2025 18:51:43 -0400
Subject: [PATCH v11 18/20] Add helper functions to heap_page_prune_and_freeze
heap_page_prune_and_freeze() has gotten rather long. It has several
stages:
1) setup - where the PruneState is set up
2) tuple examination -- where tuples and line pointers are examined to
determine what needs to be pruned and what could be frozen
3) evaluation -- where we determine based on caller provided options,
heuristics, and state gathered during stage 2 whether or not to
freeze tuples and set the page in the VM
4) execution - where the page changes are actually made and logged
This commit refactors the evaluation stage into helpers which return
whether or not to freeze and set the VM.
XXX: For the purposes of committing, this likely shouldn't be a separate
commit. But I'm not sure yet whether it makes more sense to do this
refactoring earlier in the set for clarity for the reviewer.
---
src/backend/access/heap/pruneheap.c | 473 +++++++++++++++++-----------
1 file changed, 296 insertions(+), 177 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 69d8e42bdc8..67b56e45ad7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -179,6 +179,22 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
static void page_verify_redirects(Page page);
+static bool heap_page_will_freeze(Relation relation, Buffer buffer,
+ bool do_prune,
+ bool do_hint_full_or_prunable,
+ bool did_tuple_hint_fpi,
+ PruneState *prstate,
+ bool *all_frozen_except_lp_dead);
+
+static bool heap_page_will_update_vm(Relation relation,
+ Buffer buffer, BlockNumber blockno, Page page,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ bool blk_known_av,
+ PruneState *prstate,
+ Buffer *vmbuffer, uint8 *vmflags,
+ bool *set_pd_all_visible);
+
static bool identify_and_fix_vm_corruption(Relation relation,
BlockNumber heap_blk,
Buffer heap_buffer, Page heap_page,
@@ -382,6 +398,249 @@ identify_and_fix_vm_corruption(Relation relation,
return false;
}
+
+/*
+ * Determine whether to set the visibility map bits based on information from
+ * the PruneState and blk_known_av, which some callers will provide after
+ * previously examining this heap page's VM bits (e.g. vacuum from the last
+ * heap_vac_scan_next_block() call).
+ *
+ * We pass in blockno and page even though they can be derived from buffer to avoid
+ * extra BufferGetBlock() and BufferGetBlockNumber() calls.
+ *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
+ * prstate and vmbuffer are input/output fields. vmflags and
+ * set_pd_all_visible are output fields.
+ *
+ * Returns true if the caller should set one or both of the VM bits and false
+ * otherwise.
+ */
+static bool
+heap_page_will_update_vm(Relation relation,
+ Buffer buffer, BlockNumber blockno, Page page,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ bool blk_known_av,
+ PruneState *prstate,
+ Buffer *vmbuffer, uint8 *vmflags,
+ bool *set_pd_all_visible)
+{
+ bool do_set_vm = false;
+
+ /*
+ * If the caller specified not to update the VM, validate everything is in
+ * the right state and exit.
+ */
+ if (!prstate->consider_update_vm)
+ {
+ Assert(!prstate->all_visible && !prstate->all_frozen);
+ /* We don't set only the page level visibility hint */
+ Assert(!(*set_pd_all_visible));
+ Assert(*vmflags == 0);
+ return false;
+ }
+
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->consider_update_vm &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+ {
+ prstate->consider_update_vm = false;
+ prstate->all_visible = prstate->all_frozen = false;
+ }
+
+ Assert(!prstate->all_frozen || prstate->all_visible);
+
+ /*
+ * Clear any VM corruption. This does not need to be in a critical
+ * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set, we
+ * may mark the heap page buffer dirty here and could end up doing so
+ * again later. This is not a correctness issue and is in the path of VM
+ * corruption, so we don't have to worry about the extra performance
+ * overhead.
+ */
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av, prstate->lpdead_items,
+ *vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /* Determine if we actually need to set the VM and which bits to set. */
+ else if (prstate->all_visible &&
+ (!blk_known_av ||
+ (prstate->all_frozen && !VM_ALL_FROZEN(relation, blockno, vmbuffer))))
+ {
+ *vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate->all_frozen)
+ *vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
+ do_set_vm = *vmflags & VISIBILITYMAP_VALID_BITS;
+
+ /*
+ * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+ * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+ * set, we strongly prefer to keep them in sync.
+ *
+ * Prior to Postgres 19, it was possible for the page-level bit to be set
+ * and the VM bit to be clear. This could happen if we crashed after
+ * setting PD_ALL_VISIBLE but before setting bits in the VM.
+ */
+ *set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+ return do_set_vm;
+}
+
+/*
+ * Decide whether to go ahead with freezing according to the freeze plans we
+ * prepared for the given buffer. If the caller specified that we should not
+ * freeze tuples, exit early.
+ *
+ * do_prune, do_hint_full_or_prunable, and did_tuple_hint_fpi must all have
+ * been decided before calling this function.
+ *
+ * prstate is an input/output parameter. all_frozen_except_lp_dead is set and
+ * used later to determine the snapshot conflict horizon for the record.
+ *
+ * Returns true if we should use our freeze plans and freeze tuples on the page
+ * and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+ bool do_prune,
+ bool do_hint_full_or_prunable,
+ bool did_tuple_hint_fpi,
+ PruneState *prstate,
+ bool *all_frozen_except_lp_dead)
+{
+ bool do_freeze = false;
+
+ /*
+ * If the caller specified we should not attempt to freeze any tuples,
+ * validate that everything is in the right state and exit.
+ */
+ if (!prstate->attempt_freeze)
+ {
+ Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+ Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+ Assert(!(*all_frozen_except_lp_dead));
+ return false;
+ }
+
+ if (prstate->pagefrz.freeze_required)
+ {
+ /*
+ * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+ * before FreezeLimit/MultiXactCutoff is present. Must freeze to
+ * advance relfrozenxid/relminmxid.
+ */
+ do_freeze = true;
+ }
+ else
+ {
+ /*
+ * Opportunistically freeze the page if we are generating an FPI
+ * anyway and if doing so means that we can set the page all-frozen
+ * afterwards (might not happen until VACUUM's final heap pass).
+ *
+ * XXX: Previously, we knew if pruning emitted an FPI by checking
+ * pgWalUsage.wal_fpi before and after pruning. Once the freeze and
+ * prune records were combined, this heuristic couldn't be used
+ * anymore. The opportunistic freeze heuristic must be improved;
+ * however, for now, try to approximate the old logic.
+ */
+ if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+ {
+ /*
+ * Freezing would make the page all-frozen. Have already emitted
+ * an FPI or will do so anyway?
+ */
+ if (RelationNeedsWAL(relation))
+ {
+ if (did_tuple_hint_fpi)
+ do_freeze = true;
+ else if (do_prune)
+ {
+ if (XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ else if (do_hint_full_or_prunable)
+ {
+ if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ }
+ }
+ }
+
+ if (do_freeze)
+ {
+ /*
+ * Validate the tuples we will be freezing before entering the
+ * critical section.
+ */
+ heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+ }
+ else if (prstate->nfrozen > 0)
+ {
+ /*
+ * The page contained some tuples that were not already frozen, and we
+ * chose not to freeze them now. The page won't be all-frozen then.
+ */
+ Assert(!prstate->pagefrz.freeze_required);
+
+ prstate->all_frozen = false;
+ prstate->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+ else
+ {
+ /*
+ * We have no freeze plans to execute. The page might already be
+ * all-frozen (perhaps only following pruning), though. Such pages
+ * can be marked all-frozen in the VM by our caller, even though none
+ * of its tuples were newly frozen here.
+ */
+ }
+
+ /*
+ * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+ * make the choice of whether or not to freeze the page unaffected by the
+ * short-term presence of LP_DEAD items. These LP_DEAD items were
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that freezing has been finalized, unset all_visible if there are
+ * any LP_DEAD items on the page. It needs to reflect the present state of
+ * the page when using it to determine whether or not to update the VM.
+ *
+ * Keep track of whether or not the page was all-frozen except LP_DEAD
+ * items for the purposes of calculating the snapshot conflict horizon,
+ * though.
+ */
+ *all_frozen_except_lp_dead = prstate->all_frozen;
+ if (prstate->lpdead_items > 0)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
+
+ return do_freeze;
+}
+
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
* specified page. If the page's visibility status has changed, update it in
@@ -772,20 +1031,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Clear the offset information once we have processed the given page. */
*off_loc = InvalidOffsetNumber;
- do_prune = prstate.nredirected > 0 ||
- prstate.ndead > 0 ||
- prstate.nunused > 0;
-
/*
* After processing all the live tuples on the page, if the newest xmin
* amongst them is not visible to everyone, the page cannot be
- * all-visible.
+ * all-visible. This must be done before we decide whether or not to
+ * opportunistically freeze below because we do not want to
+ * opportunistically freeze the page if there are live tuples not visible
+ * to everyone, which would prevent setting the page frozen in the VM.
*/
if (prstate.all_visible &&
TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
prstate.all_visible = prstate.all_frozen = false;
+ /*
+ * Now decide based on information collected while examining every tuple
+ * which actions to take. If there are any prunable tuples, we'll prune
+ * them. However, we will decide based on options specified by the caller
+ * and various heuristics whether or not to freeze any tuples and whether
+ * or not the page should be set all-visible/all-frozen in the VM.
+ */
+ do_prune = prstate.nredirected > 0 ||
+ prstate.ndead > 0 ||
+ prstate.nunused > 0;
+
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update those
@@ -796,186 +1065,36 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
PageIsFull(page);
/*
- * Decide if we want to go ahead with freezing according to the freeze
- * plans we prepared, or not.
- */
- do_freeze = false;
- if (prstate.attempt_freeze)
- {
- if (prstate.pagefrz.freeze_required)
- {
- /*
- * heap_prepare_freeze_tuple indicated that at least one XID/MXID
- * from before FreezeLimit/MultiXactCutoff is present. Must
- * freeze to advance relfrozenxid/relminmxid.
- */
- do_freeze = true;
- }
- else
- {
- /*
- * Opportunistically freeze the page if we are generating an FPI
- * anyway and if doing so means that we can set the page
- * all-frozen afterwards (might not happen until VACUUM's final
- * heap pass).
- *
- * XXX: Previously, we knew if pruning emitted an FPI by checking
- * pgWalUsage.wal_fpi before and after pruning. Once the freeze
- * and prune records were combined, this heuristic couldn't be
- * used anymore. The opportunistic freeze heuristic must be
- * improved; however, for now, try to approximate the old logic.
- */
- if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
- {
- /*
- * Freezing would make the page all-frozen. Have already
- * emitted an FPI or will do so anyway?
- */
- if (RelationNeedsWAL(relation))
- {
- if (did_tuple_hint_fpi)
- do_freeze = true;
- else if (do_prune)
- {
- if (XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- else if (do_hint_full_or_prunable)
- {
- if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- }
- }
- }
- }
-
- if (do_freeze)
- {
- /*
- * Validate the tuples we will be freezing before entering the
- * critical section.
- */
- heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
- }
- else if (prstate.nfrozen > 0)
- {
- /*
- * The page contained some tuples that were not already frozen, and we
- * chose not to freeze them now. The page won't be all-frozen then.
- */
- Assert(!prstate.pagefrz.freeze_required);
-
- prstate.all_frozen = false;
- prstate.nfrozen = 0; /* avoid miscounts in instrumentation */
- }
- else
- {
- /*
- * We have no freeze plans to execute. The page might already be
- * all-frozen (perhaps only following pruning), though. Such pages
- * can be marked all-frozen in the VM by our caller, even though none
- * of its tuples were newly frozen here.
- */
- }
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state of
- * the page when using it to determine whether or not to update the VM.
- *
- * Keep track of whether or not the page was all-frozen except LP_DEAD
- * items for the purposes of calculating the snapshot conflict horizon,
- * though.
+ * We must decide whether or not to freeze before deciding if and what to
+ * set in the VM.
*/
- all_frozen_except_lp_dead = prstate.all_frozen;
- if (prstate.lpdead_items > 0)
- {
- prstate.all_visible = false;
- prstate.all_frozen = false;
- }
+ do_freeze = heap_page_will_freeze(relation, buffer,
+ do_prune,
+ do_hint_full_or_prunable,
+ did_tuple_hint_fpi,
+ &prstate,
+ &all_frozen_except_lp_dead);
+
+ do_set_vm = heap_page_will_update_vm(relation,
+ buffer, blockno, page,
+ reason,
+ do_prune, do_freeze,
+ blk_known_av,
+ &prstate,
+ &vmbuffer,
+ &vmflags, &set_pd_all_visible);
- /*
- * If this is an on-access call and we're not actually pruning, avoid
- * setting the visibility map if it would newly dirty the heap page or, if
- * the page is already dirty, if doing so would require including a
- * full-page image (FPI) of the heap page in the WAL. This situation
- * should be rare, as on-access pruning is only attempted when
- * pd_prune_xid is valid.
- */
- if (reason == PRUNE_ON_ACCESS &&
- prstate.consider_update_vm &&
- prstate.all_visible &&
- !do_prune && !do_freeze &&
- (!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
- {
- prstate.consider_update_vm = false;
- prstate.all_visible = prstate.all_frozen = false;
- }
-
- Assert(!prstate.all_frozen || prstate.all_visible);
-
- /*
- * Handle setting visibility map bit based on information from the VM (if
- * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
- * call), and from all_visible and all_frozen variables.
- */
- if (prstate.consider_update_vm)
- {
- /*
- * Clear any VM corruption. This does not need to be in a critical
- * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
- * we may mark the heap page buffer dirty here and could end up doing
- * so again later. This is not a correctness issue and is in the path
- * of VM corruption, so we don't have to worry about the extra
- * performance overhead.
- */
- if (identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av, prstate.lpdead_items, vmbuffer))
- {
- /* If we fix corruption, don't update the VM further */
- }
-
- /* Determine if we actually need to set the VM and which bits to set. */
- else if (prstate.all_visible &&
- (!blk_known_av ||
- (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
- {
- vmflags |= VISIBILITYMAP_ALL_VISIBLE;
- if (prstate.all_frozen)
- vmflags |= VISIBILITYMAP_ALL_FROZEN;
- }
- }
-
- do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
+ /* Save these for the caller in case we later zero out vmflags */
+ presult->new_vmbits = vmflags;
- /* Lock vmbuffer before entering a critical section */
+ /* Lock vmbuffer before entering critical section */
if (do_set_vm)
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
/*
- * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
- * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
- * set, we strongly prefer to keep them in sync.
- *
- * Prior to Postgres 19, it was possible for the page-level bit to be set
- * and the VM bit to be clear. This could happen if we crashed after
- * setting PD_ALL_VISIBLE but before setting bits in the VM.
+ * Time to actually make the changes to the page and log them. Any error
+ * while applying the changes is critical.
*/
- set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
-
- /* Save these for the caller in case we later zero out vmflags */
- presult->new_vmbits = vmflags;
-
- /* Any error while applying the changes is critical */
START_CRIT_SECTION();
if (do_hint_full_or_prunable)
--
2.43.0
v11-0019-Reorder-heap_page_prune_and_freeze-parameters.patch
From e75e86aac65557f119d5d00077cf21183b55ce46 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 31 Jul 2025 12:08:18 -0400
Subject: [PATCH v11 19/20] Reorder heap_page_prune_and_freeze parameters
Reorder parameters so that all of the output parameters are together at
the end of the parameter list.
---
src/backend/access/heap/pruneheap.c | 38 ++++++++++++++--------------
src/backend/access/heap/vacuumlazy.c | 6 ++---
src/include/access/heapam.h | 4 +--
3 files changed, 24 insertions(+), 24 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 67b56e45ad7..3e55c43f17b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -297,10 +297,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, false,
+ heap_page_prune_and_freeze(relation, buffer, options, false,
vmbuffer ? *vmbuffer : InvalidBuffer,
- vistest, options,
- NULL, &presult, PRUNE_ON_ACCESS,
+ vistest,
+ NULL, PRUNE_ON_ACCESS, &presult,
&dummy_off_loc, NULL, NULL);
/*
@@ -651,6 +651,15 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* also need to account for a reduction in the length of the line pointer
* array following array truncation by us.
*
+ * options:
+ * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+ * pruning.
+ *
+ * FREEZE indicates that we will also freeze tuples, and will return
+ * 'all_visible', 'all_frozen' flags to the caller.
+ *
+ * UPDATE_VM indicates that we will set the page's status in the VM.
+ *
* If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
* required in order to advance relfrozenxid / relminmxid, or if it's
* considered advantageous for overall system performance to do so now. The
@@ -669,30 +678,21 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* contain the required block of the visibility map.
*
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
- *
- * options:
- * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- * pruning.
- *
- * FREEZE indicates that we will also freeze tuples, and will return
- * 'all_visible', 'all_frozen' flags to the caller.
- *
- * UPDATE_VM indicates that we will set the page's status in the VM.
+ * (see heap_prune_satisfies_vacuum). It is an input parameter.
*
* cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
* of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
* cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD. It is an input parameter.
+ *
+ * reason indicates why the pruning is performed. It is included in the WAL
+ * record for debugging and analysis purposes, but otherwise has no effect.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
* heap_page_prune_and_freeze() is responsible for initializing it. Required
* by all callers.
*
- * reason indicates why the pruning is performed. It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
* off_loc is the offset location required by the caller to use in error
* callback.
*
@@ -705,13 +705,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ int options,
bool blk_known_av,
Buffer vmbuffer,
GlobalVisState *vistest,
- int options,
struct VacuumCutoffs *cutoffs,
- PruneFreezeResult *presult,
PruneReason reason,
+ PruneFreezeResult *presult,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4ad05ba4db6..4fb915e1d94 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1993,11 +1993,11 @@ lazy_scan_prune(LVRelState *vacrel,
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf,
+ heap_page_prune_and_freeze(rel, buf, prune_options,
all_visible_according_to_vm,
vmbuffer,
- vacrel->vistest, prune_options,
- &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+ vacrel->vistest,
+ &vacrel->cutoffs, PRUNE_VACUUM_SCAN, &presult,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 16f7904a21e..0c4e5607627 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,13 +394,13 @@ struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer,
Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ int options,
bool blk_known_av,
Buffer vmbuffer,
struct GlobalVisState *vistest,
- int options,
struct VacuumCutoffs *cutoffs,
- PruneFreezeResult *presult,
PruneReason reason,
+ PruneFreezeResult *presult,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid);
--
2.43.0
On Mon, Sep 8, 2025 at 6:29 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
But, I think you're right that maintaining the order of operations
proposed in transam/README is more important. As such, in attached
v11, I've modified this patch and the other patches where I replace
visibilitymap_set() with visibilitymap_set_vmbits() to exclusively
lock the vmbuffer before the critical section.
visibilitymap_set_vmbits() asserts that we have the vmbuffer
exclusively locked, so we should be good.
That sounds good. I think it is OK to keep some of the odd things that
we're currently doing if they're hard to eliminate, but if they're not
really needed then I'd rather see us standardize the code. I feel (and
I think you may agree, based on other conversations that we've had)
that the visibility map code is somewhat oddly structured, and I'd
like to see us push the amount of oddness down rather than up, if we
can reasonably do so without breaking everything.
The only difference is I replaced the phrase "LSN interlock" with
"being dropped or truncated later in recovery" -- which is more
specific and, I thought, more clear. Without this comment, it took me
some time to understand the scenarios that might lead us to skip
updating the heap block. heap_xlog_visible() has cause to describe
this situation in an earlier comment -- which is why I think the LSN
interlock comment is less confusing there.
Anyway, I'm open to changing the comment. I could:
1) copy-paste the same comment as heap_xlog_visible()
2) refer to the comment in heap_xlog_visible() (comment seemed a bit
short for that)
3) diverge the comments further by improving the new comment in
heap_xlog_multi_insert() in some way
4) something else?
IMHO, copying and pasting comments is not great, and comments with
identical intent and divergent wording are also not great. The former
is not great because having a whole bunch of copies of the same
comment, especially if it's a block comment rather than a 1-liner,
uses up a bunch of space and creates a maintenance hazard in the sense
that future updates might not get propagated to all copies. The latter
is not great because it makes it hard to grep for other instances that
should be adjusted when you adjust one, and also because if one
version really is better than the other, then ideally we'd like to have
the good version everywhere. Of course, there's some tension between
these two goals. In this particular case, thinking a little harder
about your proposed change, it seems to me that "LSN interlock" is
more clear about what the immediate test is that would cause us to
skip updating the heap page, and "being dropped or truncated later in
recovery" is more clear about what the larger state of the world that
would lead to that situation is. But whatever preference anyone might
have about which way to go with that choice, it is hard to see why the
preference should go one way in one case and the other way in another
case. Therefore, I favor an approach that leads either to an identical
comment in both places, or to one comment referring to the other.
The second paragraph does not convince me at all. I see no reason to
believe that this is safe, or that it is a good idea. The code in
heap_xlog_visible() thinks it's OK to unlock and relock the page to
make visibilitymap_set() happy, which is cringy but probably safe for
lack of concurrent writers, but skipping locking altogether seems
deeply unwise.
Actually in master, heap_xlog_visible() has no lock on the heap page
when it calls visibilitymap_set(). It releases that lock before
recording the freespace in the FSM and doesn't take it again.
It does unlock and relock the VM page -- because visibilitymap_set()
expects to take the lock on the VM.
I agree that not holding the heap lock while updating the VM is
unsatisfying. We can't hold it while doing the IO to read in the VM
block in XLogReadBufferForRedoExtended(). So, we could take it again
before calling visibilitymap_set(). But we don't always have the heap
buffer, though. I suspect this is partially why heap_xlog_visible()
unconditionally passes InvalidBuffer to visibilitymap_set() as the
heap buffer and has special case handling for recovery when we don't
have the heap buffer.
You know, I wasn't thinking carefully enough about the distinction
between the heap page and the visibility map page here. I thought you
were saying that you were modifying a page without a lock on that
page, but you aren't: you're saying you're modifying a page without a
lock on another page to which it is related. The former seems
disastrous, but the latter might be OK. However, I'm sort of confused
about what the comment is trying to say to justify that:
+ * It is only okay to set the VM bits without holding the heap page lock
+ * because we can expect no other writers of this page.
It is not exactly clear to me whether "this page" here refers to the
heap page or the VM page. If it means the heap page, why should that
be so if we haven't got any kind of lock? If it means the VM page,
then why is the heap page even relevant?
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Sep 9, 2025 at 10:00 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Sep 8, 2025 at 6:29 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
The only difference is I replaced the phrase "LSN interlock" with
"being dropped or truncated later in recovery" -- which is more
specific and, I thought, more clear. Without this comment, it took me
some time to understand the scenarios that might lead us to skip
updating the heap block. heap_xlog_visible() has cause to describe
this situation in an earlier comment -- which is why I think the LSN
interlock comment is less confusing there.
Anyway, I'm open to changing the comment. I could:
1) copy-paste the same comment as heap_xlog_visible()
2) refer to the comment in heap_xlog_visible() (comment seemed a bit
short for that)
3) diverge the comments further by improving the new comment in
heap_xlog_multi_insert() in some way
4) something else?
IMHO, copying and pasting comments is not great, and comments with
identical intent and divergent wording are also not great. The former
is not great because having a whole bunch of copies of the same
comment, especially if it's a block comment rather than a 1-liner,
uses up a bunch of space and creates a maintenance hazard in the sense
that future updates might not get propagated to all copies. The latter
is not great because it makes it hard to grep for other instances that
should be adjusted when you adjust one, and also because if one
version really is better than the other, then ideally we'd like to have
the good version everywhere. Of course, there's some tension between
these two goals. In this particular case, thinking a little harder
about your proposed change, it seems to me that "LSN interlock" is
more clear about what the immediate test is that would cause us to
skip updating the heap page, and "being dropped or truncated later in
recovery" is more clear about what the larger state of the world that
would lead to that situation is. But whatever preference anyone might
have about which way to go with that choice, it is hard to see why the
preference should go one way in one case and the other way in another
case. Therefore, I favor an approach that leads either to an identical
comment in both places, or to one comment referring to the other.
I see what you are saying.
For heap_xlog_visible() the LSN interlock comment is easier to parse
because of an earlier comment before reading the heap page:
/*
* Read the heap page, if it still exists. If the heap file has dropped or
* truncated later in recovery, we don't need to update the page, but we'd
* better still update the visibility map.
*/
I've gone with the direct copy-paste of the LSN interlock paragraph in
attached v12. I think referring to the other comment is too confusing
in context here. However, I also added a line about what could cause
the LSN interlock -- but above it, so as to retain grep-ability of the
other comment.
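To make "LSN interlock" concrete: during redo, the heap page is only modified
when XLogReadBufferForRedo() reports that the record still needs to be
applied. A rough, hypothetical fragment (not patch code) of the usual redo
pattern:
    if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
    {
        /* page LSN is older than the record: apply the change */
        Page        page = BufferGetPage(buffer);

        /* ... modify the heap page ... */
        PageSetLSN(page, lsn);
        MarkBufferDirty(buffer);
    }
    /*
     * BLK_DONE (page LSN already past this record -- the LSN interlock) or
     * BLK_NOTFOUND (relation dropped/truncated later in recovery): skip the
     * heap page, but the VM update must still be replayed.
     */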
The second paragraph does not convince me at all. I see no reason to
believe that this is safe, or that it is a good idea. The code in
heap_xlog_visible() thinks it's OK to unlock and relock the page to
make visibilitymap_set() happy, which is cringy but probably safe for
lack of concurrent writers, but skipping locking altogether seems
deeply unwise.
Actually in master, heap_xlog_visible() has no lock on the heap page
when it calls visibilitymap_set(). It releases that lock before
recording the freespace in the FSM and doesn't take it again.
It does unlock and relock the VM page -- because visibilitymap_set()
expects to take the lock on the VM.
I agree that not holding the heap lock while updating the VM is
unsatisfying. We can't hold it while doing the IO to read in the VM
block in XLogReadBufferForRedoExtended(). So, we could take it again
before calling visibilitymap_set(). But we don't always have the heap
buffer, though. I suspect this is partially why heap_xlog_visible()
unconditionally passes InvalidBuffer to visibilitymap_set() as the
heap buffer and has special case handling for recovery when we don't
have the heap buffer.
You know, I wasn't thinking carefully enough about the distinction
between the heap page and the visibility map page here. I thought you
were saying that you were modifying a page without a lock on that
page, but you aren't: you're saying you're modifying a page without a
lock on another page to which it is related. The former seems
disastrous, but the latter might be OK. However, I'm sort of confused
about what the comment is trying to say to justify that:
+ * It is only okay to set the VM bits without holding the heap page lock
+ * because we can expect no other writers of this page.
It is not exactly clear to me whether "this page" here refers to the
heap page or the VM page. If it means the heap page, why should that
be so if we haven't got any kind of lock? If it means the VM page,
then why is the heap page even relevant?
I've expanded the comment in v12. In normal operation we must have the
lock on the heap page when setting the VM bits because if another
backend cleared PD_ALL_VISIBLE, we could have the forbidden scenario
where PD_ALL_VISIBLE is clear and the VM is set. This is not allowed
because then someone else may read the VM, conclude the page is
all-visible, and then an index-only scan can return wrong results. In
recovery, there are no concurrent writers, so it can't happen.
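A compressed sketch of the invariant, with placeholder variable names and the
vmbuffer lock, WAL logging, and critical section omitted for brevity; the
point is that in normal operation the exclusive heap-buffer lock closes the
window between the two steps:
    LockBuffer(heapbuf, BUFFER_LOCK_EXCLUSIVE);
    PageSetAllVisible(BufferGetPage(heapbuf));   /* PD_ALL_VISIBLE first ... */
    visibilitymap_set_vmbits(rel, blkno, vmbuffer, VISIBILITYMAP_ALL_VISIBLE);
    /*
     * ... so no other backend can clear PD_ALL_VISIBLE in between and leave
     * the VM bit set while the page-level bit is clear. In recovery there are
     * no concurrent writers, so replaying the VM change without re-taking the
     * heap lock cannot produce that state.
     */
    LockBuffer(heapbuf, BUFFER_LOCK_UNLOCK);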
It is worth discussing how to fix it in heap_xlog_visible() so that
future scenarios like parallel recovery could not break this. However,
this patch is not a deviation from the behavior on master, and,
technically, the behavior on master works.
- Melanie
Attachments:
v12-0002-Make-heap_page_is_all_visible-independent-of-LVR.patch
From 28253a5c4cb60d842f83a6f3b90bb984ffd10f89 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 15:48:51 -0400
Subject: [PATCH v12 02/20] Make heap_page_is_all_visible independent of
LVRelState
Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need a few parameters from
the LVRelState, so just pass those in explicitly.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 48 ++++++++++++++++++----------
1 file changed, 31 insertions(+), 17 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 981d9380a92..8a84bdfe0a9 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,8 +463,11 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
- TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2009,8 +2012,9 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(presult.lpdead_items == 0);
- if (!heap_page_is_all_visible(vacrel, buf,
- &debug_cutoff, &debug_all_frozen))
+ if (!heap_page_is_all_visible(vacrel->rel, buf,
+ vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+ &debug_cutoff, &vacrel->offnum))
Assert(false);
Assert(presult.all_frozen == debug_all_frozen);
@@ -2912,8 +2916,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* emitted.
*/
Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
- &all_frozen))
+ if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
+ &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -3596,10 +3600,18 @@ dead_items_cleanup(LVRelState *vacrel)
}
/*
- * Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples. Set *all_frozen to true if every tuple
- * on this page is frozen.
+ * Check if every tuple in the given page in buf is visible to all current and
+ * future transactions.
+ *
+ * OldestXmin is used to determine visibility.
+ *
+ * Sets *all_frozen to true if every tuple on this page is frozen.
+ *
+ * Sets *visibility_cutoff_xid to the highest xmin amongst the visible tuples.
+ * It is only valid if the page is all-visible.
+ *
+ * *logging_offnum will have the OffsetNumber of the current tuple being
+ * processed for vacuum's error callback system.
*
* This is a stripped down version of lazy_scan_prune(). If you change
* anything here, make sure that everything stays in sync. Note that an
@@ -3607,9 +3619,11 @@ dead_items_cleanup(LVRelState *vacrel)
* introducing new side-effects here.
*/
static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
TransactionId *visibility_cutoff_xid,
- bool *all_frozen)
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3632,7 +3646,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* Set the offset number so that we can display it along with any
* error that occurred while processing this tuple.
*/
- vacrel->offnum = offnum;
+ *logging_offnum = offnum;
itemid = PageGetItemId(page, offnum);
/* Unused or redirect line pointers are of no interest */
@@ -3656,9 +3670,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
tuple.t_len = ItemIdGetLength(itemid);
- tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+ tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
+ switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
buf))
{
case HEAPTUPLE_LIVE:
@@ -3679,7 +3693,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
+ OldestXmin))
{
all_visible = false;
*all_frozen = false;
@@ -3714,7 +3728,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
} /* scan along page */
/* Clear the offset information once we have processed the given page. */
- vacrel->offnum = InvalidOffsetNumber;
+ *logging_offnum = InvalidOffsetNumber;
return all_visible;
}
--
2.43.0
v12-0001-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch
From a042cabf79da8faa583f081b432ddb955d6211bc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v12 01/20] Eliminate xl_heap_visible in COPY FREEZE
Instead of emitting a separate WAL record for setting the VM bits in
xl_heap_visible, specify the changes to make to the VM block in the
xl_heap_multi_insert record instead.
This halves the number of WAL records emitted by COPY FREEZE.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam.c | 50 +++++++++++-------
src/backend/access/heap/heapam_xlog.c | 55 +++++++++++++++++++-
src/backend/access/heap/visibilitymap.c | 67 ++++++++++++++++++++++++-
src/backend/access/rmgrdesc/heapdesc.c | 5 ++
src/include/access/visibilitymap.h | 2 +
5 files changed, 159 insertions(+), 20 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4c5ae205a7a..cff531a4801 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2466,7 +2466,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
starting_with_empty_page = PageGetMaxOffsetNumber(page) == 0;
if (starting_with_empty_page && (options & HEAP_INSERT_FROZEN))
+ {
all_frozen_set = true;
+ /* Lock the vmbuffer before entering the critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ }
/* NO EREPORT(ERROR) from here till changes are logged */
START_CRIT_SECTION();
@@ -2504,9 +2508,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/*
* If the page is all visible, need to clear that, unless we're only
* going to add further frozen rows to it.
- *
- * If we're only adding already frozen rows to a previously empty
- * page, mark it as all-visible.
*/
if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
@@ -2516,8 +2517,21 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
BufferGetBlockNumber(buffer),
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
+
+ /*
+ * If we're only adding already frozen rows to a previously empty
+ * page, mark it as all-frozen and update the visibility map. We're
+ * already holding a pin on the vmbuffer.
+ */
else if (all_frozen_set)
+ {
PageSetAllVisible(page);
+ visibilitymap_set_vmbits(relation,
+ BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+ }
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2565,6 +2579,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
xlrec->flags = 0;
if (all_visible_cleared)
xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+ /*
+ * We don't have to worry about including a conflict xid in the
+ * WAL record as HEAP_INSERT_FROZEN intentionally violates
+ * visibility rules.
+ */
if (all_frozen_set)
xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
@@ -2627,7 +2647,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
XLogBeginInsert();
XLogRegisterData(xlrec, tupledata - scratch.data);
+
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+ if (all_frozen_set)
+ XLogRegisterBuffer(1, vmbuffer, 0);
XLogRegisterBufData(0, tupledata, totaldatalen);
@@ -2637,26 +2660,17 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
recptr = XLogInsert(RM_HEAP2_ID, info);
PageSetLSN(page, recptr);
+ if (all_frozen_set)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
+ }
}
END_CRIT_SECTION();
- /*
- * If we've frozen everything on the page, update the visibilitymap.
- * We're already holding pin on the vmbuffer.
- */
if (all_frozen_set)
- {
- /*
- * It's fine to use InvalidTransactionId here - this is only used
- * when HEAP_INSERT_FROZEN is specified, which intentionally
- * violates visibility rules.
- */
- visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
- InvalidXLogRecPtr, vmbuffer,
- InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
- }
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
UnlockReleaseBuffer(buffer);
ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index cf843277938..86e5f76e49f 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -551,6 +551,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
int i;
bool isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
/*
* Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -571,11 +572,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(rlocator);
- Buffer vmbuffer = InvalidBuffer;
visibilitymap_pin(reln, blkno, &vmbuffer);
visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
ReleaseBuffer(vmbuffer);
+ vmbuffer = InvalidBuffer;
FreeFakeRelcacheEntry(reln);
}
@@ -662,6 +663,58 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (BufferIsValid(buffer))
UnlockReleaseBuffer(buffer);
+ buffer = InvalidBuffer;
+
+ /*
+ * Now read and update the VM block.
+ *
+ * Note that the heap relation may have been dropped or truncated, leading
+ * us to skip updating the heap block.
+ *
+ * Even if we skipped the heap page update due to the LSN interlock, it's
+ * still safe to update the visibility map. Any WAL record that clears
+ * the visibility map bit does so before checking the page LSN, so any
+ * bits that need to be cleared will still be cleared.
+ *
+ * Note that the lock on the heap page was dropped above. In normal
+ * operation this would never be safe because a concurrent query could
+ * modify the heap page and clear PD_ALL_VISIBLE -- violating the
+ * invariant that PD_ALL_VISIBLE must be set if the corresponding bit in
+ * the VM is set.
+ *
+ * In recovery, we expect no other writers, so writing to the VM page
+ * without holding a lock on the heap page is considered safe enough. It
+ * is done this way when replaying xl_heap_visible records (see
+ * heap_xlog_visible()).
+ */
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+ XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ Page vmpage = BufferGetPage(vmbuffer);
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
+
+ /* initialize the page if it was read as zeros */
+ if (PageIsNew(vmpage))
+ PageInit(vmpage, BLCKSZ, 0);
+
+ visibilitymap_set_vmbits(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+
+ /*
+ * It is not possible that the VM was already set for this heap page,
+ * so the vmbuffer must have been modified and marked dirty.
+ */
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
+ FreeFakeRelcacheEntry(reln);
+ }
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
+
/*
* If the page is running low on free space, update the FSM as well.
* Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 7306c16f05c..aa48a436108 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,7 +14,8 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set a bit in a previously pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page and log
+ * visibilitymap_set_vmbits - set bit(s) in a pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -321,6 +322,70 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
return status;
}
+/*
+ * Set flags in the VM block contained in the passed in vmBuf.
+ *
+ * This function is for callers which include the VM changes in the same WAL
+ * record as the modifications of the heap page which rendered it all-visible.
+ * Callers separately logging the VM changes should invoke visibilitymap_set()
+ * instead.
+ *
+ * Caller must have pinned and exclusive locked the correct block of the VM in
+ * vmBuf. This block should contain the VM bits for the given heapBlk.
+ *
+ * During normal operation (i.e. not recovery), this should be called in a
+ * critical section which also makes any necessary changes to the heap page
+ * and, if relevant, emits WAL.
+ *
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page. This includes
+ * maintaining any invariants such as ensuring the buffer containing heapBlk
+ * is pinned and exclusive locked.
+ */
+uint8
+visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags)
+{
+ BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+ uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+ uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+ Page page;
+ uint8 *map;
+ uint8 status;
+
+#ifdef TRACE_VISIBILITYMAP
+ elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
+ flags, RelationGetRelationName(rel), heapBlk);
+#endif
+
+ /* Call in same critical section where WAL is emitted. */
+ Assert(InRecovery || CritSectionCount > 0);
+
+ /* Flags should be valid. Also never clear bits with this function */
+ Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+ /* Must never set all_frozen bit without also setting all_visible bit */
+ Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+ /* Check that we have the right VM page pinned */
+ if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+ elog(ERROR, "wrong VM buffer passed to visibilitymap_set_vmbits");
+
+ Assert(BufferIsExclusiveLocked(vmBuf));
+
+ page = BufferGetPage(vmBuf);
+ map = (uint8 *) PageGetContents(page);
+
+ status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+ if (flags != status)
+ {
+ map[mapByte] |= (flags << mapOffset);
+ MarkBufferDirty(vmBuf);
+ }
+
+ return status;
+}
+
/*
* visibilitymap_get_status - get status of bits
*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
#include "access/heapam_xlog.h"
#include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
#include "storage/standbydefs.h"
/*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
xlrec->flags);
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+ appendStringInfo(buf, ", vm_flags: 0x%02X",
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
if (XLogRecHasBlockData(record, 0) && !isinit)
{
appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..fc7056a91ea 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,8 @@ extern uint8 visibilitymap_set(Relation rel,
Buffer vmBuf,
TransactionId cutoff_xid,
uint8 flags);
+extern uint8 visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
--
2.43.0
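For reviewers, here is roughly the calling convention the new
visibilitymap_set_vmbits() expects, pieced together from its header comment
and from the lazy_vacuum_heap_page() changes later in the series. It is only
a sketch: log_vm_change() is a placeholder for whichever WAL record the
caller actually emits (xl_heap_prune, xl_heap_multi_insert, ...).

    /* correct VM block for heap_blkno already pinned by the caller */
    LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);

    START_CRIT_SECTION();

    /* keep the heap page's PD_ALL_VISIBLE in sync with the VM bit */
    PageSetAllVisible(heap_page);
    MarkBufferDirty(heap_buffer);

    visibilitymap_set_vmbits(rel, heap_blkno, vmbuffer,
                             VISIBILITYMAP_ALL_VISIBLE |
                             VISIBILITYMAP_ALL_FROZEN);

    if (RelationNeedsWAL(rel))
    {
        XLogRecPtr recptr = log_vm_change();    /* placeholder */

        /* the VM page LSN must advance with the record that logged it */
        PageSetLSN(BufferGetPage(vmbuffer), recptr);

        /*
         * Whether the heap page LSN must also advance depends on what else
         * the record covers; see log_heap_prune_and_freeze() in 0003.
         */
    }

    END_CRIT_SECTION();

    LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);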
v12-0005-Combine-lazy_scan_prune-VM-corruption-cases.patchtext/x-patch; charset=US-ASCII; name=v12-0005-Combine-lazy_scan_prune-VM-corruption-cases.patchDownload
From e5c62e789cc7757e61543aa628877f9bcab4dcac Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:04:03 -0400
Subject: [PATCH v12 05/20] Combine lazy_scan_prune VM corruption cases
lazy_scan_prune() updates the visibility map after phase I of heap
vacuuming. It also checks and fixes corruption in the VM. The corruption
cases were mixed in with the normal visibility map update cases.
Careful study of the ordering of the current logic reveals that the
corruption cases can be reordered and extracted into a separate
function. This should result in no additional overhead compared to the
previous code.
This reordering makes it clear which cases are about corruption and
which cases are normal VM updates. Separating them also makes it
possible to combine the normal cases in a future commit. This will make
the logic easier to understand and allow further restructuring so that
the VM can be updated in the same WAL record as pruning and freezing in
phase I.
---
src/backend/access/heap/vacuumlazy.c | 126 +++++++++++++++++----------
1 file changed, 79 insertions(+), 47 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a1cdaaebb57..e9b4e924d22 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,6 +430,12 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
+static bool identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1934,6 +1940,72 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer)
+{
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
+
+
/* qsort comparator for sorting OffsetNumbers */
static int
cmpOffsetNumbers(const void *a, const void *b)
@@ -2080,9 +2152,14 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
+ * all_frozen variables. Start by looking for any VM corruption.
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
+ all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+ {
+ /* Don't update the VM if we just cleared corruption in it */
+ }
+ else if (!all_visible_according_to_vm && presult.all_visible)
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -2134,51 +2211,6 @@ lazy_scan_prune(LVRelState *vacrel,
}
}
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
/*
* If the all-visible page is all-frozen but not marked as such yet, mark
* it as all-frozen. Note that all_frozen is only valid if all_visible is
--
2.43.0
v12-0003-Eliminate-xl_heap_visible-from-vacuum-phase-III.patchtext/x-patch; charset=US-ASCII; name=v12-0003-Eliminate-xl_heap_visible-from-vacuum-phase-III.patchDownload
From 684e2b681adfee93d5155cb77df3062188ae2dbc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:42:13 -0400
Subject: [PATCH v12 03/20] Eliminate xl_heap_visible from vacuum phase III
Instead of emitting a separate xl_heap_visible record for each page that
is rendered all-visible by vacuum's third phase, include the updates to
the VM in the already emitted xl_heap_prune record.
The visibilitymap bits are stored in the flags member of the
xl_heap_prune record.
This can decrease the number of WAL records vacuum phase III emits by
as much as half.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam_xlog.c | 156 +++++++++++++++++++----
src/backend/access/heap/pruneheap.c | 66 ++++++++--
src/backend/access/heap/vacuumlazy.c | 166 +++++++++++++++++--------
src/backend/access/rmgrdesc/heapdesc.c | 6 +-
src/include/access/heapam.h | 9 ++
src/include/access/heapam_xlog.h | 36 ++++--
6 files changed, 343 insertions(+), 96 deletions(-)
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 86e5f76e49f..5872f13397f 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Buffer buffer;
RelFileLocator rlocator;
BlockNumber blkno;
- XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
+ uint8 vmflags = 0;
+ Size freespace = 0;
XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -50,11 +52,17 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Assert((xlrec.flags & XLHP_CLEANUP_LOCK) != 0 ||
(xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
+ vmflags = xlrec.flags & VISIBILITYMAP_VALID_BITS;
+
/*
- * We are about to remove and/or freeze tuples. In Hot Standby mode,
- * ensure that there are no queries running for which the removed tuples
- * are still visible or which still consider the frozen xids as running.
- * The conflict horizon XID comes after xl_heap_prune.
+ * After xl_heap_prune is the optional snapshot conflict horizon.
+ *
+ * In Hot Standby mode, we must ensure that there are no running queries
+ * which would conflict with the changes in this record. That means we
+ * can't replay this record if it removes tuples that are still visible to
+ * transactions on the standby, freeze tuples with xids that are still
+ * considered running on the standby, or set a page as all-visible in the
+ * VM if it isn't all-visible to all transactions on the standby.
*/
if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
{
@@ -71,12 +79,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
}
/*
- * If we have a full-page image, restore it and we're done.
+ * If we have a full-page image of the heap block, restore it and we're
+ * done with the heap block.
*/
- action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
- (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
- &buffer);
- if (action == BLK_NEEDS_REDO)
+ if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+ (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+ &buffer) == BLK_NEEDS_REDO)
{
Page page = BufferGetPage(buffer);
OffsetNumber *redirected;
@@ -89,6 +97,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Size datalen;
xlhp_freeze_plan *plans;
OffsetNumber *frz_offsets;
+ bool do_prune;
+ bool mark_buffer_dirty;
+ bool set_heap_lsn;
char *dataptr = XLogRecGetBlockData(record, 0, &datalen);
heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
@@ -97,11 +108,18 @@ heap_xlog_prune_freeze(XLogReaderState *record)
&ndead, &nowdead,
&nunused, &nowunused);
+ do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+ set_heap_lsn = mark_buffer_dirty = do_prune || nplans > 0;
+
+ /* Ensure the record does something */
+ Assert(do_prune || nplans > 0 ||
+ vmflags & VISIBILITYMAP_VALID_BITS);
+
/*
* Update all line pointers per the record, and repair fragmentation
* if needed.
*/
- if (nredirected > 0 || ndead > 0 || nunused > 0)
+ if (do_prune)
heap_page_prune_execute(buffer,
(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
redirected, nredirected,
@@ -138,36 +156,128 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* There should be no more data */
Assert((char *) frz_offsets == dataptr + datalen);
+ /*
+ * Now set PD_ALL_VISIBLE, if required. We'll only do this if we are
+ * also going to set bits in the VM later.
+ *
+ * We must never end up with the VM bit set and the page-level
+ * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+ * modification would fail to clear the VM bit.
+ */
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+ {
+ PageSetAllVisible(page);
+
+ /*
+ * If the only change to the heap page is setting PD_ALL_VISIBLE,
+ * we can avoid setting the page LSN unless checksums or
+ * wal_log_hints are enabled.
+ */
+ set_heap_lsn = XLogHintBitIsNeeded() ? true : set_heap_lsn;
+ mark_buffer_dirty = true;
+ }
+
/*
* Note: we don't worry about updating the page's prunability hints.
* At worst this will cause an extra prune cycle to occur soon.
*/
- PageSetLSN(page, lsn);
- MarkBufferDirty(buffer);
+ if (mark_buffer_dirty)
+ MarkBufferDirty(buffer);
+ if (set_heap_lsn)
+ PageSetLSN(page, lsn);
}
/*
- * If we released any space or line pointers, update the free space map.
+ * If we released any space or line pointers or will be setting a page in
+ * the visibility map, measure the page's freespace to later update the
+ * freespace map.
+ *
+ * Even if we are just updating the VM (and thus not freeing up any
+ * space), we'll still update the FSM for this page. Since FSM is not
+ * WAL-logged and only updated heuristically, it easily becomes stale in
+ * standbys. If the standby is later promoted and runs VACUUM, it will
+ * skip updating individual free space figures for pages that became
+ * all-visible (or all-frozen, depending on the vacuum mode,) which is
+ * troublesome when FreeSpaceMapVacuum propagates too optimistic free
+ * space values to upper FSM layers; later inserters try to use such pages
+ * only to find out that they are unusable. This can cause long stalls
+ * when there are many such pages.
+ *
+ * Forestall those problems by updating FSM's idea about a page that is
+ * becoming all-visible or all-frozen.
*
* Do this regardless of a full-page image being applied, since the FSM
* data is not in the page anyway.
+ *
+ * We want to avoid holding an exclusive lock on the heap buffer while
+ * doing I/O (on either the FSM or the VM), so we'll release the lock on
+ * the heap buffer before doing either.
*/
if (BufferIsValid(buffer))
{
- if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
- XLHP_HAS_DEAD_ITEMS |
- XLHP_HAS_NOW_UNUSED_ITEMS))
- {
- Size freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+ if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
+ XLHP_HAS_DEAD_ITEMS |
+ XLHP_HAS_NOW_UNUSED_ITEMS)) ||
+ vmflags & VISIBILITYMAP_VALID_BITS)
+ freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+
+ UnlockReleaseBuffer(buffer);
+ }
+
+ /*
+ * Now read and update the VM block.
+ *
+ * Note that the heap relation may have been dropped or truncated, leading
+ * us to skip updating the heap block.
+ *
+ * Even if we skipped the heap page update due to the LSN interlock, it's
+ * still safe to update the visibility map. Any WAL record that clears
+ * the visibility map bit does so before checking the page LSN, so any
+ * bits that need to be cleared will still be cleared.
+ *
+ * Note that the lock on the heap page was dropped above. In normal
+ * operation this would never be safe because a concurrent query could
+ * modify the heap page and clear PD_ALL_VISIBLE -- violating the
+ * invariant that PD_ALL_VISIBLE must be set if the corresponding bit in
+ * the VM is set.
+ *
+ * In recovery, we expect no other writers, so writing to the VM page
+ * without holding a lock on the heap page is considered safe enough. It
+ * is done this way when replaying xl_heap_visible records (see
+ * heap_xlog_visible()).
+ */
+ if (vmflags & VISIBILITYMAP_VALID_BITS &&
+ XLogReadBufferForRedoExtended(record, 1,
+ RBM_ZERO_ON_ERROR,
+ false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ Page vmpage = BufferGetPage(vmbuffer);
+ uint8 old_vmbits = 0;
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
- UnlockReleaseBuffer(buffer);
+ /* initialize the page if it was read as zeros */
+ if (PageIsNew(vmpage))
+ PageInit(vmpage, BLCKSZ, 0);
- XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
+ old_vmbits = visibilitymap_set_vmbits(reln, blkno, vmbuffer, vmflags);
+
+ /* Only set VM page LSN if we modified the page */
+ if (old_vmbits != vmflags)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
}
- else
- UnlockReleaseBuffer(buffer);
+
+ FreeFakeRelcacheEntry(reln);
}
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
+
+ if (freespace > 0)
+ XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7ebd22f00a3..f0b33d1b696 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,6 +21,7 @@
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
+#include "access/visibilitymapdefs.h"
#include "commands/vacuum.h"
#include "executor/instrument.h"
#include "miscadmin.h"
@@ -835,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
+ InvalidBuffer, 0, false,
conflict_xid,
true, reason,
prstate.frozen, prstate.nfrozen,
@@ -2030,14 +2032,18 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
*
* This is used for several different page maintenance operations:
*
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
* redirected, some marked dead, and some removed altogether.
*
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'.
*
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ * marked as unused.
*
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phase III, the heap page may be marked
+ * all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
* all.
*
* If replaying the record requires a cleanup lock, pass cleanup_lock = true.
@@ -2045,12 +2051,23 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* replaying 'unused' items depends on whether they were all previously marked
* as dead.
*
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
* Note: This function scribbles on the 'frozen' array.
*
* Note: This is called in a critical section, so careful what you do here.
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer,
+ uint8 vmflags,
+ bool set_pd_all_vis,
TransactionId conflict_xid,
bool cleanup_lock,
PruneReason reason,
@@ -2062,6 +2079,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xl_heap_prune xlrec;
XLogRecPtr recptr;
uint8 info;
+ uint8 regbuf_flags;
/* The following local variables hold data registered in the WAL record: */
xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2070,8 +2088,21 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xlhp_prune_items dead_items;
xlhp_prune_items unused_items;
OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+ bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+
+ Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+ xlrec.flags = vmflags;
- xlrec.flags = 0;
+ regbuf_flags = REGBUF_STANDARD;
+
+ /*
+ * We can avoid an FPI if the only modification we are making to the heap
+ * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+ */
+ if (!do_prune &&
+ nfrozen == 0 &&
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
+ regbuf_flags |= REGBUF_NO_IMAGE;
/*
* Prepare data for the buffer. The arrays are not actually in the
@@ -2079,7 +2110,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* page image, the arrays can be omitted.
*/
XLogBeginInsert();
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+ XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ XLogRegisterBuffer(1, vmbuffer, 0);
+
if (nfrozen > 0)
{
int nplans;
@@ -2168,5 +2203,22 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
}
recptr = XLogInsert(RM_HEAP2_ID, info);
- PageSetLSN(BufferGetPage(buffer), recptr);
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
+ }
+
+ /*
+ * If we pruned or froze tuples, or set the page all-visible while
+ * checksums or wal_log_hints are enabled, we must bump the LSN. Torn
+ * pages are possible if we update PD_ALL_VISIBLE without bumping the LSN,
+ * but this is deemed okay for page hint updates.
+ */
+ if (do_prune || nfrozen > 0 ||
+ (set_pd_all_vis && XLogHintBitIsNeeded()))
+ {
+ Assert(BufferIsDirty(buffer));
+ PageSetLSN(BufferGetPage(buffer), recptr);
+ }
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8a84bdfe0a9..51067264004 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,11 +463,13 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
+static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2852,8 +2854,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
OffsetNumber unused[MaxHeapTuplesPerPage];
int nunused = 0;
TransactionId visibility_cutoff_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
bool all_frozen;
LVSavedErrInfo saved_err_info;
+ uint8 vmflags = 0;
+ bool set_pd_all_vis = false;
Assert(vacrel->do_index_vacuuming);
@@ -2864,6 +2869,23 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
InvalidOffsetNumber);
+ if (heap_page_would_be_all_visible(vacrel->rel, buffer,
+ vacrel->cutoffs.OldestXmin,
+ deadoffsets, num_offsets,
+ &all_frozen, &visibility_cutoff_xid,
+ &vacrel->offnum))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (all_frozen)
+ {
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ }
+
+ /* Take the lock on the vmbuffer before entering a critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ }
+
START_CRIT_SECTION();
for (int i = 0; i < num_offsets; i++)
@@ -2883,6 +2905,17 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
/* Attempt to truncate line pointer array now */
PageTruncateLinePointerArray(page);
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+ {
+ Assert(!PageIsAllVisible(page));
+ set_pd_all_vis = true;
+ PageSetAllVisible(page);
+ visibilitymap_set_vmbits(vacrel->rel,
+ blkno,
+ vmbuffer, vmflags);
+ conflict_xid = visibility_cutoff_xid;
+ }
+
/*
* Mark buffer dirty before we write WAL.
*/
@@ -2892,7 +2925,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
- InvalidTransactionId,
+ vmbuffer,
+ vmflags,
+ set_pd_all_vis,
+ conflict_xid,
false, /* no cleanup lock required */
PRUNE_VACUUM_CLEANUP,
NULL, 0, /* frozen */
@@ -2901,39 +2937,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
unused, nunused);
}
- /*
- * End critical section, so we safely can do visibility tests (which
- * possibly need to perform IO and allocate memory!). If we crash now the
- * page (including the corresponding vm bit) might not be marked all
- * visible, but that's fine. A later vacuum will fix that.
- */
END_CRIT_SECTION();
- /*
- * Now that we have removed the LP_DEAD items from the page, once again
- * check if the page has become all-visible. The page is already marked
- * dirty, exclusively locked, and, if needed, a full page image has been
- * emitted.
- */
- Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
- &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
+ if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (all_frozen)
- {
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buffer,
- InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
- flags);
-
/* Count the newly set VM page for logging */
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
vacrel->vm_new_visible_pages++;
if (all_frozen)
vacrel->vm_new_visible_frozen_pages++;
@@ -3600,40 +3609,85 @@ dead_items_cleanup(LVRelState *vacrel)
}
/*
- * Check if every tuple in the given page in buf is visible to all current and
- * future transactions.
+ * Wrapper for heap_page_would_be_all_visible() which can be used for
+ * callers that expect no LP_DEAD items on the page.
+ */
+bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
+{
+
+ return heap_page_would_be_all_visible(rel, buf, OldestXmin,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid,
+ logging_offnum);
+}
+
+/*
+ * Determines whether or not the heap page in buf is all-visible other than
+ * the dead line pointers referred to by the provided deadoffsets array.
*
- * OldestXmin is used to determine visibility.
+ * deadoffsets are the LP_DEAD offsets the caller already knows about and
+ * for which the associated index entries have already been removed. Vacuum
+ * calls this before setting those line pointers LP_UNUSED. So, if there are
+ * no new LP_DEAD items, the caller can set the page all-visible in the VM.
+ *
+ * Returns true if the page is all-visible other than the provided
+ * deadoffsets and false otherwise.
*
- * Sets *all_frozen to true if every tuple on this page is frozen.
+ * OldestXmin is used to determine visibility.
*
- * Sets *visibility_cutoff_xid to the highest xmin amongst the visible tuples.
- * It is only valid if the page is all-visible.
+ * *all_frozen is an output parameter indicating to the caller if every tuple
+ * on the page is frozen.
*
* *logging_offnum will have the OffsetNumber of the current tuple being
* processed for vacuum's error callback system.
*
- * This is a stripped down version of lazy_scan_prune(). If you change
- * anything here, make sure that everything stays in sync. Note that an
- * assertion calls us to verify that everybody still agrees. Be sure to avoid
- * introducing new side-effects here.
+ * *visibility_cutoff_xid is an output parameter with the highest xmin amongst the
+ * visible tuples. It is only valid if the page is all-visible.
+ *
+ * Callers looking to verify that the page is already all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * The logic here is similar to that in heap_prune_record_unchanged_lp_normal().
+ * If you change anything here, make sure that everything stays in sync. Note
+ * that an assertion calls us to verify that everybody still agrees. Be sure
+ * to avoid introducing new side-effects here.
*/
static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum)
+heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
OffsetNumber offnum,
maxoff;
bool all_visible = true;
+ int matched_dead_count = 0;
*visibility_cutoff_xid = InvalidTransactionId;
*all_frozen = true;
+ Assert(ndeadoffsets == 0 || deadoffsets);
+
+#ifdef USE_ASSERT_CHECKING
+ /* Confirm input deadoffsets[] is strictly sorted */
+ if (ndeadoffsets > 1)
+ {
+ for (int i = 1; i < ndeadoffsets; i++)
+ Assert(deadoffsets[i - 1] < deadoffsets[i]);
+ }
+#endif
+
maxoff = PageGetMaxOffsetNumber(page);
for (offnum = FirstOffsetNumber;
offnum <= maxoff && all_visible;
@@ -3661,9 +3715,15 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
if (ItemIdIsDead(itemid))
{
- all_visible = false;
- *all_frozen = false;
- break;
+ if (!deadoffsets ||
+ matched_dead_count >= ndeadoffsets ||
+ deadoffsets[matched_dead_count] != offnum)
+ {
+ *all_frozen = all_visible = false;
+ break;
+ }
+ matched_dead_count++;
+ continue;
}
Assert(ItemIdIsNormal(itemid));
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..439f33b8061 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -103,7 +103,7 @@ plan_elem_desc(StringInfo buf, void *plan, void *data)
* code, the latter of which is used in frontend (pg_waldump) code.
*/
void
-heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
OffsetNumber **frz_offsets,
int *nredirected, OffsetNumber **redirected,
@@ -287,6 +287,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, ", isCatalogRel: %c",
xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
+ if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
+ appendStringInfo(buf, ", vm_flags: 0x%02X",
+ xlrec->flags & VISIBILITYMAP_VALID_BITS);
+
if (XLogRecHasBlockData(record, 0))
{
Size datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..8b47295efa2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -344,6 +344,12 @@ extern void heap_inplace_update_and_unlock(Relation relation,
Buffer buffer);
extern void heap_inplace_unlock(Relation relation,
HeapTuple oldtup, Buffer buffer);
+
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
@@ -388,6 +394,9 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer,
+ uint8 vmflags,
+ bool set_pd_all_vis,
TransactionId conflict_xid,
bool cleanup_lock,
PruneReason reason,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d4c0625b632..d8508593e7c 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -249,7 +249,7 @@ typedef struct xl_heap_update
* Main data section:
*
* xl_heap_prune
- * uint8 flags
+ * uint16 flags
* TransactionId snapshot_conflict_horizon
*
* Block 0 data section:
@@ -284,7 +284,7 @@ typedef struct xl_heap_update
*/
typedef struct xl_heap_prune
{
- uint8 flags;
+ uint16 flags;
/*
* If XLHP_HAS_CONFLICT_HORIZON is set, the conflict horizon XID follows,
@@ -292,10 +292,22 @@ typedef struct xl_heap_prune
*/
} xl_heap_prune;
-#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
+
+/*
+ * The xl_heap_prune record's flags may also contain which VM bits to set. As
+ * such, (1 << 0) and (1 << 1) are reserved for VISIBILITYMAP_ALL_VISIBLE and
+ * VISIBILITYMAP_ALL_FROZEN.
+ */
-/* to handle recovery conflict during logical decoding on standby */
-#define XLHP_IS_CATALOG_REL (1 << 1)
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
+#define XLHP_IS_CATALOG_REL (1 << 2)
/*
* Does replaying the record require a cleanup-lock?
@@ -305,7 +317,7 @@ typedef struct xl_heap_prune
* marks LP_DEAD line pointers as unused without moving any tuple data, an
* ordinary exclusive lock is sufficient.
*/
-#define XLHP_CLEANUP_LOCK (1 << 2)
+#define XLHP_CLEANUP_LOCK (1 << 3)
/*
* If we remove or freeze any entries that contain xids, we need to include a
@@ -313,22 +325,22 @@ typedef struct xl_heap_prune
* there are no queries running for which the removed tuples are still
* visible, or which still consider the frozen XIDs as running.
*/
-#define XLHP_HAS_CONFLICT_HORIZON (1 << 3)
+#define XLHP_HAS_CONFLICT_HORIZON (1 << 4)
/*
* Indicates that an xlhp_freeze_plans sub-record and one or more
* xlhp_freeze_plan sub-records are present.
*/
-#define XLHP_HAS_FREEZE_PLANS (1 << 4)
+#define XLHP_HAS_FREEZE_PLANS (1 << 5)
/*
* XLHP_HAS_REDIRECTIONS, XLHP_HAS_DEAD_ITEMS, and XLHP_HAS_NOW_UNUSED_ITEMS
* indicate that xlhp_prune_items sub-records with redirected, dead, and
* unused item offsets are present.
*/
-#define XLHP_HAS_REDIRECTIONS (1 << 5)
-#define XLHP_HAS_DEAD_ITEMS (1 << 6)
-#define XLHP_HAS_NOW_UNUSED_ITEMS (1 << 7)
+#define XLHP_HAS_REDIRECTIONS (1 << 6)
+#define XLHP_HAS_DEAD_ITEMS (1 << 7)
+#define XLHP_HAS_NOW_UNUSED_ITEMS (1 << 8)
/*
* xlhp_freeze_plan describes how to freeze a group of one or more heap tuples
@@ -497,7 +509,7 @@ extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
uint8 vmflags);
/* in heapdesc.c, so it can be shared between frontend/backend code */
-extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
OffsetNumber **frz_offsets,
int *nredirected, OffsetNumber **redirected,
--
2.43.0
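To summarize the heapam_xlog.h hunk above: xl_heap_prune's flags field grows
from uint8 to uint16 and each existing XLHP_* bit shifts up by one, so that
the low two bits can carry the VM bits verbatim. A compact view of the
resulting layout, plus the way redo recovers the VM bits (this just restates
heap_xlog_prune_freeze() above, as a reading aid):

    /*
     * xl_heap_prune.flags after this patch:
     *
     *   bit 0  VISIBILITYMAP_ALL_VISIBLE
     *   bit 1  VISIBILITYMAP_ALL_FROZEN
     *   bit 2  XLHP_IS_CATALOG_REL
     *   bit 3  XLHP_CLEANUP_LOCK
     *   bit 4  XLHP_HAS_CONFLICT_HORIZON
     *   bit 5  XLHP_HAS_FREEZE_PLANS
     *   bit 6  XLHP_HAS_REDIRECTIONS
     *   bit 7  XLHP_HAS_DEAD_ITEMS
     *   bit 8  XLHP_HAS_NOW_UNUSED_ITEMS
     */
    uint8    vmflags = xlrec.flags & VISIBILITYMAP_VALID_BITS;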
v12-0004-Use-xl_heap_prune-record-for-setting-empty-pages.patchtext/x-patch; charset=US-ASCII; name=v12-0004-Use-xl_heap_prune-record-for-setting-empty-pages.patchDownload
From 3d979ac727c20e964a77552c7a5f06f45a7aff7c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:42:19 -0400
Subject: [PATCH v12 04/20] Use xl_heap_prune record for setting empty pages
all-visible
As part of a project to eliminate xl_heap_visible records, eliminate
their use when vacuum phase I sets empty pages all-visible.
---
src/backend/access/heap/pruneheap.c | 14 +++++--
src/backend/access/heap/vacuumlazy.c | 57 ++++++++++++++++++----------
src/include/access/heapam.h | 1 +
3 files changed, 49 insertions(+), 23 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f0b33d1b696..373986b204a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -836,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
+ false,
InvalidBuffer, 0, false,
conflict_xid,
true, reason,
@@ -2055,6 +2056,9 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* case, vmbuffer should already have been updated and marked dirty and should
* still be pinned and locked.
*
+ * force_heap_fpi indicates that a full page image of the heap block should be
+ * forced.
+ *
* set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
* the page LSN when checksums/wal_log_hints are enabled even if we did not
* prune or freeze tuples on the page.
@@ -2065,6 +2069,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ bool force_heap_fpi,
Buffer vmbuffer,
uint8 vmflags,
bool set_pd_all_vis,
@@ -2095,13 +2100,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
regbuf_flags = REGBUF_STANDARD;
+ if (force_heap_fpi)
+ regbuf_flags |= REGBUF_FORCE_IMAGE;
+
/*
* We can avoid an FPI if the only modification we are making to the heap
* page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
*/
- if (!do_prune &&
- nfrozen == 0 &&
- (!set_pd_all_vis || !XLogHintBitIsNeeded()))
+ else if (!do_prune &&
+ nfrozen == 0 &&
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
regbuf_flags |= REGBUF_NO_IMAGE;
/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 51067264004..a1cdaaebb57 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1877,33 +1877,49 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ uint8 new_vmbits = VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN;
+
+ /* Lock vmbuffer before entering critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ PageSetAllVisible(page);
MarkBufferDirty(buf);
- /*
- * It's possible that another backend has extended the heap,
- * initialized the page, and then failed to WAL-log the page due
- * to an ERROR. Since heap extension is not WAL-logged, recovery
- * might try to replay our record setting the page all-visible and
- * find that the page isn't initialized, which will cause a PANIC.
- * To prevent that, check whether the page has been previously
- * WAL-logged, and if not, do that now.
- */
- if (RelationNeedsWAL(vacrel->rel) &&
- PageGetLSN(page) == InvalidXLogRecPtr)
- log_newpage_buffer(buf, true);
+ visibilitymap_set_vmbits(vacrel->rel, blkno,
+ vmbuffer, new_vmbits);
+
+ if (RelationNeedsWAL(vacrel->rel))
+ {
+ /*
+ * It's possible that another backend has extended the heap,
+ * initialized the page, and then failed to WAL-log the page
+ * due to an ERROR. Since heap extension is not WAL-logged,
+ * recovery might try to replay our record setting the page
+ * all-visible and find that the page isn't initialized, which
+ * will cause a PANIC. To prevent that, if the page hasn't
+ * been previously WAL-logged, force a heap FPI.
+ */
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ PageGetLSN(page) == InvalidXLogRecPtr,
+ vmbuffer,
+ new_vmbits,
+ true,
+ InvalidTransactionId,
+ false, PRUNE_VACUUM_SCAN,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+ }
- PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
END_CRIT_SECTION();
- /* Count the newly all-frozen pages for logging */
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+ /* Count the newly all-frozen pages for logging. */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
}
@@ -2925,6 +2941,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
+ false,
vmbuffer,
vmflags,
set_pd_all_vis,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8b47295efa2..e7129a644a1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,6 +394,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ bool force_heap_fpi,
Buffer vmbuffer,
uint8 vmflags,
bool set_pd_all_vis,
--
2.43.0
v12-0007-Find-and-fix-VM-corruption-in-heap_page_prune_an.patchtext/x-patch; charset=US-ASCII; name=v12-0007-Find-and-fix-VM-corruption-in-heap_page_prune_an.patchDownload
From 133a81c37921fc4a13e85795dc3a6f39726b0254 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:45:59 -0400
Subject: [PATCH v12 07/20] Find and fix VM corruption in
heap_page_prune_and_freeze
Future commits will update the VM in the same critical section and WAL
record as pruning and freezing. For ease of review, this commit makes
one step toward doing this. It moves the VM corruption handling case to
heap_page_prune_and_freeze().
This commit is only really meant for review, as it adds a member to
PruneFreezeResult (vm_corruption) that is removed in later commits.
---
src/backend/access/heap/pruneheap.c | 93 +++++++++++++++++++++++++++-
src/backend/access/heap/vacuumlazy.c | 84 +++----------------------
src/include/access/heapam.h | 4 ++
3 files changed, 102 insertions(+), 79 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 373986b204a..54af3296b91 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,7 +21,7 @@
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "commands/vacuum.h"
#include "executor/instrument.h"
#include "miscadmin.h"
@@ -177,6 +177,13 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
static void page_verify_redirects(Page page);
+static bool identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer);
+
/*
* Optionally prune and repair fragmentation in the specified page.
@@ -261,7 +268,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, vistest, 0,
+ heap_page_prune_and_freeze(relation, buffer, false,
+ InvalidBuffer,
+ vistest, 0,
NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
/*
@@ -294,6 +303,70 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
}
}
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer)
+{
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
/*
* Prune and repair fragmentation and potentially freeze tuples on the
@@ -314,6 +387,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
* that also freeze need that information.
*
+ * blk_known_av is the visibility status of the heap block as of the last call
+ * to find_next_unskippable_block(). vmbuffer is the buffer that may already
+ * contain the required block of the visibility map.
+ *
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
* (see heap_prune_satisfies_vacuum).
*
@@ -349,6 +426,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ bool blk_known_av,
+ Buffer vmbuffer,
GlobalVisState *vistest,
int options,
struct VacuumCutoffs *cutoffs,
@@ -897,6 +976,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
+ /*
+ * Clear any VM corruption. This does not need to be done in a critical
+ * section.
+ */
+ presult->vm_corruption = false;
+ if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+ presult->vm_corruption = identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av,
+ prstate.lpdead_items, vmbuffer);
if (prstate.freeze)
{
if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1febb524d41..574e415b0e0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,12 +430,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
-static bool identify_and_fix_vm_corruption(Relation relation,
- BlockNumber heap_blk,
- Buffer heap_buffer, Page heap_page,
- bool heap_blk_known_av,
- int64 nlpdead_items,
- Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1940,72 +1934,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
-/*
- * When updating the visibility map after phase I heap vacuuming, we take the
- * opportunity to identify and fix any VM corruption.
- *
- * heap_blk_known_av is the visibility status of the heap page collected
- * while finding the next unskippable block in heap_vac_scan_next_block().
- */
-static bool
-identify_and_fix_vm_corruption(Relation relation,
- BlockNumber heap_blk,
- Buffer heap_buffer, Page heap_page,
- bool heap_blk_known_av,
- int64 nlpdead_items,
- Buffer vmbuffer)
-{
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
- visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk)));
-
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk)));
-
- PageClearAllVisible(heap_page);
- MarkBufferDirty(heap_buffer);
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- return false;
-}
-
-
/* qsort comparator for sorting OffsetNumbers */
static int
cmpOffsetNumbers(const void *a, const void *b)
@@ -2063,11 +1991,14 @@ lazy_scan_prune(LVRelState *vacrel,
* tuples. Pruning will have determined whether or not the page is
* all-visible.
*/
- prune_options = HEAP_PAGE_PRUNE_FREEZE;
+ prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
+ heap_page_prune_and_freeze(rel, buf,
+ all_visible_according_to_vm,
+ vmbuffer,
+ vacrel->vistest, prune_options,
&vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
@@ -2152,10 +2083,9 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables. Start by looking for any VM corruption.
+ * all_frozen variables.
*/
- if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
- all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+ if (presult.vm_corruption)
{
/* Don't update the VM if we just cleared corruption in it */
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e7129a644a1..0c7eb5e46f4 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
struct TupleTableSlot;
@@ -247,6 +248,7 @@ typedef struct PruneFreezeResult
bool all_visible;
bool all_frozen;
TransactionId vm_conflict_horizon;
+ bool vm_corruption;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
@@ -380,6 +382,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ bool blk_known_av,
+ Buffer vmbuffer,
struct GlobalVisState *vistest,
int options,
struct VacuumCutoffs *cutoffs,
--
2.43.0
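Condensing the lazy_scan_prune() hunks in 0007 above and 0006 below, the
phase I caller ends up looking roughly like this (a reading aid only, not
new code):

    prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
    if (vacrel->nindexes == 0)
        prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;

    /* pruning now also detects and clears VM corruption for this block */
    heap_page_prune_and_freeze(rel, buf,
                               all_visible_according_to_vm, vmbuffer,
                               vacrel->vistest, prune_options,
                               &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
                               &vacrel->offnum,
                               &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);

    if (presult.vm_corruption)
    {
        /* corruption was just cleared; leave the VM alone for this block */
    }
    else if (presult.all_visible &&
             (!all_visible_according_to_vm ||
              (presult.all_frozen &&
               !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
    {
        /* single, combined VM update path from 0006 */
    }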
v12-0006-Combine-vacuum-phase-I-VM-update-cases.patchtext/x-patch; charset=US-ASCII; name=v12-0006-Combine-vacuum-phase-I-VM-update-cases.patchDownload
From f73b16cbea6a580ed7cf0c72c37c9a3251fa4cf4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:35:36 -0400
Subject: [PATCH v12 06/20] Combine vacuum phase I VM update cases
We update the VM after phase I of vacuum -- either setting both the VM
bits when all bits are currently unset or setting just the frozen bit
when the all-visible bit is already set.
Those two cases shared much of the same code -- leading to unnecessary
duplication. This commit combines them, which is simpler and easier to
understand.
The combined case also happens to fix a longstanding bug: when we are
only marking an already all-visible page all-frozen and
checksums/wal_log_hints are enabled, we would fail to mark the buffer
dirty before setting the page LSN in visibilitymap_set().
---
src/backend/access/heap/vacuumlazy.c | 101 +++++++++------------------
1 file changed, 32 insertions(+), 69 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e9b4e924d22..1febb524d41 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2159,11 +2159,26 @@ lazy_scan_prune(LVRelState *vacrel,
{
/* Don't update the VM if we just cleared corruption in it */
}
- else if (!all_visible_according_to_vm && presult.all_visible)
+
+ /*
+ * If the page isn't yet marked all-visible in the VM or it is and needs
+ * to be marked all-frozen, update the VM. Note that all_frozen is only
+ * valid if all_visible is true, so we must check both all_visible and
+ * all_frozen.
+ */
+ else if (presult.all_visible &&
+ (!all_visible_according_to_vm ||
+ (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
+ /*
+ * If the page is all-frozen, we can pass InvalidTransactionId as our
+ * cutoff_xid, since a snapshotConflictHorizon sufficient to make
+ * everything safe for REDO was logged when the page's tuples were
+ * frozen.
+ */
if (presult.all_frozen)
{
Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
@@ -2176,21 +2191,29 @@ lazy_scan_prune(LVRelState *vacrel,
* checksums are not enabled). Regardless, set both bits so that we
* get back in sync.
*
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
+ * If the heap page is all-visible but the VM bit is not set, we don't
+ * need to dirty the heap page. However, if checksums are enabled, we
+ * do need to make sure that the heap page is dirtied before passing
+ * it to visibilitymap_set(), because it may be logged.
*/
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
+ if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+ {
+ PageSetAllVisible(page);
+ MarkBufferDirty(buf);
+ }
+
old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
InvalidXLogRecPtr,
vmbuffer, presult.vm_conflict_horizon,
flags);
/*
+ * Even if we are only setting the all-frozen bit, there is a small
+ * chance that the VM was modified sometime between setting
+ * all_visible_according_to_vm and checking the visibility during
+ * pruning. Check the return value of old_vmbits to ensure the
+ * visibility map counters used for logging are accurate.
+ *
* If the page wasn't already set all-visible and/or all-frozen in the
* VM, count it as newly set for logging.
*/
@@ -2211,66 +2234,6 @@ lazy_scan_prune(LVRelState *vacrel,
}
}
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen. Note that all_frozen is only valid if all_visible is
- * true, so we must check both all_visible and all_frozen.
- */
- else if (all_visible_according_to_vm && presult.all_visible &&
- presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
-
return presult.ndeleted;
}
--
2.43.0
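To make the combined condition from 0006 easier to follow outside the diff,
here is a minimal standalone sketch. It is plain C with made-up flag values
and names (SKETCH_ALL_VISIBLE, vm_flags_to_set, etc.), not the real
PostgreSQL definitions, and it only models the decision of which bits to set:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Made-up stand-ins for the VM flag bits; not the PostgreSQL values. */
#define SKETCH_ALL_VISIBLE 0x01
#define SKETCH_ALL_FROZEN  0x02

/*
 * Update the VM if the page is all-visible and either the VM doesn't say so
 * yet, or the page is also all-frozen and the VM's frozen bit is still clear.
 * Returns the flags to set, or 0 if no VM update is needed.
 */
static uint8_t
vm_flags_to_set(bool page_all_visible, bool page_all_frozen,
                bool vm_all_visible, bool vm_all_frozen)
{
    if (!page_all_visible)
        return 0;
    if (vm_all_visible && (!page_all_frozen || vm_all_frozen))
        return 0;

    /* Set all-visible, and all-frozen too if applicable, to stay in sync. */
    return (uint8_t) (SKETCH_ALL_VISIBLE |
                      (page_all_frozen ? SKETCH_ALL_FROZEN : 0));
}

int
main(void)
{
    /* Previously-separate case: VM already all-visible, page now all-frozen. */
    printf("0x%x\n", vm_flags_to_set(true, true, true, false));    /* 0x3 */
    /* Nothing to do: VM already fully up to date. */
    printf("0x%x\n", vm_flags_to_set(true, true, true, true));     /* 0x0 */
    return 0;
}

The point of folding the two branches together is that both end up calling
visibilitymap_set() with the same arguments apart from the flags, so only
the flag computation needs to differ.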
Attachment: v12-0010-Rename-PruneState.freeze-to-attempt_freeze.patch (text/x-patch; charset=US-ASCII)
From 8277be755b187e66a57bfff15a2d46f98656f4ca Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 31 Jul 2025 14:07:51 -0400
Subject: [PATCH v12 10/20] Rename PruneState.freeze to attempt_freeze
This makes it clearer that the flag indicates the caller would like
heap_page_prune_and_freeze() to consider freezing tuples -- not that
tuples will ultimately end up being frozen.
Also rename the local variable hint_bit_fpi to did_tuple_hint_fpi. This
makes it clear that it is about tuple hints rather than page hints, and
that it records something that happened rather than something that could
happen.
---
src/backend/access/heap/pruneheap.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 398962ed1cb..df3e6439176 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -43,7 +43,7 @@ typedef struct
/* whether or not dead items can be set LP_UNUSED during pruning */
bool mark_unused_now;
/* whether to attempt freezing tuples */
- bool freeze;
+ bool attempt_freeze;
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -445,13 +445,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_hint;
uint8 vmflags = 0;
uint8 old_vmbits = 0;
- bool hint_bit_fpi;
+ bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
/* Copy parameters to prstate */
prstate.vistest = vistest;
prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
- prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
prstate.cutoffs = cutoffs;
/*
@@ -473,7 +473,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* initialize page freezing working state */
prstate.pagefrz.freeze_required = false;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
Assert(new_relfrozen_xid && new_relmin_mxid);
prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
@@ -520,7 +520,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* function, when we return the value to the caller, so that the caller
* doesn't set the VM bit incorrectly.
*/
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
@@ -634,7 +634,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
* an FPI to be emitted.
*/
- hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+ did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
/*
* Process HOT chains.
@@ -750,7 +750,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* plans we prepared, or not.
*/
do_freeze = false;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
if (prstate.pagefrz.freeze_required)
{
@@ -783,7 +783,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (RelationNeedsWAL(relation))
{
- if (hint_bit_fpi)
+ if (did_tuple_hint_fpi)
do_freeze = true;
else if (do_prune)
{
@@ -1046,7 +1046,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->old_vmbits = old_vmbits;
presult->new_vmbits = vmflags;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
if (presult->nfrozen > 0)
{
@@ -1628,7 +1628,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
}
/* Consider freezing any normal tuples which will not be removed */
- if (prstate->freeze)
+ if (prstate->attempt_freeze)
{
bool totally_frozen;
--
2.43.0
Attachment: v12-0008-Keep-all_frozen-updated-too-in-heap_page_prune_a.patch (text/x-patch; charset=US-ASCII)
From 6eecf49c63134c561082dc6a85fdb35d752aea53 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 23 Jul 2025 16:01:24 -0400
Subject: [PATCH v12 08/20] Keep all_frozen updated too in
heap_page_prune_and_freeze
Previously we relied on all-visible and all-frozen only ever being used
together, so all_frozen was not always kept up to date on its own.
Future commits will use these fields separately, so it is best not to
rely on all_visible for all_frozen's validity; keep both updated instead.
---
src/backend/access/heap/pruneheap.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 54af3296b91..bbd83e4fcc7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -143,10 +143,6 @@ typedef struct
* whether to freeze the page or not. The all_visible and all_frozen
* values returned to the caller are adjusted to include LP_DEAD items at
* the end.
- *
- * all_frozen should only be considered valid if all_visible is also set;
- * we don't bother to clear the all_frozen flag every time we clear the
- * all_visible flag.
*/
bool all_visible;
bool all_frozen;
@@ -830,6 +826,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
}
+ Assert(!prstate.all_frozen || prstate.all_visible);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -1474,7 +1471,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
if (!HeapTupleHeaderXminCommitted(htup))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1496,7 +1493,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
Assert(prstate->cutoffs);
if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1509,7 +1506,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
case HEAPTUPLE_RECENTLY_DEAD:
prstate->recently_dead_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple will soon become DEAD. Update the hint field so
@@ -1528,7 +1525,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* assumption is a bit shaky, but it is what acquire_sample_rows()
* does, so be consistent.
*/
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* If we wanted to optimize for aborts, we might consider marking
@@ -1546,7 +1543,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* will commit and update the counters after we report.
*/
prstate->live_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple may soon become DEAD. Update the hint field so that
--
2.43.0
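A tiny standalone illustration of the invariant this change makes it possible
to assert (plain C, simplified types; the struct and function names are made
up for the example, not PostgreSQL code):

#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the pruning-state visibility flags. */
typedef struct
{
    bool all_visible;
    bool all_frozen;
} VisFlagsSketch;

/*
 * After this change, any tuple that rules out all-visible also rules out
 * all-frozen, so both flags are always cleared together.
 */
static void
saw_not_all_visible_tuple(VisFlagsSketch *v)
{
    v->all_visible = v->all_frozen = false;
}

int
main(void)
{
    VisFlagsSketch v = {true, true};

    saw_not_all_visible_tuple(&v);
    /* The invariant the patch asserts before entering the critical section. */
    assert(!v.all_frozen || v.all_visible);
    return 0;
}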
Attachment: v12-0009-Update-VM-in-pruneheap.c.patch (text/x-patch; charset=US-ASCII)
From 2e2fc840a3af65fec6eee2a8eb2de30839a8ca52 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 2 Jun 2025 11:04:14 -0400
Subject: [PATCH v12 09/20] Update VM in pruneheap.c
As a step toward updating the VM in the same critical section and WAL
record as pruning and freezing (during phase I of vacuuming), first move
the VM update (still in its own critical section and WAL record) into
heap_page_prune_and_freeze(). This makes review easier.
---
src/backend/access/heap/pruneheap.c | 99 +++++++++++++++++++++++-----
src/backend/access/heap/vacuumlazy.c | 98 +++++----------------------
src/include/access/heapam.h | 15 +++--
3 files changed, 106 insertions(+), 106 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index bbd83e4fcc7..398962ed1cb 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -366,7 +366,8 @@ identify_and_fix_vm_corruption(Relation relation,
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -442,6 +443,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_freeze;
bool do_prune;
bool do_hint;
+ uint8 vmflags = 0;
+ uint8 old_vmbits = 0;
bool hint_bit_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
@@ -942,7 +945,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*
* Now that freezing has been finalized, unset all_visible if there are
* any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected by our caller.
+ * of the page, as expected for updating the visibility map.
*/
if (prstate.all_visible && prstate.lpdead_items == 0)
{
@@ -958,31 +961,91 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->hastup = prstate.hastup;
/*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
+ * If updating the visibility map, the conflict horizon for that record
+ * must be the newest xmin on the page. However, if the page is
+ * completely frozen, there can be no conflict and the vm_conflict_horizon
+ * should remain InvalidTransactionId. This includes the case that we
+ * just froze all the tuples; the prune-freeze record included the
+ * conflict XID already so the VM update record doesn't need it.
*/
if (presult->all_frozen)
presult->vm_conflict_horizon = InvalidTransactionId;
else
presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
- presult->lpdead_items = prstate.lpdead_items;
- /* the presult->deadoffsets array was already filled in */
-
/*
- * Clear any VM corruption. This does not need to be done in a critical
- * section.
+ * Handle setting visibility map bit based on information from the VM (as
+ * of last heap_vac_scan_next_block() call), and from all_visible and
+ * all_frozen variables.
*/
- presult->vm_corruption = false;
if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
- presult->vm_corruption = identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av,
- prstate.lpdead_items, vmbuffer);
+ {
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av,
+ prstate.lpdead_items, vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /*
+ * If the page isn't yet marked all-visible in the VM or it is and
+ * needs to be marked all-frozen, update the VM. Note that all_frozen
+ * is only valid if all_visible is true, so we must check both
+ * all_visible and all_frozen.
+ */
+ else if (presult->all_visible &&
+ (!blk_known_av ||
+ (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+ {
+ Assert(prstate.lpdead_items == 0);
+ vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+ /*
+ * If the page is all-frozen, we can pass InvalidTransactionId as
+ * our cutoff_xid, since a snapshotConflictHorizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ if (presult->all_frozen)
+ {
+ Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
+ /*
+ * It's possible for the VM bit to be clear and the page-level bit
+ * to be set if checksums are not enabled.
+ *
+ * And even if we are just planning to update the frozen bit in
+ * the VM, we shouldn't rely on all_visible_according_to_vm as a
+ * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
+ * might have become stale.
+ *
+ * If the heap page is all-visible but the VM bit is not set, we
+ * don't need to dirty the heap page. However, if checksums are
+ * enabled, we do need to make sure that the heap page is dirtied
+ * before passing it to visibilitymap_set(), because it may be
+ * logged.
+ */
+ if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+ {
+ PageSetAllVisible(page);
+ MarkBufferDirty(buffer);
+ }
+
+ old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
+ vmbuffer, presult->vm_conflict_horizon,
+ vmflags);
+ }
+ }
+
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
+
+ presult->old_vmbits = old_vmbits;
+ presult->new_vmbits = vmflags;
+
if (prstate.freeze)
{
if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 574e415b0e0..9492423141e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1949,7 +1949,8 @@ cmpOffsetNumbers(const void *a, const void *b)
* vmbuffer is the buffer containing the VM block with visibility information
* for the heap block, blkno. all_visible_according_to_vm is the saved
* visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * won't rely entirely on this status, as it may be out of date. These will be
+ * passed on to heap_page_prune_and_freeze() to use while setting the VM.
*
* *has_lpdead_items is set to true or false depending on whether, upon return
* from this function, any LP_DEAD items are still present on the page.
@@ -1978,6 +1979,7 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
+ * Then, if the page's visibility status has changed, update the VM.
*
* If the relation has no indexes, we can immediately mark would-be dead
* items LP_UNUSED.
@@ -1986,10 +1988,6 @@ lazy_scan_prune(LVRelState *vacrel,
* presult.ndeleted. It should not be confused with presult.lpdead_items;
* presult.lpdead_items's final value can be thought of as the number of
* tuples that were deleted from indexes.
- *
- * We will update the VM after collecting LP_DEAD items and freezing
- * tuples. Pruning will have determined whether or not the page is
- * all-visible.
*/
prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
if (vacrel->nindexes == 0)
@@ -2081,88 +2079,26 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(!presult.all_visible || !(*has_lpdead_items));
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * For the purposes of logging, count whether or not the page was newly
+ * set all-visible and, potentially, all-frozen.
*/
- if (presult.vm_corruption)
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- /* Don't update the VM if we just cleared corruption in it */
- }
-
- /*
- * If the page isn't yet marked all-visible in the VM or it is and needs
- * to be marked all-frozen, update the VM. Note that all_frozen is only
- * valid if all_visible is true, so we must check both all_visible and
- * all_frozen.
- */
- else if (presult.all_visible &&
- (!all_visible_according_to_vm ||
- (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
- {
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- /*
- * If the page is all-frozen, we can pass InvalidTransactionId as our
- * cutoff_xid, since a snapshotConflictHorizon sufficient to make
- * everything safe for REDO was logged when the page's tuples were
- * frozen.
- */
- if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * If the heap page is all-visible but the VM bit is not set, we don't
- * need to dirty the heap page. However, if checksums are enabled, we
- * do need to make sure that the heap page is dirtied before passing
- * it to visibilitymap_set(), because it may be logged.
- */
- if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * Even if we are only setting the all-frozen bit, there is a small
- * chance that the VM was modified sometime between setting
- * all_visible_according_to_vm and checking the visibility during
- * pruning. Check the return value of old_vmbits to ensure the
- * visibility map counters used for logging are accurate.
- *
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ vacrel->vm_new_visible_pages++;
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- vacrel->vm_new_frozen_pages++;
+ vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ {
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
+ }
return presult.ndeleted;
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0c7eb5e46f4..b85648456e9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,20 +235,21 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
+ * all_visible and all_frozen indicate the status of the page as reflected
+ * in the visibility map after pruning, freezing, and setting any pages
+ * all-visible in the visibility map.
*
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
+ * vm_conflict_horizon is the newest xmin of live tuples on the page
+ * (older than OldestXmin). It will only be valid if we did not set the
+ * page all-frozen in the VM.
*
* These are only set if the HEAP_PRUNE_FREEZE option is set.
*/
bool all_visible;
bool all_frozen;
TransactionId vm_conflict_horizon;
- bool vm_corruption;
+ uint8 old_vmbits;
+ uint8 new_vmbits;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
--
2.43.0
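For reference, the caller-side bookkeeping that remains in lazy_scan_prune()
after this patch can be modeled with the following standalone sketch (plain
C; the flag values, struct, and function names are invented for the example
and are not the PostgreSQL definitions):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Made-up stand-ins for the VM flag bits and the vacuum counters. */
#define SKETCH_ALL_VISIBLE 0x01
#define SKETCH_ALL_FROZEN  0x02

typedef struct
{
    long vm_new_visible_pages;
    long vm_new_visible_frozen_pages;
    long vm_new_frozen_pages;
} VacCountersSketch;

/*
 * Model of the counter bookkeeping: the caller no longer inspects
 * all_visible/all_frozen directly, it just compares the VM bits before and
 * after the update.
 */
static void
count_vm_changes(VacCountersSketch *c, uint8_t old_vmbits, uint8_t new_vmbits,
                 bool *vm_page_frozen)
{
    if ((old_vmbits & SKETCH_ALL_VISIBLE) == 0 &&
        (new_vmbits & SKETCH_ALL_VISIBLE) != 0)
    {
        c->vm_new_visible_pages++;
        if ((new_vmbits & SKETCH_ALL_FROZEN) != 0)
        {
            c->vm_new_visible_frozen_pages++;
            *vm_page_frozen = true;
        }
    }
    else if ((old_vmbits & SKETCH_ALL_FROZEN) == 0 &&
             (new_vmbits & SKETCH_ALL_FROZEN) != 0)
    {
        c->vm_new_frozen_pages++;
        *vm_page_frozen = true;
    }
}

int
main(void)
{
    VacCountersSketch c = {0};
    bool        frozen = false;

    /* Page went from no bits set to all-visible and all-frozen. */
    count_vm_changes(&c, 0, SKETCH_ALL_VISIBLE | SKETCH_ALL_FROZEN, &frozen);
    printf("%ld %ld %ld %d\n", c.vm_new_visible_pages,
           c.vm_new_visible_frozen_pages, c.vm_new_frozen_pages, frozen);
    return 0;
}

Because the counters are derived purely from old_vmbits/new_vmbits, the
caller no longer needs the all_visible/all_frozen results once a later
patch removes them from PruneFreezeResult.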
Attachment: v12-0011-Eliminate-xl_heap_visible-from-vacuum-phase-I-pr.patch (text/x-patch; charset=US-ASCII)
From 73418906b5aca553da545d77c1f0d29cd3d2f0b4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:41:00 -0400
Subject: [PATCH v12 11/20] Eliminate xl_heap_visible from vacuum phase I
prune/freeze
Instead of emitting a separate WAL record for every block rendered
all-visible/frozen by vacuum's phase I, include the changes to the VM in
the xl_heap_prune record already emitted.
This is only enabled for vacuum's prune/freeze work, not for on-access
pruning.
---
src/backend/access/heap/pruneheap.c | 459 ++++++++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 30 --
src/include/access/heapam.h | 15 +-
3 files changed, 282 insertions(+), 222 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index df3e6439176..dce9025d268 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,13 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool attempt_freeze;
+
+ /*
+ * Whether or not to consider updating the VM. There is some bookkeeping
+ * that must be maintained if we would like to update the VM.
+ */
+ bool consider_update_vm;
+
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -108,8 +115,9 @@ typedef struct
*
* These fields are not used by pruning itself for the most part, but are
* used to collect information about what was pruned and what state the
- * page is in after pruning, for the benefit of the caller. They are
- * copied to the caller's PruneFreezeResult at the end.
+ * page is in after pruning to use when updating the visibility map and
+ * for the benefit of the caller. They are copied to the caller's
+ * PruneFreezeResult at the end.
* -------------------------------------------------------
*/
@@ -138,11 +146,10 @@ typedef struct
* bits. It is only valid if we froze some tuples, and all_frozen is
* true.
*
- * NOTE: all_visible and all_frozen don't include LP_DEAD items. That's
- * convenient for heap_page_prune_and_freeze(), to use them to decide
- * whether to freeze the page or not. The all_visible and all_frozen
- * values returned to the caller are adjusted to include LP_DEAD items at
- * the end.
+ * NOTE: all_visible and all_frozen don't include LP_DEAD items until
+ * directly before updating the VM. We ignore LP_DEAD items when deciding
+ * whether or not to opportunistically freeze and when determining the
+ * snapshot conflict horizon required when freezing tuples.
*/
bool all_visible;
bool all_frozen;
@@ -377,12 +384,15 @@ identify_and_fix_vm_corruption(Relation relation,
* If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
* required in order to advance relfrozenxid / relminmxid, or if it's
* considered advantageous for overall system performance to do so now. The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing. When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set. They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
+ * 'cutoffs', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments are required
+ * when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set and the visibility status of the page
+ * has changed, we will update the VM at the same time as pruning and freezing
+ * the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping. Note that new and old_vmbits will be
+ * 0 if HEAP_PAGE_PRUNE_UPDATE_VM is not set.
*
* blk_known_av is the visibility status of the heap block as of the last call
* to find_next_unskippable_block(). vmbuffer is the buffer that may already
@@ -398,6 +408,8 @@ identify_and_fix_vm_corruption(Relation relation,
* FREEZE indicates that we will also freeze tuples, and will return
* 'all_visible', 'all_frozen' flags to the caller.
*
+ * UPDATE_VM indicates that we will set the page's status in the VM.
+ *
* cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
* of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
* cutoffs->OldestXmin is also used to determine if dead tuples are
@@ -442,18 +454,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
HeapTupleData tup;
bool do_freeze;
bool do_prune;
- bool do_hint;
+ bool do_hint_full_or_prunable;
+ bool do_set_vm;
uint8 vmflags = 0;
uint8 old_vmbits = 0;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ bool all_frozen_except_lp_dead = false;
+ bool set_pd_all_visible = false;
/* Copy parameters to prstate */
prstate.vistest = vistest;
prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.consider_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate.cutoffs = cutoffs;
+ Assert(!prstate.consider_update_vm || vmbuffer);
+
/*
* Our strategy is to scan the page and make lists of items to change,
* then apply the changes within a critical section. This keeps as much
@@ -498,50 +516,57 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.deadoffsets = presult->deadoffsets;
/*
- * Caller may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Keep track of whether or not the page will be all-visible and
+ * all-frozen for use in opportunistic freezing and to update the VM if
+ * the caller requests it.
+ *
+ * Currently, only VACUUM attempts freezing and setting the VM bits. But
+ * other callers could do either one. The visibility bookkeeping is
+ * required for opportunistic freezing (in addition to setting the VM
+ * bits) because we only consider opportunistically freezing tuples if the
+ * whole page would become all-frozen or if the whole page will be frozen
+ * except for dead tuples that will be removed by vacuum.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * If only updating the VM, we must initialize all_frozen to false, as
+ * heap_prepare_freeze_tuple() will not be called for each tuple on the
+ * page and we will not end up correctly setting it to false later.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not clear
- * all_visible when we see LP_DEAD items. We fix that at the end of the
- * function, when we return the value to the caller, so that the caller
- * doesn't set the VM bit incorrectly.
+ * Dead tuples which will be removed by the end of vacuuming should not
+ * preclude us from opportunistically freezing, so we do not clear
+ * all_visible when we see LP_DEAD items. We fix that after determining
+ * whether or not to freeze but before deciding whether or not to update
+ * the VM so that we don't set the VM bit incorrectly.
+ *
+ * If not freezing or updating the VM, we otherwise avoid the extra
+ * bookkeeping. Initializing all_visible to false allows skipping the work
+ * to update them in heap_prune_record_unchanged_lp_normal().
*/
if (prstate.attempt_freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
}
+ else if (prstate.consider_update_vm)
+ {
+ prstate.all_visible = true;
+ prstate.all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate.all_visible = false;
prstate.all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is unmaintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon.
+ * This is most likely to happen when updating the VM and/or freezing all
+ * live tuples on the page. It is updated before returning to the caller
+ * because vacuum does assert-build only validation on the page using this
+ * field.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -739,10 +764,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* Even if we don't prune anything, if we found a new value for the
- * pd_prune_xid field or the page was marked full, we will update the hint
- * bit.
+ * pd_prune_xid field or the page was marked full, we will update those
+ * hint bits.
*/
- do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ do_hint_full_or_prunable =
+ ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
PageIsFull(page);
/*
@@ -790,7 +816,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (XLogCheckBufferNeedsBackup(buffer))
do_freeze = true;
}
- else if (do_hint)
+ else if (do_hint_full_or_prunable)
{
if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
do_freeze = true;
@@ -829,11 +855,88 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
}
+ /*
+ * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+ * make the choice of whether or not to freeze the page unaffected by the
+ * short-term presence of LP_DEAD items. These LP_DEAD items were
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that freezing has been finalized, unset all_visible if there are
+ * any LP_DEAD items on the page. It needs to reflect the present state of
+ * the page when using it to determine whether or not to update the VM.
+ *
+ * Keep track of whether or not the page was all-frozen except LP_DEAD
+ * items for the purposes of calculating the snapshot conflict horizon,
+ * though.
+ */
+ all_frozen_except_lp_dead = prstate.all_frozen;
+ if (prstate.lpdead_items > 0)
+ {
+ prstate.all_visible = false;
+ prstate.all_frozen = false;
+ }
+
Assert(!prstate.all_frozen || prstate.all_visible);
+
+ /*
+ * Handle setting visibility map bit based on information from the VM (as
+ * of last heap_vac_scan_next_block() call), and from all_visible and
+ * all_frozen variables.
+ */
+ if (prstate.consider_update_vm)
+ {
+ /*
+ * Clear any VM corruption. This does not need to be in a critical
+ * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
+ * we may mark the heap page buffer dirty here and could end up doing
+ * so again later. This is not a correctness issue and is in the path
+ * of VM corruption, so we don't have to worry about the extra
+ * performance overhead.
+ */
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av, prstate.lpdead_items, vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /* Determine if we actually need to set the VM and which bits to set. */
+ else if (prstate.all_visible &&
+ (!blk_known_av ||
+ (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate.all_frozen)
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+ }
+
+ do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
+
+ /* Lock vmbuffer before entering a critical section */
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
+ /*
+ * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+ * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+ * set, we strongly prefer to keep them in sync.
+ *
+ * Prior to Postgres 19, it was possible for the page-level bit to be set
+ * and the VM bit to be clear. This could happen if we crashed after
+ * setting PD_ALL_VISIBLE but before setting bits in the VM.
+ */
+ set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+
+ /* Save these for the caller in case we later zero out vmflags */
+ presult->new_vmbits = vmflags;
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
- if (do_hint)
+ if (do_hint_full_or_prunable)
{
/*
* Update the page's pd_prune_xid field to either zero, or the lowest
@@ -849,15 +952,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
PageClearFull(page);
/*
- * If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * If we are _only_ setting the prune_xid or PD_PAGE_FULL hint, then
+ * this is a non-WAL-logged hint. If we are going to freeze or prune
+ * tuples on the page or set PD_ALL_VISIBLE, we will mark the buffer
+ * dirty and emit WAL below.
*/
- if (!do_freeze && !do_prune)
+ if (!do_prune && !do_freeze && !set_pd_all_visible)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -871,12 +975,47 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
- MarkBufferDirty(buffer);
+ if (set_pd_all_visible)
+ PageSetAllVisible(page);
/*
- * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
+ * We only set PD_ALL_VISIBLE if we also set the VM, and since setting
+ * the VM requires emitting WAL, MarkBufferDirtyHint() isn't
+ * appropriate here.
*/
- if (RelationNeedsWAL(relation))
+ if (do_prune || do_freeze || set_pd_all_visible)
+ MarkBufferDirty(buffer);
+
+ if (do_set_vm)
+ {
+ Assert(PageIsAllVisible(page));
+ old_vmbits = visibilitymap_set_vmbits(relation, blockno,
+ vmbuffer, vmflags);
+
+ if (old_vmbits == vmflags)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+ do_set_vm = false;
+ /* 0 out vmflags so we don't emit WAL to update the VM */
+ vmflags = 0;
+ }
+ }
+
+ /*
+ * It should never be the case that PD_ALL_VISIBLE is not set and the
+ * VM is set. Or, if it were, we should have caught it earlier when
+ * finding and fixing VM corruption. So, if we found out the VM was
+ * already set above, we should have found PD_ALL_VISIBLE set earlier.
+ */
+ Assert(!set_pd_all_visible || do_set_vm);
+
+ /*
+ * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did. If
+ * we were only updating the VM and it turns out it was already set,
+ * we will have unset do_set_vm earlier. As such, check it again
+ * before emitting the record.
+ */
+ if (RelationNeedsWAL(relation) && (do_set_vm || do_prune || do_freeze))
{
/*
* The snapshotConflictHorizon for the whole record should be the
@@ -888,35 +1027,56 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* on the standby with xids older than the youngest tuple this
* record will freeze will conflict.
*/
- TransactionId frz_conflict_horizon = InvalidTransactionId;
- TransactionId conflict_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
+
+ /*
+ * If we are updating the VM, the conflict horizon is almost
+ * always the visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization,
+ * we can use the visibility_cutoff_xid as the conflict horizon if
+ * the page will be all-frozen. This is true even if there are
+ * LP_DEAD line pointers because we ignored those when maintaining
+ * the visibility_cutoff_xid.
+ */
+ if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
+ conflict_xid = prstate.visibility_cutoff_xid;
/*
- * We can use the visibility_cutoff_xid as our cutoff for
- * conflicts when the whole page is eligible to become all-frozen
- * in the VM once we're done with it. Otherwise we generate a
- * conservative cutoff by stepping back from OldestXmin.
+ * Otherwise, if we are freezing but the page would not be
+ * all-frozen, we have to use the more pessimistic horizon of
+ * OldestXmin, which may be newer than the newest tuple we froze.
+ * We currently don't track the newest tuple we froze.
*/
- if (do_freeze)
+ else if (do_freeze)
{
- if (prstate.all_visible && prstate.all_frozen)
- frz_conflict_horizon = prstate.visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = prstate.cutoffs->OldestXmin;
- TransactionIdRetreat(frz_conflict_horizon);
- }
+ conflict_xid = prstate.cutoffs->OldestXmin;
+ TransactionIdRetreat(conflict_xid);
}
- if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
- conflict_xid = frz_conflict_horizon;
- else
+ /*
+ * If we are removing tuples with a younger xmax than our so far
+ * calculated conflict_xid, we must use this as our horizon.
+ */
+ if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
conflict_xid = prstate.latest_xid_removed;
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning
+ * or freezing any tuples and are setting an already all-visible
+ * page all-frozen in the VM. In this case, all of the tuples on
+ * the page must already be visible to all MVCC snapshots on the
+ * standby.
+ */
+ if (!do_prune && !do_freeze && do_set_vm &&
+ blk_known_av && (vmflags & VISIBILITYMAP_ALL_FROZEN))
+ conflict_xid = InvalidTransactionId;
+
log_heap_prune_and_freeze(relation, buffer,
false,
- InvalidBuffer, 0, false,
+ vmbuffer,
+ vmflags,
+ set_pd_all_visible,
conflict_xid,
true, reason,
prstate.frozen, prstate.nfrozen,
@@ -928,124 +1088,55 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
END_CRIT_SECTION();
- /* Copy information back for caller */
- presult->ndeleted = prstate.ndeleted;
- presult->nnewlpdead = prstate.ndead;
- presult->nfrozen = prstate.nfrozen;
- presult->live_tuples = prstate.live_tuples;
- presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected for updating the visibility map.
- */
- if (prstate.all_visible && prstate.lpdead_items == 0)
- {
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
- }
- else
- {
- presult->all_visible = false;
- presult->all_frozen = false;
- }
-
- presult->hastup = prstate.hastup;
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
- /*
- * If updating the visibility map, the conflict horizon for that record
- * must be the newest xmin on the page. However, if the page is
- * completely frozen, there can be no conflict and the vm_conflict_horizon
- * should remain InvalidTransactionId. This includes the case that we
- * just froze all the tuples; the prune-freeze record included the
- * conflict XID already so the VM update record doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * VACUUM will call heap_page_is_all_visible() during the second pass over
+ * the heap to determine all_visible and all_frozen for the page -- this
+ * is a specialized version of the logic from this function. Now that
+ * we've finished pruning and freezing, make sure that we're in total
+ * agreement with heap_page_is_all_visible() using an assertion. We will
+ * have already set the page in the VM, so this assertion will only let
+ * you know that you've already done something wrong.
*/
- if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
{
- if (identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av,
- prstate.lpdead_items, vmbuffer))
- {
- /* If we fix corruption, don't update the VM further */
- }
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
- /*
- * If the page isn't yet marked all-visible in the VM or it is and
- * needs to be marked all-frozen, update the VM. Note that all_frozen
- * is only valid if all_visible is true, so we must check both
- * all_visible and all_frozen.
- */
- else if (presult->all_visible &&
- (!blk_known_av ||
- (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
- {
- Assert(prstate.lpdead_items == 0);
- vmflags = VISIBILITYMAP_ALL_VISIBLE;
+ Assert(cutoffs);
- /*
- * If the page is all-frozen, we can pass InvalidTransactionId as
- * our cutoff_xid, since a snapshotConflictHorizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- if (presult->all_frozen)
- {
- Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
- vmflags |= VISIBILITYMAP_ALL_FROZEN;
- }
+ Assert(prstate.lpdead_items == 0);
- /*
- * It's possible for the VM bit to be clear and the page-level bit
- * to be set if checksums are not enabled.
- *
- * And even if we are just planning to update the frozen bit in
- * the VM, we shouldn't rely on all_visible_according_to_vm as a
- * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
- * might have become stale.
- *
- * If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be
- * logged.
- */
- if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buffer);
- }
+ if (!heap_page_is_all_visible(relation, buffer,
+ cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc))
+ Assert(false);
- old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
- vmbuffer, presult->vm_conflict_horizon,
- vmflags);
- }
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == prstate.visibility_cutoff_xid);
}
+#endif
+ /* Copy information back for caller */
+ presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+ presult->old_vmbits = old_vmbits;
+ /* new_vmbits was set above */
+ presult->hastup = prstate.hastup;
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
- presult->old_vmbits = old_vmbits;
- presult->new_vmbits = vmflags;
-
if (prstate.attempt_freeze)
{
if (presult->nfrozen > 0)
@@ -1627,7 +1718,12 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
break;
}
- /* Consider freezing any normal tuples which will not be removed */
+ /*
+ * Consider freezing any normal tuples which will not be removed.
+ * Regardless of whether or not we want to freeze the tuples, if we want
+ * to update the VM, we have to call heap_prepare_freeze_tuple() on every
+ * tuple to know whether or not the page will be totally frozen.
+ */
if (prstate->attempt_freeze)
{
bool totally_frozen;
@@ -2190,7 +2286,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* - Reaping: During vacuum phase III, items that are already LP_DEAD are
* marked as unused.
*
- * - VM updates: After vacuum phase III, the heap page may be marked
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
* all-visible and all-frozen.
*
* These changes all happen together, so we use a single WAL record for them
@@ -2244,6 +2340,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+ Assert(do_prune || nfrozen > 0 || vmflags & VISIBILITYMAP_VALID_BITS);
Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
xlrec.flags = vmflags;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9492423141e..75205179b83 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2015,34 +2015,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- /* Note that all_frozen value does not matter when !all_visible */
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- if (!heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum))
- Assert(false);
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -2076,8 +2048,6 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
-
/*
* For the purposes of logging, count whether or not the page was newly
* set all-visible and, potentially, all-frozen.
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b85648456e9..0b9bb1c9b13 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,19 +235,12 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate the status of the page as reflected
- * in the visibility map after pruning, freezing, and setting any pages
- * all-visible in the visibility map.
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
*
- * vm_conflict_horizon is the newest xmin of live tuples on the page
- * (older than OldestXmin). It will only be valid if we did not set the
- * page all-frozen in the VM.
- *
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
*/
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
uint8 old_vmbits;
uint8 new_vmbits;
--
2.43.0
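The trickiest part of folding the VM update into the prune/freeze record is
choosing a single snapshotConflictHorizon. The following standalone sketch
models that decision with plain integers (real TransactionIds wrap around and
are compared with TransactionIdFollows()/TransactionIdRetreat(), which this
sketch deliberately ignores; all names here are invented for the example):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t XidSketch;
#define INVALID_XID 0

/*
 * Pick one conflict horizon for the combined prune/freeze/VM-set record,
 * replacing the horizon the separate xl_heap_visible record used to carry.
 */
static XidSketch
conflict_horizon(bool do_prune, bool do_freeze, bool do_set_vm,
                 bool all_frozen_except_lp_dead, bool blk_known_av,
                 bool setting_all_frozen,
                 XidSketch visibility_cutoff_xid, XidSketch oldest_xmin,
                 XidSketch latest_xid_removed)
{
    XidSketch   xid = INVALID_XID;

    /* Setting the VM, or freezing a page that ends up fully frozen. */
    if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
        xid = visibility_cutoff_xid;
    /* Freezing only part of the page: fall back to OldestXmin - 1. */
    else if (do_freeze)
        xid = oldest_xmin - 1;

    /* Removed tuples may push the horizon forward. */
    if (latest_xid_removed > xid)
        xid = latest_xid_removed;

    /* Only marking an already all-visible page all-frozen: no conflict. */
    if (!do_prune && !do_freeze && do_set_vm && blk_known_av && setting_all_frozen)
        xid = INVALID_XID;

    return xid;
}

int
main(void)
{
    /* VM set plus pruning: horizon is max(cutoff, newest removed xid). */
    printf("%u\n", conflict_horizon(true, false, true, false, false, false,
                                    900, 1000, 950));   /* prints 950 */
    return 0;
}

In the common case (setting the VM while pruning), the horizon is simply the
newer of the visibility cutoff and the newest removed xmax, roughly what the
separate prune and xl_heap_visible records used to carry individually.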
Attachment: v12-0013-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (text/x-patch; charset=US-ASCII)
From 1345b081eff8eabc84ee7026a6be1a4ee5a45f47 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v12 13/20] Rename GlobalVisTestIsRemovableXid() to
GlobalVisXidVisibleToAll()
Currently, we only use GlobalVisTestIsRemovableXid() to check if a
tuple's xmax is visible to all, meaning we can remove it. But future
commits will also use it to test whether a tuple's xmin is visible to all,
for the purpose of determining whether the page can be set all-visible in
the VM. In that case, it makes more sense to call the function
GlobalVisXidVisibleToAll().
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
---
src/backend/access/heap/heapam_visibility.c | 6 +++---
src/backend/access/heap/pruneheap.c | 14 +++++++-------
src/backend/access/spgist/spgvacuum.c | 2 +-
src/backend/storage/ipc/procarray.c | 13 ++++++-------
src/include/utils/snapmgr.h | 4 ++--
5 files changed, 19 insertions(+), 20 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
{
Assert(TransactionIdIsValid(dead_after));
- if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
res = HEAPTUPLE_DEAD;
}
else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
return false;
/* Deleter committed, so tuple is dead if the XID is old enough. */
- return GlobalVisTestIsRemovableXid(vistest,
- HeapTupleHeaderGetRawXmax(tuple));
+ return GlobalVisXidVisibleToAll(vistest,
+ HeapTupleHeaderGetRawXmax(tuple));
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6637966e927..0211effeec7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -231,7 +231,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
vistest = GlobalVisTestFor(relation);
- if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+ if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
return;
/*
@@ -580,9 +580,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* Determining HTSV only once for each tuple is required for correctness,
* to deal with cases where running HTSV twice could result in different
* results. For example, RECENTLY_DEAD can turn to DEAD if another
- * checked item causes GlobalVisTestIsRemovableFullXid() to update the
- * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
- * transaction aborts.
+ * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+ * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+ * aborts.
*
* It's also good for performance. Most commonly tuples within a page are
* stored at decreasing offsets (while the items are stored at increasing
@@ -1182,11 +1182,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* Determine whether or not the tuple is considered dead when compared
* with the provided GlobalVisState. On-access pruning does not provide
* VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
- * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
- * if the GlobalVisState has been updated since the beginning of vacuuming
+ * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+ * the GlobalVisState has been updated since the beginning of vacuuming
* the relation.
*/
- if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
return HEAPTUPLE_DEAD;
return res;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 8f8a1ad7796..496cca69410 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
*/
if (dt->tupstate == SPGIST_REDIRECT &&
(!TransactionIdIsValid(dt->xid) ||
- GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+ GlobalVisXidVisibleToAll(vistest, dt->xid)))
{
dt->tupstate = SPGIST_PLACEHOLDER;
Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..f67f01c17c2 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
* See comment for GlobalVisState for details.
*/
bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
- FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
{
/*
* If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4223,7 +4222,7 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
* relfrozenxid).
*/
bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
{
FullTransactionId fxid;
@@ -4237,7 +4236,7 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
*/
fxid = FullXidRelativeTo(state->definitely_needed, xid);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableXid(state, xid);
+ return GlobalVisXidVisibleToAll(state, xid);
}
/*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
*/
typedef struct GlobalVisState GlobalVisState;
extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
--
2.43.0
v12-0012-Remove-xl_heap_visible-entirely.patchtext/x-patch; charset=US-ASCII; name=v12-0012-Remove-xl_heap_visible-entirely.patchDownload
From 1980ef906cb104da3a97a597ee69de73a21bdf0e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:30:42 -0400
Subject: [PATCH v12 12/20] Remove xl_heap_visible entirely
There are now no users of this, so eliminate it entirely.
---
src/backend/access/common/bufmask.c | 3 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 160 ++---------------------
src/backend/access/heap/pruneheap.c | 4 +-
src/backend/access/heap/vacuumlazy.c | 10 +-
src/backend/access/heap/visibilitymap.c | 109 +--------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +-
src/include/access/heapam_xlog.h | 20 ---
src/include/access/visibilitymap.h | 11 +-
src/include/access/visibilitymapdefs.h | 9 --
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 34 insertions(+), 370 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..1fff01383b3 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,7 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index cff531a4801..6f161a6eab2 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -40,6 +40,7 @@
#include "access/valid.h"
#include "access/visibilitymap.h"
#include "access/xloginsert.h"
+#include "access/xlogutils.h"
#include "catalog/pg_database.h"
#include "catalog/pg_database_d.h"
#include "commands/vacuum.h"
@@ -2526,11 +2527,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
else if (all_frozen_set)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(relation,
- BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set(relation,
+ BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
}
/*
@@ -8801,49 +8802,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
/*
* Perform XLogInsert for a heap-update operation. Caller must already
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 5872f13397f..9f16ba68d16 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -53,6 +53,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
(xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
vmflags = xlrec.flags & VISIBILITYMAP_VALID_BITS;
+ /* Must never set all_frozen bit without also setting all_visible bit */
+ Assert(vmflags != VISIBILITYMAP_ALL_FROZEN);
/*
* After xl_heap_prune is the optional snapshot conflict horizon.
@@ -243,9 +245,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
* the VM is set.
*
* In recovery, we expect no other writers, so writing to the VM page
- * without holding a lock on the heap page is considered safe enough. It
- * is done this way when replaying xl_heap_visible records (see
- * heap_xlog_visible()).
+ * without holding a lock on the heap page is considered safe enough.
*/
if (vmflags & VISIBILITYMAP_VALID_BITS &&
XLogReadBufferForRedoExtended(record, 1,
@@ -261,7 +261,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- old_vmbits = visibilitymap_set_vmbits(reln, blkno, vmbuffer, vmflags);
+ old_vmbits = visibilitymap_set(reln, blkno, vmbuffer, vmflags);
/* Only set VM page LSN if we modified the page */
if (old_vmbits != vmflags)
@@ -280,142 +280,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -793,9 +657,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
* the VM is set.
*
* In recovery, we expect no other writers, so writing to the VM page
- * without holding a lock on the heap page is considered safe enough. It
- * is done this way when replaying xl_heap_visible records (see
- * heap_xlog_visible()).
+ * without holding a lock on the heap page is considered safe enough.
*/
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -808,15 +670,14 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(reln, blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
/*
* It is not possible that the VM was already set for this heap page,
* so the vmbuffer must have been modified and marked dirty.
*/
+ visibilitymap_set(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(BufferGetPage(vmbuffer), lsn);
FreeFakeRelcacheEntry(reln);
@@ -1397,9 +1258,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index dce9025d268..6637966e927 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -989,8 +989,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_set_vm)
{
Assert(PageIsAllVisible(page));
- old_vmbits = visibilitymap_set_vmbits(relation, blockno,
- vmbuffer, vmflags);
+ old_vmbits = visibilitymap_set(relation, blockno,
+ vmbuffer, vmflags);
if (old_vmbits == vmflags)
{
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 75205179b83..2dcca071a45 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1888,8 +1888,8 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
PageSetAllVisible(page);
MarkBufferDirty(buf);
- visibilitymap_set_vmbits(vacrel->rel, blkno,
- vmbuffer, new_vmbits);
+ visibilitymap_set(vacrel->rel, blkno,
+ vmbuffer, new_vmbits);
if (RelationNeedsWAL(vacrel->rel))
{
@@ -2757,9 +2757,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
Assert(!PageIsAllVisible(page));
set_pd_all_vis = true;
PageSetAllVisible(page);
- visibilitymap_set_vmbits(vacrel->rel,
- blkno,
- vmbuffer, vmflags);
+ visibilitymap_set(vacrel->rel,
+ blkno,
+ vmbuffer, vmflags);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index aa48a436108..f7bad68ffc5 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set bit(s) in a previously pinned page and log
- * visibilitymap_set_vmbits - set bit(s) in a pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,108 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
- flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
- Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- Assert(!BufferIsValid(heapBuf) || BufferIsExclusiveLocked(heapBuf));
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (XLogRecPtrIsInvalid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
-}
-
/*
* Set flags in the VM block contained in the passed in vmBuf.
*
@@ -343,8 +240,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* is pinned and exclusive locked.
*/
uint8
-visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags)
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 439f33b8061..3342af02c75 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -344,13 +344,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -455,9 +448,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d8508593e7c..3672f372aa8 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -446,20 +445,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -503,11 +488,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index fc7056a91ea..c5b1e1f7adb 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "utils/relcache.h"
@@ -31,14 +30,8 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags);
+extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a13e8162890..b4c880c083f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4273,7 +4273,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
v12-0015-Inline-TransactionIdFollows-Precedes.patchtext/x-patch; charset=US-ASCII; name=v12-0015-Inline-TransactionIdFollows-Precedes.patchDownload
From ba7442b7b550e1510d49c9df7eb23ddaf8533644 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 09:57:13 -0400
Subject: [PATCH v12 15/20] Inline TransactionIdFollows/Precedes()
Calling these from on-access pruning code had noticeable overhead in a
profile. There does not seem to be a reason not to inline them.
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
---
src/backend/access/transam/transam.c | 64 -------------------------
src/include/access/transam.h | 70 ++++++++++++++++++++++++++--
2 files changed, 66 insertions(+), 68 deletions(-)
diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index 9a39451a29a..553d6756cb3 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -273,70 +273,6 @@ TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
}
-/*
- * TransactionIdPrecedes --- is id1 logically < id2?
- */
-bool
-TransactionIdPrecedes(TransactionId id1, TransactionId id2)
-{
- /*
- * If either ID is a permanent XID then we can just do unsigned
- * comparison. If both are normal, do a modulo-2^32 comparison.
- */
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 < id2);
-
- diff = (int32) (id1 - id2);
- return (diff < 0);
-}
-
-/*
- * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
- */
-bool
-TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 <= id2);
-
- diff = (int32) (id1 - id2);
- return (diff <= 0);
-}
-
-/*
- * TransactionIdFollows --- is id1 logically > id2?
- */
-bool
-TransactionIdFollows(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 > id2);
-
- diff = (int32) (id1 - id2);
- return (diff > 0);
-}
-
-/*
- * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
- */
-bool
-TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 >= id2);
-
- diff = (int32) (id1 - id2);
- return (diff >= 0);
-}
-
/*
* TransactionIdLatest --- get latest XID among a main xact and its children
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 7d82cd2eb56..c9e20418275 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -255,6 +255,72 @@ typedef struct TransamVariablesData
} TransamVariablesData;
+
+/*
+ * TransactionIdPrecedes --- is id1 logically < id2?
+ */
+static inline bool
+TransactionIdPrecedes(TransactionId id1, TransactionId id2)
+{
+ /*
+ * If either ID is a permanent XID then we can just do unsigned
+ * comparison. If both are normal, do a modulo-2^32 comparison.
+ */
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 < id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff < 0);
+}
+
+/*
+ * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
+ */
+static inline bool
+TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 <= id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff <= 0);
+}
+
+/*
+ * TransactionIdFollows --- is id1 logically > id2?
+ */
+static inline bool
+TransactionIdFollows(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 > id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff > 0);
+}
+
+/*
+ * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
+ */
+static inline bool
+TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 >= id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff >= 0);
+}
+
+
/* ----------------
* extern declarations
* ----------------
@@ -274,10 +340,6 @@ extern bool TransactionIdDidAbort(TransactionId transactionId);
extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
-extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
-extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2);
extern TransactionId TransactionIdLatest(TransactionId mainxid,
int nxids, const TransactionId *xids);
extern XLogRecPtr TransactionIdGetCommitLSN(TransactionId xid);
--
2.43.0
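For readers unfamiliar with the comparison these now-inlined helpers perform, here is a minimal standalone sketch (not part of the patch set; it stubs TransactionId and omits the permanent-XID special case) of the modulo-2^32 ordering, which keeps XIDs correctly ordered across wraparound because only the signed difference matters:

#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

/* simplified TransactionIdPrecedes(): ignores permanent XIDs */
static int
xid_precedes(TransactionId id1, TransactionId id2)
{
	int32_t		diff = (int32_t) (id1 - id2);

	return diff < 0;
}

int
main(void)
{
	TransactionId before_wrap = 4294967290u;	/* assigned just before wraparound */
	TransactionId after_wrap = 100;		/* assigned just after wraparound */

	printf("%d\n", xid_precedes(before_wrap, after_wrap));	/* prints 1 */
	printf("%d\n", xid_precedes(after_wrap, before_wrap));	/* prints 0 */
	return 0;
}

Inlining avoids a function call into transam.c for each of these comparisons, which is what showed up in the profile mentioned above.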
v12-0016-Unset-all-visible-sooner-if-not-freezing.patchtext/x-patch; charset=US-ASCII; name=v12-0016-Unset-all-visible-sooner-if-not-freezing.patchDownload
From b3f607261a22bad37a3aba9091dbee049d424eda Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:35:13 -0400
Subject: [PATCH v12 16/20] Unset all-visible sooner if not freezing
In prune/freeze code, we delay unsetting all-visible/all-frozen in the
presence of dead items to allow opportunistically freezing tuples if the
whole page would be frozen except for those dead items -- which are
removed later in vacuum's third phase.
Future commits will allow on-access pruning to set the VM, which means
all-visible will be initialized to true instead of false and we will do
extra bookkeeping in heap_prune_unchanged_lp_normal() to keep track of
whether or not the page is all-visible.
Because on-access pruning will not freeze tuples, it makes sense to
unset all-visible as soon as we encounter an LP_DEAD item and
avoid continued bookkeeping since we know the page is not all-visible
and we won't be able to remove those dead items.
---
src/backend/access/heap/pruneheap.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c6935e45cec..ba8ddc7fa35 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1503,8 +1503,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible until later during pruning.
- * Removable dead tuples shouldn't preclude freezing the page.
+ * Removable dead tuples shouldn't preclude freezing the page. If we won't
+ * attempt freezing, just unset all-visible now, though.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1762,8 +1765,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible until later, at the end of
* heap_page_prune_and_freeze(). This will allow us to attempt to freeze
* the page after pruning. As long as we unset it before updating the
- * visibility map, this will be correct.
+ * visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all-visible now.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
--
2.43.0
v12-0014-Use-GlobalVisState-to-determine-page-level-visib.patchtext/x-patch; charset=US-ASCII; name=v12-0014-Use-GlobalVisState-to-determine-page-level-visib.patchDownload
From e993249ab9d2ec35202cba48cce4a8928dd03ab9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v12 14/20] Use GlobalVisState to determine page level
visibility
During pruning and during vacuum's third phase, we try to determine if
the whole page can be set all-visible in the visibility map. Instead of
using OldestXmin to determine if all the tuples on a page are visible to
everyone, use the GlobalVisState. This allows us to start setting the VM
during on-access pruning in a future commit.
It is possible for the GlobalVisState to change during the course of a
vacuum. In all but extraordinary cases, it moves forward, meaning more
pages could potentially be set in the VM.
Because comparing a transaction ID to the GlobalVisState requires more
operations than comparing it to another single transaction ID, we now
wait until after examining all the tuples on the page and if we have
maintained the visibility_cutoff_xid, we compare that to the
GlobalVisState just once per page. This works because if the page is
all-visible and has live, committed tuples on it, the
visibility_cutoff_xid will contain the newest xmin on the page. If
everyone can see it, the page is truly all-visible.
Doing this may mean we examine more tuples' xmins than before, since
previously we would have set all_visible to false as soon as we
encountered a live tuple newer than OldestXmin. However, these extra
comparisons were found not
to be significant in a profile.
---
src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++
src/backend/access/heap/pruneheap.c | 48 +++++++++------------
src/backend/access/heap/vacuumlazy.c | 19 ++++----
src/include/access/heapam.h | 4 +-
4 files changed, 60 insertions(+), 39 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+ Buffer buffer)
+{
+ TransactionId dead_after = InvalidTransactionId;
+ HTSV_Result res;
+
+ res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+ if (res == HEAPTUPLE_RECENTLY_DEAD)
+ {
+ Assert(TransactionIdIsValid(dead_after));
+
+ if (GlobalVisXidVisibleToAll(vistest, dead_after))
+ res = HEAPTUPLE_DEAD;
+ }
+ else
+ Assert(!TransactionIdIsValid(dead_after));
+
+ return res;
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 0211effeec7..c6935e45cec 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -141,10 +141,9 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon, when setting the VM or when
+ * freezing all the live tuples on the page.
*
* NOTE: all_visible and all_frozen don't include LP_DEAD items until
* directly before updating the VM. We ignore LP_DEAD items when deciding
@@ -559,14 +558,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon.
- * This is most likely to happen when updating the VM and/or freezing all
- * live tuples on the page. It is updated before returning to the caller
- * because vacuum does assert-build only validation on the page using this
- * field.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState.
+ *
+ * If we encounter an uncommitted tuple, this field is unmaintained. When
+ * the page is being set all-visible or all live tuples on the page are
+ * being frozen, it is used to calculate the snapshot conflict horizon.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -762,6 +759,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them is not visible to everyone, the page cannot be
+ * all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update those
@@ -1108,12 +1115,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(cutoffs);
-
Assert(prstate.lpdead_items == 0);
if (!heap_page_is_all_visible(relation, buffer,
- cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc))
Assert(false);
@@ -1638,19 +1643,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' when freezing is requested. We
- * could use GlobalVisTestIsRemovableXid instead, if a
- * non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2dcca071a45..4ad05ba4db6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,7 +464,7 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -2717,7 +2717,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
InvalidOffsetNumber);
if (heap_page_would_be_all_visible(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3462,13 +3462,13 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
- return heap_page_would_be_all_visible(rel, buf, OldestXmin,
+ return heap_page_would_be_all_visible(rel, buf, vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -3487,7 +3487,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
*
* *all_frozen is an output parameter indicating to the caller if every tuple
* on the page is frozen.
@@ -3508,7 +3508,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
static bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -3580,8 +3580,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
tuple.t_len = ItemIdGetLength(itemid);
tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
- buf))
+ switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest,
+ buf))
{
case HEAPTUPLE_LIVE:
{
@@ -3600,8 +3600,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin,
- OldestXmin))
+ if (!GlobalVisXidVisibleToAll(vistest, xmin))
{
all_visible = false;
*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0b9bb1c9b13..4278f351bdf 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -342,7 +342,7 @@ extern void heap_inplace_unlock(Relation relation,
HeapTuple oldtup, Buffer buffer);
extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum);
@@ -415,6 +415,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+ GlobalVisState *vistest, Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
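To make the once-per-page check concrete, here is a minimal standalone sketch (hypothetical names and stubbed types, not the patch's code; wraparound-safe comparisons omitted): track the newest committed xmin while walking the page, then compare it against the horizon a single time for the whole page.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

/* stand-in for GlobalVisXidVisibleToAll(): true if xid is older than horizon */
static int
visible_to_all(TransactionId horizon, TransactionId xid)
{
	return (int32_t) (xid - horizon) < 0;
}

int
main(void)
{
	TransactionId horizon = 1000;	/* everything older is visible to everyone */
	TransactionId xmins[] = {850, 990, 920};	/* committed live tuples on the page */
	TransactionId visibility_cutoff_xid = 0;

	/* bookkeeping pass: remember only the newest xmin seen */
	for (size_t i = 0; i < sizeof(xmins) / sizeof(xmins[0]); i++)
		if (xmins[i] > visibility_cutoff_xid)
			visibility_cutoff_xid = xmins[i];

	/* one horizon comparison per page instead of one per tuple */
	printf("page all-visible: %s\n",
		   visible_to_all(horizon, visibility_cutoff_xid) ? "yes" : "no");
	return 0;
}

If any live tuple's xmin were not yet visible to everyone, the single comparison against the newest xmin would catch it, which is why maintaining visibility_cutoff_xid is sufficient.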
v12-0017-Allow-on-access-pruning-to-set-pages-all-visible.patchtext/x-patch; charset=US-ASCII; name=v12-0017-Allow-on-access-pruning-to-set-pages-all-visible.patchDownload
From 1bc54d63a946c7999427716a711bb6be9db74861 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v12 17/20] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum marked pages as all-visible or all-frozen.
Supporting this requires passing information from the executor down to
the scan descriptor about whether the query modifies the relation.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
---
src/backend/access/heap/heapam.c | 15 ++++-
src/backend/access/heap/heapam_handler.c | 15 ++++-
src/backend/access/heap/pruneheap.c | 67 ++++++++++++++-----
src/backend/access/index/indexam.c | 46 +++++++++++++
src/backend/access/table/tableam.c | 39 +++++++++--
src/backend/executor/execMain.c | 4 ++
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeBitmapHeapscan.c | 7 +-
src/backend/executor/nodeIndexscan.c | 18 +++--
src/backend/executor/nodeSeqscan.c | 24 +++++--
src/include/access/genam.h | 11 +++
src/include/access/heapam.h | 24 ++++++-
src/include/access/relscan.h | 6 ++
src/include/access/tableam.h | 30 ++++++++-
src/include/nodes/execnodes.h | 6 ++
.../t/035_standby_logical_decoding.pl | 3 +-
16 files changed, 277 insertions(+), 40 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 6f161a6eab2..f9e50d47aee 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -556,6 +556,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -570,7 +571,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1247,6 +1250,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1285,6 +1289,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1317,6 +1327,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..f05b9e4968d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
return &hscan->xs_base;
}
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ba8ddc7fa35..69d8e42bdc8 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -198,9 +198,13 @@ static bool identify_and_fix_vm_corruption(Relation relation,
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the relevant visibility map page.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -264,6 +268,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
{
OffsetNumber dummy_off_loc;
PruneFreezeResult presult;
+ int options = 0;
+
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ options = HEAP_PAGE_PRUNE_UPDATE_VM;
+ }
/*
* For now, pass mark_unused_now as false regardless of whether or
@@ -271,9 +282,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* that during on-access pruning with the current implementation.
*/
heap_page_prune_and_freeze(relation, buffer, false,
- InvalidBuffer,
- vistest, 0,
- NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+ vmbuffer ? *vmbuffer : InvalidBuffer,
+ vistest, options,
+ NULL, &presult, PRUNE_ON_ACCESS,
+ &dummy_off_loc, NULL, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -519,12 +531,17 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* all-frozen for use in opportunistic freezing and to update the VM if
* the caller requests it.
*
- * Currently, only VACUUM attempts freezing and setting the VM bits. But
- * other callers could do either one. The visibility bookkeeping is
- * required for opportunistic freezing (in addition to setting the VM
- * bits) because we only consider opportunistically freezing tuples if the
- * whole page would become all-frozen or if the whole page will be frozen
- * except for dead tuples that will be removed by vacuum.
+ * Currently, only VACUUM attempts freezing. But other callers could. The
+ * visibility bookkeeping is required for opportunistic freezing (in
+ * addition to setting the VM bits) because we only consider
+ * opportunistically freezing tuples if the whole page would become
+ * all-frozen or if the whole page will be frozen except for dead tuples
+ * that will be removed by vacuum. But if consider_update_vm is false,
+ * we'll not set the VM even if the page is discovered to be all-visible.
+ *
+ * If only HEAP_PAGE_PRUNE_UPDATE_VM is passed and not
+ * HEAP_PAGE_PRUNE_FREEZE, prstate.all_frozen must be initialized to false
+ * because we will not call heap_prepare_freeze_tuple() on each tuple.
*
* If only updating the VM, we must initialize all_frozen to false, as
* heap_prepare_freeze_tuple() will not be called for each tuple on the
@@ -536,7 +553,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* whether or not to freeze but before deciding whether or not to update
* the VM so that we don't set the VM bit incorrectly.
*
- * If not freezing or updating the VM, we otherwise avoid the extra
+ * If not freezing and not updating the VM, we avoid the extra
* bookkeeping. Initializing all_visible to false allows skipping the work
* to update them in heap_prune_record_unchanged_lp_normal().
*/
@@ -885,12 +902,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.all_frozen = false;
}
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate.consider_update_vm &&
+ prstate.all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+ {
+ prstate.consider_update_vm = false;
+ prstate.all_visible = prstate.all_frozen = false;
+ }
+
Assert(!prstate.all_frozen || prstate.all_visible);
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * Handle setting visibility map bit based on information from the VM (if
+ * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
+ * call), and from all_visible and all_frozen variables.
*/
if (prstate.consider_update_vm)
{
@@ -2284,8 +2319,8 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* - Reaping: During vacuum phase III, items that are already LP_DEAD are
* marked as unused.
*
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- * all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III, and during on-access pruning,
+ * the heap page may be marked all-visible and all-frozen.
*
 * These changes all happen together, so we use a single WAL record for them
* all.
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 86d11f4ec79..4603ece09bd 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -289,6 +289,32 @@ index_beginscan(Relation heapRelation,
return scan;
}
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan(heapRelation,
+ indexRelation,
+ snapshot,
+ instrument,
+ nkeys, norderbys);
+
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+ return scan;
+}
+
/*
* index_beginscan_bitmap - start a scan of an index with amgetbitmap
*
@@ -620,6 +646,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
return scan;
}
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan_parallel(heaprel, indexrel,
+ instrument,
+ nkeys, norderbys,
+ pscan);
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+ return scan;
+}
+
/* ----------------
* index_getnext_tid - get the next TID from a scan
*
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index a56c5eceb14..67dbf99f5b5 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
char *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
bool synchronize_seqscans = true;
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags);
+
/* ----------------------------------------------------------------------------
* Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
}
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
- SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
pscan, flags);
}
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+ bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
/* ----------------------------------------------------------------------------
* Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ff12e2e1364..2e0474c948a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation is modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ modifies_rel);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..c2ffbd3b08e 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+
+ bool modifies_base_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
*/
- scandesc = index_beginscan(node->ss.ss_currentRelation,
- node->iss_RelationDesc,
- estate->es_snapshot,
- &node->iss_Instrument,
- node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+ node->iss_RelationDesc,
+ estate->es_snapshot,
+ &node->iss_Instrument,
+ node->iss_NumScanKeys,
+ node->iss_NumOrderByKeys,
+ modifies_base_rel);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..fd69275c181 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
- scandesc = table_beginscan(node->ss.ss_currentRelation,
- estate->es_snapshot,
- 0, NULL);
+ scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+ estate->es_snapshot,
+ 0, NULL, modifies_rel);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -366,6 +371,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
ParallelContext *pcxt)
{
EState *estate = node->ss.ps.state;
+ bool modifies_rel;
ParallelTableScanDesc pscan;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -373,8 +379,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+ modifies_rel);
}
/* ----------------------------------------------------------------
@@ -404,8 +413,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+ pscan,
+ modifies_rel);
}
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 5b2ab181b5f..bf272c2c37f 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_heap_rel);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
@@ -206,6 +211,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys,
ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_rel);
extern ItemPointer index_getnext_tid(IndexScanDesc scan,
ScanDirection direction);
struct TupleTableSlot;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4278f351bdf..16f7904a21e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read the current heap page's
+ * corresponding VM block into this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -116,8 +123,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read the current heap page's corresponding VM block into
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
@@ -374,7 +391,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool blk_known_av,
Buffer vmbuffer,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
typedef struct IndexFetchTableData
{
Relation rel;
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchTableData;
struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index b2ce35e2a34..e31c21cf8eb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -62,6 +62,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* whether or not scan should attempt to set the VM */
+ SO_ALLOW_VM_SET = 1 << 10,
} ScanOptions;
/*
@@ -881,6 +883,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
}
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
/*
* Like table_beginscan(), but for scanning catalog. It'll automatically use a
* snapshot appropriate for scanning catalog relations.
@@ -918,10 +939,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, struct ScanKeyData *key)
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
{
uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
}
@@ -1130,6 +1154,10 @@ extern void table_parallelscan_initialize(Relation rel,
extern TableScanDesc table_beginscan_parallel(Relation relation,
ParallelTableScanDesc pscan);
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+ ParallelTableScanDesc pscan,
+ bool modifies_rel);
+
/*
* Restart a parallel scan. Call this in the leader process. Caller is
* responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index de782014b2d..839c1be1d7c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -678,6 +678,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query either through
+ * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index c9c182892cf..f5c0c65b260 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -745,7 +746,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
--
2.43.0
v12-0018-Add-helper-functions-to-heap_page_prune_and_free.patch (text/x-patch)
From 6c3258af5d7959a275876c5ce694fe6923be821e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 30 Jul 2025 18:51:43 -0400
Subject: [PATCH v12 18/20] Add helper functions to heap_page_prune_and_freeze
heap_page_prune_and_freeze() has gotten rather long. It has several
stages:
1) setup - where the PruneState is set up
2) tuple examination -- where tuples and line pointers are examined to
determine what needs to be pruned and what could be frozen
3) evaluation -- where we determine based on caller provided options,
heuristics, and state gathered during stage 2 whether or not to
freeze tuples and set the page in the VM
4) execution - where the page changes are actually made and logged
This commit refactors the evaluation stage into helpers which return
whether or not to freeze and set the VM.
XXX: For the purposes of committing, this likely shouldn't be a separate
commit. But I'm not sure yet whether it makes more sense to do this
refactoring earlier in the set for clarity for the reviewer.
---
src/backend/access/heap/pruneheap.c | 473 +++++++++++++++++-----------
1 file changed, 296 insertions(+), 177 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 69d8e42bdc8..67b56e45ad7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -179,6 +179,22 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
static void page_verify_redirects(Page page);
+static bool heap_page_will_freeze(Relation relation, Buffer buffer,
+ bool do_prune,
+ bool do_hint_full_or_prunable,
+ bool did_tuple_hint_fpi,
+ PruneState *prstate,
+ bool *all_frozen_except_lp_dead);
+
+static bool heap_page_will_update_vm(Relation relation,
+ Buffer buffer, BlockNumber blockno, Page page,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ bool blk_known_av,
+ PruneState *prstate,
+ Buffer *vmbuffer, uint8 *vmflags,
+ bool *set_pd_all_visible);
+
static bool identify_and_fix_vm_corruption(Relation relation,
BlockNumber heap_blk,
Buffer heap_buffer, Page heap_page,
@@ -382,6 +398,249 @@ identify_and_fix_vm_corruption(Relation relation,
return false;
}
+
+/*
+ * Determine whether to set the visibility map bits based on information from
+ * the PruneState and blk_known_av, which some callers will provide after
+ * previously examining this heap page's VM bits (e.g. vacuum from the last
+ * heap_vac_scan_next_block() call).
+ *
+ * We pass in blockno and page even though those can be derived from buffer to
+ * avoid extra BufferGetBlock() and BufferGetBlockNumber() calls.
+ *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
+ * prstate and vmbuffer are input/output fields. vmflags and
+ * set_pd_all_visible are output fields.
+ *
+ * Returns true if the caller should set one or both of the VM bits and false
+ * otherwise.
+ */
+static bool
+heap_page_will_update_vm(Relation relation,
+ Buffer buffer, BlockNumber blockno, Page page,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ bool blk_known_av,
+ PruneState *prstate,
+ Buffer *vmbuffer, uint8 *vmflags,
+ bool *set_pd_all_visible)
+{
+ bool do_set_vm = false;
+
+ /*
+ * If the caller specified not to update the VM, validate everything is in
+ * the right state and exit.
+ */
+ if (!prstate->consider_update_vm)
+ {
+ Assert(!prstate->all_visible && !prstate->all_frozen);
+ /* We don't set only the page level visibility hint */
+ Assert(!(*set_pd_all_visible));
+ Assert(*vmflags == 0);
+ return false;
+ }
+
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->consider_update_vm &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+ {
+ prstate->consider_update_vm = false;
+ prstate->all_visible = prstate->all_frozen = false;
+ }
+
+ Assert(!prstate->all_frozen || prstate->all_visible);
+
+ /*
+ * Clear any VM corruption. This does not need to be in a critical
+ * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set, we
+ * may mark the heap page buffer dirty here and could end up doing so
+ * again later. This is not a correctness issue and is in the path of VM
+ * corruption, so we don't have to worry about the extra performance
+ * overhead.
+ */
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av, prstate->lpdead_items,
+ *vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /* Determine if we actually need to set the VM and which bits to set. */
+ else if (prstate->all_visible &&
+ (!blk_known_av ||
+ (prstate->all_frozen && !VM_ALL_FROZEN(relation, blockno, vmbuffer))))
+ {
+ *vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate->all_frozen)
+ *vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
+ do_set_vm = *vmflags & VISIBILITYMAP_VALID_BITS;
+
+ /*
+ * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+ * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+ * set, we strongly prefer to keep them in sync.
+ *
+ * Prior to Postgres 19, it was possible for the page-level bit to be set
+ * and the VM bit to be clear. This could happen if we crashed after
+ * setting PD_ALL_VISIBLE but before setting bits in the VM.
+ */
+ *set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+ return do_set_vm;
+}
+
+/*
+ * Decide if we want to go ahead with freezing according to the freeze plans we
+ * prepared for the given buffer or not. If the caller specified we should not
+ * freeze tuples, it exits early.
+ *
+ * do_prune, do_hint_full_or_prunable, and did_tuple_hint_fpi must all have
+ * been decided before calling this function.
+ *
+ * prstate is an input/output parameter. all_frozen_except_lp_dead is set and
+ * used later to determine the snapshot conflict horizon for the record.
+ *
+ * Returns true if we should use our freeze plans and freeze tuples on the page
+ * and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+ bool do_prune,
+ bool do_hint_full_or_prunable,
+ bool did_tuple_hint_fpi,
+ PruneState *prstate,
+ bool *all_frozen_except_lp_dead)
+{
+ bool do_freeze = false;
+
+ /*
+ * If the caller specified we should not attempt to freeze any tuples,
+ * validate that everything is in the right state and exit.
+ */
+ if (!prstate->attempt_freeze)
+ {
+ Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+ Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+ Assert(!(*all_frozen_except_lp_dead));
+ return false;
+ }
+
+ if (prstate->pagefrz.freeze_required)
+ {
+ /*
+ * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+ * before FreezeLimit/MultiXactCutoff is present. Must freeze to
+ * advance relfrozenxid/relminmxid.
+ */
+ do_freeze = true;
+ }
+ else
+ {
+ /*
+ * Opportunistically freeze the page if we are generating an FPI
+ * anyway and if doing so means that we can set the page all-frozen
+ * afterwards (might not happen until VACUUM's final heap pass).
+ *
+ * XXX: Previously, we knew if pruning emitted an FPI by checking
+ * pgWalUsage.wal_fpi before and after pruning. Once the freeze and
+ * prune records were combined, this heuristic couldn't be used
+ * anymore. The opportunistic freeze heuristic must be improved;
+ * however, for now, try to approximate the old logic.
+ */
+ if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+ {
+ /*
+ * Freezing would make the page all-frozen. Have already emitted
+ * an FPI or will do so anyway?
+ */
+ if (RelationNeedsWAL(relation))
+ {
+ if (did_tuple_hint_fpi)
+ do_freeze = true;
+ else if (do_prune)
+ {
+ if (XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ else if (do_hint_full_or_prunable)
+ {
+ if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ }
+ }
+ }
+
+ if (do_freeze)
+ {
+ /*
+ * Validate the tuples we will be freezing before entering the
+ * critical section.
+ */
+ heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+ }
+ else if (prstate->nfrozen > 0)
+ {
+ /*
+ * The page contained some tuples that were not already frozen, and we
+ * chose not to freeze them now. The page won't be all-frozen then.
+ */
+ Assert(!prstate->pagefrz.freeze_required);
+
+ prstate->all_frozen = false;
+ prstate->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+ else
+ {
+ /*
+ * We have no freeze plans to execute. The page might already be
+ * all-frozen (perhaps only following pruning), though. Such pages
+ * can be marked all-frozen in the VM by our caller, even though none
+ * of its tuples were newly frozen here.
+ */
+ }
+
+ /*
+ * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+ * make the choice of whether or not to freeze the page unaffected by the
+ * short-term presence of LP_DEAD items. These LP_DEAD items were
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that freezing has been finalized, unset all_visible if there are
+ * any LP_DEAD items on the page. It needs to reflect the present state of
+ * the page when using it to determine whether or not to update the VM.
+ *
+ * Keep track of whether or not the page was all-frozen except LP_DEAD
+ * items for the purposes of calculating the snapshot conflict horizon,
+ * though.
+ */
+ *all_frozen_except_lp_dead = prstate->all_frozen;
+ if (prstate->lpdead_items > 0)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
+
+ return do_freeze;
+}
+
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
* specified page. If the page's visibility status has changed, update it in
@@ -772,20 +1031,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Clear the offset information once we have processed the given page. */
*off_loc = InvalidOffsetNumber;
- do_prune = prstate.nredirected > 0 ||
- prstate.ndead > 0 ||
- prstate.nunused > 0;
-
/*
* After processing all the live tuples on the page, if the newest xmin
* amongst them is not visible to everyone, the page cannot be
- * all-visible.
+ * all-visible. This must be done before we decide whether or not to
+ * opportunistically freeze below because we do not want to
+ * opportunistically freeze the page if there are live tuples not visible
+ * to everyone, which would prevent setting the page frozen in the VM.
*/
if (prstate.all_visible &&
TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
prstate.all_visible = prstate.all_frozen = false;
+ /*
+ * Now decide based on information collected while examining every tuple
+ * which actions to take. If there are any prunable tuples, we'll prune
+ * them. However, we will decide based on options specified by the caller
+ * and various heuristics whether or not to freeze any tuples and whether
+ * or not the page should be set all-visible/all-frozen in the VM.
+ */
+ do_prune = prstate.nredirected > 0 ||
+ prstate.ndead > 0 ||
+ prstate.nunused > 0;
+
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update those
@@ -796,186 +1065,36 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
PageIsFull(page);
/*
- * Decide if we want to go ahead with freezing according to the freeze
- * plans we prepared, or not.
- */
- do_freeze = false;
- if (prstate.attempt_freeze)
- {
- if (prstate.pagefrz.freeze_required)
- {
- /*
- * heap_prepare_freeze_tuple indicated that at least one XID/MXID
- * from before FreezeLimit/MultiXactCutoff is present. Must
- * freeze to advance relfrozenxid/relminmxid.
- */
- do_freeze = true;
- }
- else
- {
- /*
- * Opportunistically freeze the page if we are generating an FPI
- * anyway and if doing so means that we can set the page
- * all-frozen afterwards (might not happen until VACUUM's final
- * heap pass).
- *
- * XXX: Previously, we knew if pruning emitted an FPI by checking
- * pgWalUsage.wal_fpi before and after pruning. Once the freeze
- * and prune records were combined, this heuristic couldn't be
- * used anymore. The opportunistic freeze heuristic must be
- * improved; however, for now, try to approximate the old logic.
- */
- if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
- {
- /*
- * Freezing would make the page all-frozen. Have already
- * emitted an FPI or will do so anyway?
- */
- if (RelationNeedsWAL(relation))
- {
- if (did_tuple_hint_fpi)
- do_freeze = true;
- else if (do_prune)
- {
- if (XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- else if (do_hint_full_or_prunable)
- {
- if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- }
- }
- }
- }
-
- if (do_freeze)
- {
- /*
- * Validate the tuples we will be freezing before entering the
- * critical section.
- */
- heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
- }
- else if (prstate.nfrozen > 0)
- {
- /*
- * The page contained some tuples that were not already frozen, and we
- * chose not to freeze them now. The page won't be all-frozen then.
- */
- Assert(!prstate.pagefrz.freeze_required);
-
- prstate.all_frozen = false;
- prstate.nfrozen = 0; /* avoid miscounts in instrumentation */
- }
- else
- {
- /*
- * We have no freeze plans to execute. The page might already be
- * all-frozen (perhaps only following pruning), though. Such pages
- * can be marked all-frozen in the VM by our caller, even though none
- * of its tuples were newly frozen here.
- */
- }
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state of
- * the page when using it to determine whether or not to update the VM.
- *
- * Keep track of whether or not the page was all-frozen except LP_DEAD
- * items for the purposes of calculating the snapshot conflict horizon,
- * though.
+ * We must decide whether or not to freeze before deciding if and what to
+ * set in the VM.
*/
- all_frozen_except_lp_dead = prstate.all_frozen;
- if (prstate.lpdead_items > 0)
- {
- prstate.all_visible = false;
- prstate.all_frozen = false;
- }
+ do_freeze = heap_page_will_freeze(relation, buffer,
+ do_prune,
+ do_hint_full_or_prunable,
+ did_tuple_hint_fpi,
+ &prstate,
+ &all_frozen_except_lp_dead);
+
+ do_set_vm = heap_page_will_update_vm(relation,
+ buffer, blockno, page,
+ reason,
+ do_prune, do_freeze,
+ blk_known_av,
+ &prstate,
+ &vmbuffer,
+ &vmflags, &set_pd_all_visible);
- /*
- * If this is an on-access call and we're not actually pruning, avoid
- * setting the visibility map if it would newly dirty the heap page or, if
- * the page is already dirty, if doing so would require including a
- * full-page image (FPI) of the heap page in the WAL. This situation
- * should be rare, as on-access pruning is only attempted when
- * pd_prune_xid is valid.
- */
- if (reason == PRUNE_ON_ACCESS &&
- prstate.consider_update_vm &&
- prstate.all_visible &&
- !do_prune && !do_freeze &&
- (!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
- {
- prstate.consider_update_vm = false;
- prstate.all_visible = prstate.all_frozen = false;
- }
-
- Assert(!prstate.all_frozen || prstate.all_visible);
-
- /*
- * Handle setting visibility map bit based on information from the VM (if
- * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
- * call), and from all_visible and all_frozen variables.
- */
- if (prstate.consider_update_vm)
- {
- /*
- * Clear any VM corruption. This does not need to be in a critical
- * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
- * we may mark the heap page buffer dirty here and could end up doing
- * so again later. This is not a correctness issue and is in the path
- * of VM corruption, so we don't have to worry about the extra
- * performance overhead.
- */
- if (identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av, prstate.lpdead_items, vmbuffer))
- {
- /* If we fix corruption, don't update the VM further */
- }
-
- /* Determine if we actually need to set the VM and which bits to set. */
- else if (prstate.all_visible &&
- (!blk_known_av ||
- (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
- {
- vmflags |= VISIBILITYMAP_ALL_VISIBLE;
- if (prstate.all_frozen)
- vmflags |= VISIBILITYMAP_ALL_FROZEN;
- }
- }
-
- do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
+ /* Save these for the caller in case we later zero out vmflags */
+ presult->new_vmbits = vmflags;
- /* Lock vmbuffer before entering a critical section */
+ /* Lock vmbuffer before entering critical section */
if (do_set_vm)
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
/*
- * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
- * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
- * set, we strongly prefer to keep them in sync.
- *
- * Prior to Postgres 19, it was possible for the page-level bit to be set
- * and the VM bit to be clear. This could happen if we crashed after
- * setting PD_ALL_VISIBLE but before setting bits in the VM.
+ * Time to actually make the changes to the page and log them. Any error
+ * while applying the changes is critical.
*/
- set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
-
- /* Save these for the caller in case we later zero out vmflags */
- presult->new_vmbits = vmflags;
-
- /* Any error while applying the changes is critical */
START_CRIT_SECTION();
if (do_hint_full_or_prunable)
--
2.43.0
v12-0019-Reorder-heap_page_prune_and_freeze-parameters.patch (text/x-patch)
From f9780190dda309979ed52a820e623cecdbac3ad8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 31 Jul 2025 12:08:18 -0400
Subject: [PATCH v12 19/20] Reorder heap_page_prune_and_freeze parameters
Reorder parameters so that all of the output parameters are together at
the end of the parameter list.
---
src/backend/access/heap/pruneheap.c | 38 ++++++++++++++--------------
src/backend/access/heap/vacuumlazy.c | 6 ++---
src/include/access/heapam.h | 4 +--
3 files changed, 24 insertions(+), 24 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 67b56e45ad7..3e55c43f17b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -297,10 +297,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, false,
+ heap_page_prune_and_freeze(relation, buffer, options, false,
vmbuffer ? *vmbuffer : InvalidBuffer,
- vistest, options,
- NULL, &presult, PRUNE_ON_ACCESS,
+ vistest,
+ NULL, PRUNE_ON_ACCESS, &presult,
&dummy_off_loc, NULL, NULL);
/*
@@ -651,6 +651,15 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* also need to account for a reduction in the length of the line pointer
* array following array truncation by us.
*
+ * options:
+ * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+ * pruning.
+ *
+ * FREEZE indicates that we will also freeze tuples, and will return
+ * 'all_visible', 'all_frozen' flags to the caller.
+ *
+ * UPDATE_VM indicates that we will set the page's status in the VM.
+ *
* If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
* required in order to advance relfrozenxid / relminmxid, or if it's
* considered advantageous for overall system performance to do so now. The
@@ -669,30 +678,21 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* contain the required block of the visibility map.
*
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
- *
- * options:
- * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- * pruning.
- *
- * FREEZE indicates that we will also freeze tuples, and will return
- * 'all_visible', 'all_frozen' flags to the caller.
- *
- * UPDATE_VM indicates that we will set the page's status in the VM.
+ * (see heap_prune_satisfies_vacuum). It is an input parameter.
*
* cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
* of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
* cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD. It is an input parameter.
+ *
+ * reason indicates why the pruning is performed. It is included in the WAL
+ * record for debugging and analysis purposes, but otherwise has no effect.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
* heap_page_prune_and_freeze() is responsible for initializing it. Required
* by all callers.
*
- * reason indicates why the pruning is performed. It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
* off_loc is the offset location required by the caller to use in error
* callback.
*
@@ -705,13 +705,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ int options,
bool blk_known_av,
Buffer vmbuffer,
GlobalVisState *vistest,
- int options,
struct VacuumCutoffs *cutoffs,
- PruneFreezeResult *presult,
PruneReason reason,
+ PruneFreezeResult *presult,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4ad05ba4db6..4fb915e1d94 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1993,11 +1993,11 @@ lazy_scan_prune(LVRelState *vacrel,
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf,
+ heap_page_prune_and_freeze(rel, buf, prune_options,
all_visible_according_to_vm,
vmbuffer,
- vacrel->vistest, prune_options,
- &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+ vacrel->vistest,
+ &vacrel->cutoffs, PRUNE_VACUUM_SCAN, &presult,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 16f7904a21e..0c4e5607627 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,13 +394,13 @@ struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer,
Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ int options,
bool blk_known_av,
Buffer vmbuffer,
struct GlobalVisState *vistest,
- int options,
struct VacuumCutoffs *cutoffs,
- PruneFreezeResult *presult,
PruneReason reason,
+ PruneFreezeResult *presult,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid);
--
2.43.0
v12-0020-Set-pd_prune_xid-on-insert.patch (text/x-patch)
From 95c4ebccc8f78b106f43f709a7a657c5102cd2a7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v12 20/20] Set pd_prune_xid on insert
Now that we can set the VM during read-only queries, it makes sense to
start setting the page prunable hint on insert. This will allow
heap_page_prune_and_freeze() to be called when the page is full or
mostly full.
For years there has been a note in heap_insert() and heap_multi_insert()
pointing out that setting pd_prune_xid would help clean up aborted
inserted tuples that would otherwise not be cleaned up until vacuum.
So, that's another benefit of setting it.
Setting pd_prune_xid on insert causes a page to be pruned and then
written out which then affects the reported number of hits in the
index-killtuples isolation test. This is a quirk of how hits are tracked
which sometimes leads them to be double counted. This should probably be
fixed or changed independently.
ci-os-only:
---
src/backend/access/heap/heapam.c | 25 +++++++++++++------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++-
.../isolation/expected/index-killtuples.out | 6 ++---
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f9e50d47aee..09d97896c66 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2105,6 +2105,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2164,15 +2165,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2182,7 +2187,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2548,8 +2552,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 9f16ba68d16..321d6a0d960 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -479,6 +479,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -628,9 +634,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
--
2.43.0
On Tue, Sep 9, 2025 at 12:24 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
For heap_xlog_visible() the LSN interlock comment is easier to parse
because of an earlier comment before reading the heap page:

    /*
     * Read the heap page, if it still exists. If the heap file has been
     * dropped or truncated later in recovery, we don't need to update the
     * page, but we'd better still update the visibility map.
     */

I've gone with the direct copy-paste of the LSN interlock paragraph in
attached v12. I think referring to the other comment is too confusing
in context here. However, I also added a line about what could cause
the LSN interlock -- but above it, so as to retain grep-ability of the
other comment.
I think that reads a little strangely. I would consolidate: Note that
the heap relation may have been dropped or truncated, leading us to
skip updating the heap block due to the LSN interlock. However, even
in that case, it's still safe to update the visibility map, etc. The
rest of the comment is perhaps a tad more explicit than our usual
practice, but that might be a good thing, because sometimes we're a
little too terse about these critical details.
I just realized that I don't like this:
+ /*
+ * If we're only adding already frozen rows to a previously empty
+ * page, mark it as all-frozen and update the visibility map. We're
+ * already holding a pin on the vmbuffer.
+ */
The thing is, we rarely position a block comment just before an "else
if". There are probably instances, but it's not typical. That's why
the existing comment contains two "if blah then blah" statements of
which you deleted the second -- because it needed to cover both the
"if" and the "else if". An alternative style is to move the comment
down a nesting level and rephrase without the conditional, ie. "We're
only adding frozen rows to a previously empty page, so mark it as
all-frozen etc." But I don't know that I like doing that for one
branch of the "if" and not the other.
The rest of what's now 0001 looks OK to me now, although you might
want to wait for a review from somebody more knowledgeable about this
area.
Some very quick comments on the next few patches -- far from a full review:
0002. Looks boring, probably unobjectionable provided the payoff patch is OK.
0003. What you've done here with xl_heap_prune.flags is kind of
horrifying. The problem is that, while you've added code explaining
that VISIBILITYMAP_ALL_{VISIBLE,FROZEN} are honorary XLHP flags,
nobody who isn't looking directly at that comment is going to
understand the muddling of the two namespaces. I would suggest not
doing this, even if it means defining redundant constants and writing
technically-unnecessary code to translate between them.
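For illustration, the redundant-constants approach could look roughly like
this (a sketch only; the XLHP_VM_* names, the bit positions, and the helper
are invented here, not taken from any version of the patch):

    /* hypothetical flags living in xl_heap_prune.flags' own namespace */
    #define XLHP_VM_ALL_VISIBLE    (1 << 6)
    #define XLHP_VM_ALL_FROZEN     (1 << 7)

    /* translate VM bits into record flags when building the WAL record */
    static inline uint8
    xlhp_flags_from_vmbits(uint8 vmbits)
    {
        uint8   flags = 0;

        if (vmbits & VISIBILITYMAP_ALL_VISIBLE)
            flags |= XLHP_VM_ALL_VISIBLE;
        if (vmbits & VISIBILITYMAP_ALL_FROZEN)
            flags |= XLHP_VM_ALL_FROZEN;
        return flags;
    }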
0004. It is not clear to me why you need to get
log_heap_prune_and_freeze to do the work here. Why can't
log_newpage_buffer get the job done already?
0005. It looks a little curious that you delete the
identify-corruption logic from the end of the if-nest and add it to
the beginning. Ceteris paribus, you'd expect that to be worse, since
corruption is a rare case.
0006. "to me marked" -> "to be marked".
+ * If the heap page is all-visible but the VM bit is not set, we don't
+ * need to dirty the heap page. However, if checksums are enabled, we
+ * do need to make sure that the heap page is dirtied before passing
+ * it to visibilitymap_set(), because it may be logged.
*/
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
+ if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+ {
+ PageSetAllVisible(page);
+ MarkBufferDirty(buf);
+ }
I really hate this. Maybe you're going to argue that it's not the job
of this patch to fix the awfulness here, but surely marking a buffer
dirty in case some other function decides to WAL-log it is a
ridiculous plan.
--
Robert Haas
EDB: http://www.enterprisedb.com
Thanks for the review! I've made the changes to comments and minor
fixes you suggested in attached v13 and have limited my inline
responses to areas where further discussion is required.
On Tue, Sep 9, 2025 at 3:26 PM Robert Haas <robertmhaas@gmail.com> wrote:
0003. What you've done here with xl_heap_prune.flags is kind of
horrifying. The problem is that, while you've added code explaining
that VISIBILITYMAP_ALL_{VISIBLE,FROZEN} are honorary XLHP flags,
nobody who isn't looking directly at that comment is going to
understand the muddling of the two namespaces. I would suggest not
doing this, even if it means defining redundant constants and writing
technically-unnecessary code to translate between them.
Fair. I've introduced new XLHP flags in attached v13. Hopefully it
puts an end to the horror.
0004. It is not clear to me why you need to get
log_heap_prune_and_freeze to do the work here. Why can't
log_newpage_buffer get the job done already?
Well, I need something to emit the changes to the VM. I'm eliminating
all users of xl_heap_visible. Empty pages are the ones that benefit
the least from switching from xl_heap_visible -> xl_heap_prune. But,
if I don't transition them, we have to maintain all the
xl_heap_visible code (including visibilitymap_set() in its long form).
As for log_newpage_buffer(): if you think it is too confusing to change
log_heap_prune_and_freeze()'s API (by passing force_heap_fpi) to handle
this case, I can leave log_newpage_buffer() there and then call
log_heap_prune_and_freeze().
I just thought it seemed simple to avoid emitting the new page record
and the VM update record, so why not -- but I don't have strong
feelings.
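Concretely, that alternative would be roughly as follows (a sketch only,
eliding the surrounding locking and critical section; the call just mirrors
the one in the attached 0004, but with force_heap_fpi = false):

    if (RelationNeedsWAL(vacrel->rel))
    {
        /* keep the existing guard: log the page itself if it was never logged */
        if (PageGetLSN(page) == InvalidXLogRecPtr)
            log_newpage_buffer(buf, true);

        /* and emit the VM change via the prune record, without forcing an FPI */
        log_heap_prune_and_freeze(vacrel->rel, buf,
                                  false,    /* force_heap_fpi */
                                  vmbuffer, new_vmbits, true,
                                  InvalidTransactionId,
                                  false, PRUNE_VACUUM_SCAN,
                                  NULL, 0,
                                  NULL, 0,
                                  NULL, 0,
                                  NULL, 0);
    }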
0005. It looks a little curious that you delete the
identify-corruption logic from the end of the if-nest and add it to
the beginning. Ceteris paribus, you'd expect that to be worse, since
corruption is a rare case.
On master, the two corruption cases are sandwiched between the normal
VM set cases. And I actually think doing it this way is brittle. If
you put the cases which set the VM first, you have to completely
bulletproof the if statements guarding them to foreclose any possible
corruption case from entering; otherwise you will overwrite the
corruption you then try to detect.
But, specifically, from a performance perspective:
I think moving up the third case doesn't matter because the check is so cheap:

    else if (presult.lpdead_items > 0 && PageIsAllVisible(page))

And as for moving up the second case (the other corruption case), the
non-cheap thing it does is call visibilitymap_get_status():

    else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
             visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)

But once you call visibilitymap_get_status() once, assuming there is
no corruption and you need to go set the VM, you've already got that
page of the VM read, so it is probably pretty cheap. Overall, I didn't
think this would add noticeable overhead or many wasted operations.
And I thought that reorganizing the code improved clarity and
decreased the likelihood of bugs from insufficiently guarding positive
cases against corrupt pages and overwriting corruption instead of
detecting it.
If we're really worried about it from a performance perspective, I
could add an extra test at the top of identify_and_fix_vm_corruption()
that dumps out early if (!all_visible_according_to_vm &&
presult.all_visible).
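That early return would be something like this (a sketch; these are the
caller-level variable names, not necessarily what
identify_and_fix_vm_corruption() actually has in scope):

    /* nothing suspicious to repair in this case, so bail out immediately */
    if (!all_visible_according_to_vm && presult.all_visible)
        return false;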
+ * If the heap page is all-visible but the VM bit is not set, we don't
+ * need to dirty the heap page. However, if checksums are enabled, we
+ * do need to make sure that the heap page is dirtied before passing
+ * it to visibilitymap_set(), because it may be logged.
  */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
+ if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+ {
+     PageSetAllVisible(page);
+     MarkBufferDirty(buf);
+ }

I really hate this. Maybe you're going to argue that it's not the job
of this patch to fix the awfulness here, but surely marking a buffer
dirty in case some other function decides to WAL-log it is a
ridiculous plan.
Right, it isn't pretty. But I don't quite see what the alternative is.
We need to mark the buffer dirty before setting the LSN. We could
perhaps rewrite visibilitymap_set()'s API to return the LSN of the
xl_heap_visible record and stamp it on the heap buffer ourselves. But
1) I think visibilitymap_set() purposefully conceals its WAL logging
ways from the caller and propagating that info back up starts to make
the API messy in another way and 2) I'm a bit loath to make big
changes to visibilitymap_set() right now since my patch set eventually
resolves this by putting the changes to the VM and heap page in the
same WAL record.
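For completeness, the rejected alternative would look very roughly like this
(a sketch only: today's visibilitymap_set() does not return the
xl_heap_visible LSN, variable names are illustrative, and the ordering and
conditions around it are elided):

    XLogRecPtr  recptr;

    /* hypothetical API: visibilitymap_set() hands back the LSN it logged */
    recptr = visibilitymap_set(rel, blkno, buf, InvalidXLogRecPtr,
                               vmbuffer, InvalidTransactionId,
                               VISIBILITYMAP_ALL_VISIBLE);

    if (XLogHintBitIsNeeded() && !XLogRecPtrIsInvalid(recptr))
    {
        MarkBufferDirty(buf);       /* dirty the heap buffer before stamping it */
        PageSetLSN(page, recptr);   /* caller, not visibilitymap_set(), sets the LSN */
    }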
- Melanie
Attachments:
v13-0004-Use-xl_heap_prune-record-for-setting-empty-pages.patch (text/x-patch)
From 6d9a4502319e125d4fa5350cf63019427afba066 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:42:19 -0400
Subject: [PATCH v13 04/20] Use xl_heap_prune record for setting empty pages
all-visible
As part of a project to eliminate xl_heap_visible records, eliminate
their usage in phase I vacuum of empty pages.
---
src/backend/access/heap/pruneheap.c | 14 +++++--
src/backend/access/heap/vacuumlazy.c | 57 ++++++++++++++++++----------
src/include/access/heapam.h | 1 +
3 files changed, 49 insertions(+), 23 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 680c0562322..343ab55e527 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -836,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
+ false,
InvalidBuffer, 0, false,
conflict_xid,
true, reason,
@@ -2055,6 +2056,9 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* case, vmbuffer should already have been updated and marked dirty and should
* still be pinned and locked.
*
+ * force_heap_fpi indicates that a full page image of the heap block should be
+ * forced.
+ *
* set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
* the page LSN when checksums/wal_log_hints are enabled even if we did not
* prune or freeze tuples on the page.
@@ -2065,6 +2069,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ bool force_heap_fpi,
Buffer vmbuffer,
uint8 vmflags,
bool set_pd_all_vis,
@@ -2096,13 +2101,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
regbuf_flags = REGBUF_STANDARD;
+ if (force_heap_fpi)
+ regbuf_flags |= REGBUF_FORCE_IMAGE;
+
/*
* We can avoid an FPI if the only modification we are making to the heap
* page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
*/
- if (!do_prune &&
- nfrozen == 0 &&
- (!set_pd_all_vis || !XLogHintBitIsNeeded()))
+ else if (!do_prune &&
+ nfrozen == 0 &&
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
regbuf_flags |= REGBUF_NO_IMAGE;
/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 51067264004..a1cdaaebb57 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1877,33 +1877,49 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ uint8 new_vmbits = VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN;
+
+ /* Lock vmbuffer before entering critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ PageSetAllVisible(page);
MarkBufferDirty(buf);
- /*
- * It's possible that another backend has extended the heap,
- * initialized the page, and then failed to WAL-log the page due
- * to an ERROR. Since heap extension is not WAL-logged, recovery
- * might try to replay our record setting the page all-visible and
- * find that the page isn't initialized, which will cause a PANIC.
- * To prevent that, check whether the page has been previously
- * WAL-logged, and if not, do that now.
- */
- if (RelationNeedsWAL(vacrel->rel) &&
- PageGetLSN(page) == InvalidXLogRecPtr)
- log_newpage_buffer(buf, true);
+ visibilitymap_set_vmbits(vacrel->rel, blkno,
+ vmbuffer, new_vmbits);
+
+ if (RelationNeedsWAL(vacrel->rel))
+ {
+ /*
+ * It's possible that another backend has extended the heap,
+ * initialized the page, and then failed to WAL-log the page
+ * due to an ERROR. Since heap extension is not WAL-logged,
+ * recovery might try to replay our record setting the page
+ * all-visible and find that the page isn't initialized, which
+ * will cause a PANIC. To prevent that, if the page hasn't
+ * been previously WAL-logged, force a heap FPI.
+ */
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ PageGetLSN(page) == InvalidXLogRecPtr,
+ vmbuffer,
+ new_vmbits,
+ true,
+ InvalidTransactionId,
+ false, PRUNE_VACUUM_SCAN,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+ }
- PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
END_CRIT_SECTION();
- /* Count the newly all-frozen pages for logging */
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+ /* Count the newly all-frozen pages for logging. */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
}
@@ -2925,6 +2941,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
+ false,
vmbuffer,
vmflags,
set_pd_all_vis,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 13934cb7dc6..7ec270feed0 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,6 +394,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ bool force_heap_fpi,
Buffer vmbuffer,
uint8 vmflags,
bool set_pd_all_vis,
--
2.43.0
v13-0001-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch (text/x-patch)
From 58f0e628e901766697bcbbfbaeb7abe685f23d54 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v13 01/20] Eliminate xl_heap_visible in COPY FREEZE
Instead of emitting a separate WAL record for setting the VM bits in
xl_heap_visible, specify the changes to make to the VM block in the
xl_heap_multi_insert record instead.
This halves the number of WAL records emitted by COPY FREEZE.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam.c | 44 ++++++++++------
src/backend/access/heap/heapam_xlog.c | 54 +++++++++++++++++++-
src/backend/access/heap/visibilitymap.c | 67 ++++++++++++++++++++++++-
src/backend/access/rmgrdesc/heapdesc.c | 5 ++
src/include/access/visibilitymap.h | 2 +
5 files changed, 154 insertions(+), 18 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4c5ae205a7a..c8cd9d22726 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2466,7 +2466,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
starting_with_empty_page = PageGetMaxOffsetNumber(page) == 0;
if (starting_with_empty_page && (options & HEAP_INSERT_FROZEN))
+ {
all_frozen_set = true;
+ /* Lock the vmbuffer before entering the critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ }
/* NO EREPORT(ERROR) from here till changes are logged */
START_CRIT_SECTION();
@@ -2506,7 +2510,8 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
* going to add further frozen rows to it.
*
* If we're only adding already frozen rows to a previously empty
- * page, mark it as all-visible.
+ * page, mark it as all-frozen and update the visibility map. We're
+ * already holding a pin on the vmbuffer.
*/
if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
@@ -2517,7 +2522,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
else if (all_frozen_set)
+ {
PageSetAllVisible(page);
+ visibilitymap_set_vmbits(relation,
+ BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+ }
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2565,6 +2577,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
xlrec->flags = 0;
if (all_visible_cleared)
xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+ /*
+ * We don't have to worry about including a conflict xid in the
+ * WAL record as HEAP_INSERT_FROZEN intentionally violates
+ * visibility rules.
+ */
if (all_frozen_set)
xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
@@ -2627,7 +2645,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
XLogBeginInsert();
XLogRegisterData(xlrec, tupledata - scratch.data);
+
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+ if (all_frozen_set)
+ XLogRegisterBuffer(1, vmbuffer, 0);
XLogRegisterBufData(0, tupledata, totaldatalen);
@@ -2637,26 +2658,17 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
recptr = XLogInsert(RM_HEAP2_ID, info);
PageSetLSN(page, recptr);
+ if (all_frozen_set)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
+ }
}
END_CRIT_SECTION();
- /*
- * If we've frozen everything on the page, update the visibilitymap.
- * We're already holding pin on the vmbuffer.
- */
if (all_frozen_set)
- {
- /*
- * It's fine to use InvalidTransactionId here - this is only used
- * when HEAP_INSERT_FROZEN is specified, which intentionally
- * violates visibility rules.
- */
- visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
- InvalidXLogRecPtr, vmbuffer,
- InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
- }
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
UnlockReleaseBuffer(buffer);
ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index cf843277938..faa7c561a8a 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -551,6 +551,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
int i;
bool isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
/*
* Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -571,11 +572,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(rlocator);
- Buffer vmbuffer = InvalidBuffer;
visibilitymap_pin(reln, blkno, &vmbuffer);
visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
ReleaseBuffer(vmbuffer);
+ vmbuffer = InvalidBuffer;
FreeFakeRelcacheEntry(reln);
}
@@ -662,6 +663,57 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (BufferIsValid(buffer))
UnlockReleaseBuffer(buffer);
+ buffer = InvalidBuffer;
+
+ /*
+ * Now read and update the VM block.
+ *
+ * Note that the heap relation may have been dropped or truncated, leading
+ * us to skip updating the heap block due to the LSN interlock. However,
+ * even in that case, it's still safe to update the visibility map. Any
+ * WAL record that clears the visibility map bit does so before checking
+ * the page LSN, so any bits that need to be cleared will still be
+ * cleared.
+ *
+ * Note that the lock on the heap page was dropped above. In normal
+ * operation this would never be safe because a concurrent query could
+ * modify the heap page and clear PD_ALL_VISIBLE -- violating the
+ * invariant that PD_ALL_VISIBLE must be set if the corresponding bit in
+ * the VM is set.
+ *
+ * In recovery, we expect no other writers, so writing to the VM page
+ * without holding a lock on the heap page is considered safe enough. It
+ * is done this way when replaying xl_heap_visible records (see
+ * heap_xlog_visible()).
+ */
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+ XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ Page vmpage = BufferGetPage(vmbuffer);
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
+
+ /* initialize the page if it was read as zeros */
+ if (PageIsNew(vmpage))
+ PageInit(vmpage, BLCKSZ, 0);
+
+ visibilitymap_set_vmbits(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+
+ /*
+ * It is not possible that the VM was already set for this heap page,
+ * so the vmbuffer must have been modified and marked dirty.
+ */
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
+ FreeFakeRelcacheEntry(reln);
+ }
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
+
/*
* If the page is running low on free space, update the FSM as well.
* Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 7306c16f05c..aa48a436108 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,7 +14,8 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set a bit in a previously pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page and log
+ * visibilitymap_set_vmbits - set bit(s) in a pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -321,6 +322,70 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
return status;
}
+/*
+ * Set flags in the VM block contained in the passed in vmBuf.
+ *
+ * This function is for callers which include the VM changes in the same WAL
+ * record as the modifications of the heap page which rendered it all-visible.
+ * Callers separately logging the VM changes should invoke visibilitymap_set()
+ * instead.
+ *
+ * Caller must have pinned and exclusive locked the correct block of the VM in
+ * vmBuf. This block should contain the VM bits for the given heapBlk.
+ *
+ * During normal operation (i.e. not recovery), this should be called in a
+ * critical section which also makes any necessary changes to the heap page
+ * and, if relevant, emits WAL.
+ *
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page. This includes
+ * maintaining any invariants such as ensuring the buffer containing heapBlk
+ * is pinned and exclusive locked.
+ */
+uint8
+visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags)
+{
+ BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+ uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+ uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+ Page page;
+ uint8 *map;
+ uint8 status;
+
+#ifdef TRACE_VISIBILITYMAP
+ elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
+ flags, RelationGetRelationName(rel), heapBlk);
+#endif
+
+ /* Call in same critical section where WAL is emitted. */
+ Assert(InRecovery || CritSectionCount > 0);
+
+ /* Flags should be valid. Also never clear bits with this function */
+ Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+ /* Must never set all_frozen bit without also setting all_visible bit */
+ Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+ /* Check that we have the right VM page pinned */
+ if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+ elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
+
+ Assert(BufferIsExclusiveLocked(vmBuf));
+
+ page = BufferGetPage(vmBuf);
+ map = (uint8 *) PageGetContents(page);
+
+ status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+ if (flags != status)
+ {
+ map[mapByte] |= (flags << mapOffset);
+ MarkBufferDirty(vmBuf);
+ }
+
+ return status;
+}
+
/*
* visibilitymap_get_status - get status of bits
*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
#include "access/heapam_xlog.h"
#include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
#include "storage/standbydefs.h"
/*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
xlrec->flags);
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+ appendStringInfo(buf, ", vm_flags: 0x%02X",
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
if (XLogRecHasBlockData(record, 0) && !isinit)
{
appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..fc7056a91ea 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,8 @@ extern uint8 visibilitymap_set(Relation rel,
Buffer vmBuf,
TransactionId cutoff_xid,
uint8 flags);
+extern uint8 visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
--
2.43.0
v13-0002-Make-heap_page_is_all_visible-independent-of-LVR.patch (text/x-patch)
From a0d48ef7ba1b28c084a6a5bde4e27c6af6fb9820 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 15:48:51 -0400
Subject: [PATCH v13 02/20] Make heap_page_is_all_visible independent of
LVRelState
Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need a few parameters from
the LVRelState, so just pass those in explicitly.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 48 ++++++++++++++++++----------
1 file changed, 31 insertions(+), 17 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 981d9380a92..8a84bdfe0a9 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,8 +463,11 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
- TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2009,8 +2012,9 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(presult.lpdead_items == 0);
- if (!heap_page_is_all_visible(vacrel, buf,
- &debug_cutoff, &debug_all_frozen))
+ if (!heap_page_is_all_visible(vacrel->rel, buf,
+ vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+ &debug_cutoff, &vacrel->offnum))
Assert(false);
Assert(presult.all_frozen == debug_all_frozen);
@@ -2912,8 +2916,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* emitted.
*/
Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
- &all_frozen))
+ if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
+ &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -3596,10 +3600,18 @@ dead_items_cleanup(LVRelState *vacrel)
}
/*
- * Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples. Set *all_frozen to true if every tuple
- * on this page is frozen.
+ * Check if every tuple in the given page in buf is visible to all current and
+ * future transactions.
+ *
+ * OldestXmin is used to determine visibility.
+ *
+ * Sets *all_frozen to true if every tuple on this page is frozen.
+ *
+ * Sets *visibility_cutoff_xid to the highest xmin amongst the visible tuples.
+ * It is only valid if the page is all-visible.
+ *
+ * *logging_offnum will have the OffsetNumber of the current tuple being
+ * processed for vacuum's error callback system.
*
* This is a stripped down version of lazy_scan_prune(). If you change
* anything here, make sure that everything stays in sync. Note that an
@@ -3607,9 +3619,11 @@ dead_items_cleanup(LVRelState *vacrel)
* introducing new side-effects here.
*/
static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
TransactionId *visibility_cutoff_xid,
- bool *all_frozen)
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3632,7 +3646,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* Set the offset number so that we can display it along with any
* error that occurred while processing this tuple.
*/
- vacrel->offnum = offnum;
+ *logging_offnum = offnum;
itemid = PageGetItemId(page, offnum);
/* Unused or redirect line pointers are of no interest */
@@ -3656,9 +3670,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
tuple.t_len = ItemIdGetLength(itemid);
- tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+ tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
+ switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
buf))
{
case HEAPTUPLE_LIVE:
@@ -3679,7 +3693,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
+ OldestXmin))
{
all_visible = false;
*all_frozen = false;
@@ -3714,7 +3728,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
} /* scan along page */
/* Clear the offset information once we have processed the given page. */
- vacrel->offnum = InvalidOffsetNumber;
+ *logging_offnum = InvalidOffsetNumber;
return all_visible;
}
--
2.43.0
v13-0003-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch (text/x-patch)
From c0ae74215319dcd3e79ccd3f15091cf40cf5a692 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:42:13 -0400
Subject: [PATCH v13 03/20] Eliminate xl_heap_visible from vacuum phase III
Instead of emitting a separate xl_heap_visible record for each page that
is rendered all-visible by vacuum's third phase, include the updates to
the VM in the already emitted xl_heap_prune record.
The visibilitymap bits are stored in the flags member of the
xl_heap_prune record.
This can decrease the number of WAL records vacuum phase III emits by
as much as half.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam_xlog.c | 160 ++++++++++++++++++++----
src/backend/access/heap/pruneheap.c | 71 ++++++++++-
src/backend/access/heap/vacuumlazy.c | 166 +++++++++++++++++--------
src/backend/access/rmgrdesc/heapdesc.c | 11 +-
src/include/access/heapam.h | 9 ++
src/include/access/heapam_xlog.h | 40 ++++--
6 files changed, 362 insertions(+), 95 deletions(-)
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index faa7c561a8a..83b39a9102c 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Buffer buffer;
RelFileLocator rlocator;
BlockNumber blkno;
- XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
+ uint8 vmflags = 0;
+ Size freespace = 0;
XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -50,11 +52,22 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Assert((xlrec.flags & XLHP_CLEANUP_LOCK) != 0 ||
(xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
+ if (xlrec.flags & XLHP_VM_ALL_VISIBLE)
+ {
+ vmflags = VISIBILITYMAP_ALL_VISIBLE;
+ if (xlrec.flags & XLHP_VM_ALL_FROZEN)
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
/*
- * We are about to remove and/or freeze tuples. In Hot Standby mode,
- * ensure that there are no queries running for which the removed tuples
- * are still visible or which still consider the frozen xids as running.
- * The conflict horizon XID comes after xl_heap_prune.
+ * After xl_heap_prune is the optional snapshot conflict horizon.
+ *
+ * In Hot Standby mode, we must ensure that there are no running queries
+ * which would conflict with the changes in this record. That means we
+ * can't replay this record if it removes tuples that are still visible to
+ * transactions on the standby, freeze tuples with xids that are still
+ * considered running on the standby, or set a page as all-visible in the
+ * VM if it isn't all-visible to all transactions on the standby.
*/
if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
{
@@ -71,12 +84,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
}
/*
- * If we have a full-page image, restore it and we're done.
+ * If we have a full-page image of the heap block, restore it and we're
+ * done with the heap block.
*/
- action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
- (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
- &buffer);
- if (action == BLK_NEEDS_REDO)
+ if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+ (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+ &buffer) == BLK_NEEDS_REDO)
{
Page page = BufferGetPage(buffer);
OffsetNumber *redirected;
@@ -89,6 +102,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Size datalen;
xlhp_freeze_plan *plans;
OffsetNumber *frz_offsets;
+ bool do_prune;
+ bool mark_buffer_dirty;
+ bool set_heap_lsn;
char *dataptr = XLogRecGetBlockData(record, 0, &datalen);
heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
@@ -97,11 +113,18 @@ heap_xlog_prune_freeze(XLogReaderState *record)
&ndead, &nowdead,
&nunused, &nowunused);
+ do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+ set_heap_lsn = mark_buffer_dirty = do_prune || nplans > 0;
+
+ /* Ensure the record does something */
+ Assert(do_prune || nplans > 0 ||
+ vmflags & VISIBILITYMAP_VALID_BITS);
+
/*
* Update all line pointers per the record, and repair fragmentation
* if needed.
*/
- if (nredirected > 0 || ndead > 0 || nunused > 0)
+ if (do_prune)
heap_page_prune_execute(buffer,
(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
redirected, nredirected,
@@ -138,36 +161,127 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* There should be no more data */
Assert((char *) frz_offsets == dataptr + datalen);
+ /*
+ * Now set PD_ALL_VISIBLE, if required. We'll only do this if we are
+ * also going to set bits in the VM later.
+ *
+ * We must never end up with the VM bit set and the page-level
+ * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+ * modification would fail to clear the VM bit.
+ */
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+ {
+ PageSetAllVisible(page);
+
+ /*
+ * If the only change to the heap page is setting PD_ALL_VISIBLE,
+ * we can avoid setting the page LSN unless checksums or
+ * wal_log_hints are enabled.
+ */
+ set_heap_lsn = XLogHintBitIsNeeded() ? true : set_heap_lsn;
+ mark_buffer_dirty = true;
+ }
+
/*
* Note: we don't worry about updating the page's prunability hints.
* At worst this will cause an extra prune cycle to occur soon.
*/
- PageSetLSN(page, lsn);
- MarkBufferDirty(buffer);
+ if (mark_buffer_dirty)
+ MarkBufferDirty(buffer);
+ if (set_heap_lsn)
+ PageSetLSN(page, lsn);
}
/*
- * If we released any space or line pointers, update the free space map.
+ * If we released any space or line pointers or will be setting a page in
+ * the visibility map, measure the page's freespace to later update the
+ * freespace map.
+ *
+ * Even if we are just updating the VM (and thus not freeing up any
+ * space), we'll still update the FSM for this page. Since the FSM is not
+ * WAL-logged and only updated heuristically, it easily becomes stale in
+ * standbys. If the standby is later promoted and runs VACUUM, it will
+ * skip updating individual free space figures for pages that became
+ * all-visible (or all-frozen, depending on the vacuum mode,) which is
+ * troublesome when FreeSpaceMapVacuum propagates too optimistic free
+ * space values to upper FSM layers; later inserters try to use such pages
+ * only to find out that they are unusable. This can cause long stalls
+ * when there are many such pages.
+ *
+ * Forestall those problems by updating FSM's idea about a page that is
+ * becoming all-visible or all-frozen.
*
* Do this regardless of a full-page image being applied, since the FSM
* data is not in the page anyway.
+ *
+ * We want to avoid holding an exclusive lock on the heap buffer while
+ * doing IO (either of the FSM or the VM), so we'll release the lock on
+ * the heap buffer before doing either.
*/
if (BufferIsValid(buffer))
{
- if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
- XLHP_HAS_DEAD_ITEMS |
- XLHP_HAS_NOW_UNUSED_ITEMS))
- {
- Size freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+ if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
+ XLHP_HAS_DEAD_ITEMS |
+ XLHP_HAS_NOW_UNUSED_ITEMS)) ||
+ vmflags & VISIBILITYMAP_VALID_BITS)
+ freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
- UnlockReleaseBuffer(buffer);
+ UnlockReleaseBuffer(buffer);
+ }
- XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
+ /*
+ * Now read and update the VM block.
+ *
+ * Note that the heap relation may have been dropped or truncated, leading
+ * us to skip updating the heap block due to the LSN interlock. However,
+ * even in that case, it's still safe to update the visibility map. Any
+ * WAL record that clears the visibility map bit does so before checking
+ * the page LSN, so any bits that need to be cleared will still be
+ * cleared.
+ *
+ * Note that the lock on the heap page was dropped above. In normal
+ * operation this would never be safe because a concurrent query could
+ * modify the heap page and clear PD_ALL_VISIBLE -- violating the
+ * invariant that PD_ALL_VISIBLE must be set if the corresponding bit in
+ * the VM is set.
+ *
+ * In recovery, we expect no other writers, so writing to the VM page
+ * without holding a lock on the heap page is considered safe enough. It
+ * is done this way when replaying xl_heap_visible records (see
+ * heap_xlog_visible()).
+ */
+ if (vmflags & VISIBILITYMAP_VALID_BITS &&
+ XLogReadBufferForRedoExtended(record, 1,
+ RBM_ZERO_ON_ERROR,
+ false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ Page vmpage = BufferGetPage(vmbuffer);
+ uint8 old_vmbits = 0;
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
+
+ /* initialize the page if it was read as zeros */
+ if (PageIsNew(vmpage))
+ PageInit(vmpage, BLCKSZ, 0);
+
+ old_vmbits = visibilitymap_set_vmbits(reln, blkno, vmbuffer, vmflags);
+
+ /* Only set VM page LSN if we modified the page */
+ if (old_vmbits != vmflags)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
}
- else
- UnlockReleaseBuffer(buffer);
+
+ FreeFakeRelcacheEntry(reln);
}
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
+
+ if (freespace > 0)
+ XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7ebd22f00a3..680c0562322 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,6 +21,7 @@
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
+#include "access/visibilitymapdefs.h"
#include "commands/vacuum.h"
#include "executor/instrument.h"
#include "miscadmin.h"
@@ -835,6 +836,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
+ InvalidBuffer, 0, false,
conflict_xid,
true, reason,
prstate.frozen, prstate.nfrozen,
@@ -2030,14 +2032,18 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
*
* This is used for several different page maintenance operations:
*
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
* redirected, some marked dead, and some removed altogether.
*
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'
*
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ * marked as unused.
*
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phase III, the heap page may be marked
+ * all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
* all.
*
* If replaying the record requires a cleanup lock, pass cleanup_lock = true.
@@ -2045,12 +2051,23 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* replaying 'unused' items depends on whether they were all previously marked
* as dead.
*
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
* Note: This function scribbles on the 'frozen' array.
*
* Note: This is called in a critical section, so careful what you do here.
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer,
+ uint8 vmflags,
+ bool set_pd_all_vis,
TransactionId conflict_xid,
bool cleanup_lock,
PruneReason reason,
@@ -2062,6 +2079,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xl_heap_prune xlrec;
XLogRecPtr recptr;
uint8 info;
+ uint8 regbuf_flags;
/* The following local variables hold data registered in the WAL record: */
xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2070,16 +2088,34 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xlhp_prune_items dead_items;
xlhp_prune_items unused_items;
OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+ bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
xlrec.flags = 0;
+ Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+
+ regbuf_flags = REGBUF_STANDARD;
+
+ /*
+ * We can avoid an FPI if the only modification we are making to the heap
+ * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+ */
+ if (!do_prune &&
+ nfrozen == 0 &&
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
+ regbuf_flags |= REGBUF_NO_IMAGE;
+
/*
* Prepare data for the buffer. The arrays are not actually in the
* buffer, but we pretend that they are. When XLogInsert stores a full
* page image, the arrays can be omitted.
*/
XLogBeginInsert();
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+ XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ XLogRegisterBuffer(1, vmbuffer, 0);
+
if (nfrozen > 0)
{
int nplans;
@@ -2136,6 +2172,12 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* Prepare the main xl_heap_prune record. We already set the XLHP_HAS_*
* flag above.
*/
+ if (vmflags & VISIBILITYMAP_ALL_VISIBLE)
+ {
+ xlrec.flags |= XLHP_VM_ALL_VISIBLE;
+ if (vmflags & VISIBILITYMAP_ALL_FROZEN)
+ xlrec.flags |= XLHP_VM_ALL_FROZEN;
+ }
if (RelationIsAccessibleInLogicalDecoding(relation))
xlrec.flags |= XLHP_IS_CATALOG_REL;
if (TransactionIdIsValid(conflict_xid))
@@ -2168,5 +2210,22 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
}
recptr = XLogInsert(RM_HEAP2_ID, info);
- PageSetLSN(BufferGetPage(buffer), recptr);
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
+ }
+
+ /*
+ * If pruning or freezing tuples or setting the page all-visible when
+ * checksums or wal_log_hints are enabled, we must bump the LSN. Torn
+ * pages are possible if we update PD_ALL_VISIBLE without bumping the LSN,
+ * but this is deemed okay for page hint updates.
+ */
+ if (do_prune || nfrozen > 0 ||
+ (set_pd_all_vis && XLogHintBitIsNeeded()))
+ {
+ Assert(BufferIsDirty(buffer));
+ PageSetLSN(BufferGetPage(buffer), recptr);
+ }
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8a84bdfe0a9..51067264004 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,11 +463,13 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
+static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2852,8 +2854,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
OffsetNumber unused[MaxHeapTuplesPerPage];
int nunused = 0;
TransactionId visibility_cutoff_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
bool all_frozen;
LVSavedErrInfo saved_err_info;
+ uint8 vmflags = 0;
+ bool set_pd_all_vis = false;
Assert(vacrel->do_index_vacuuming);
@@ -2864,6 +2869,23 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
InvalidOffsetNumber);
+ if (heap_page_would_be_all_visible(vacrel->rel, buffer,
+ vacrel->cutoffs.OldestXmin,
+ deadoffsets, num_offsets,
+ &all_frozen, &visibility_cutoff_xid,
+ &vacrel->offnum))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (all_frozen)
+ {
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ }
+
+ /* Take the lock on the vmbuffer before entering a critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ }
+
START_CRIT_SECTION();
for (int i = 0; i < num_offsets; i++)
@@ -2883,6 +2905,17 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
/* Attempt to truncate line pointer array now */
PageTruncateLinePointerArray(page);
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+ {
+ Assert(!PageIsAllVisible(page));
+ set_pd_all_vis = true;
+ PageSetAllVisible(page);
+ visibilitymap_set_vmbits(vacrel->rel,
+ blkno,
+ vmbuffer, vmflags);
+ conflict_xid = visibility_cutoff_xid;
+ }
+
/*
* Mark buffer dirty before we write WAL.
*/
@@ -2892,7 +2925,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
- InvalidTransactionId,
+ vmbuffer,
+ vmflags,
+ set_pd_all_vis,
+ conflict_xid,
false, /* no cleanup lock required */
PRUNE_VACUUM_CLEANUP,
NULL, 0, /* frozen */
@@ -2901,39 +2937,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
unused, nunused);
}
- /*
- * End critical section, so we safely can do visibility tests (which
- * possibly need to perform IO and allocate memory!). If we crash now the
- * page (including the corresponding vm bit) might not be marked all
- * visible, but that's fine. A later vacuum will fix that.
- */
END_CRIT_SECTION();
- /*
- * Now that we have removed the LP_DEAD items from the page, once again
- * check if the page has become all-visible. The page is already marked
- * dirty, exclusively locked, and, if needed, a full page image has been
- * emitted.
- */
- Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
- &all_frozen, &visibility_cutoff_xid, &vacrel->offnum))
+ if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (all_frozen)
- {
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buffer,
- InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
- flags);
-
/* Count the newly set VM page for logging */
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
vacrel->vm_new_visible_pages++;
if (all_frozen)
vacrel->vm_new_visible_frozen_pages++;
@@ -3600,40 +3609,85 @@ dead_items_cleanup(LVRelState *vacrel)
}
/*
- * Check if every tuple in the given page in buf is visible to all current and
- * future transactions.
+ * Wrapper for heap_page_would_be_all_visible() which can be used for
+ * callers that expect no LP_DEAD on the page.
+ */
+bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
+{
+
+ return heap_page_would_be_all_visible(rel, buf, OldestXmin,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid,
+ logging_offnum);
+}
+
+/*
+ * Determines whether or not the heap page in buf is all-visible other than
+ * the dead line pointers referred to by the provided deadoffsets array.
*
- * OldestXmin is used to determine visibility.
+ * deadoffsets are the offsets of LP_DEAD items that the caller knows about and
+ * whose associated index entries have already been removed. Vacuum will call
+ * this before setting those line pointers LP_UNUSED. So, if there are no new
+ * LP_DEAD items, then the page can be set all-visible in the VM by the caller.
+ *
+ * Returns true if the page is all-visible other than the provided
+ * deadoffsets and false otherwise.
*
- * Sets *all_frozen to true if every tuple on this page is frozen.
+ * OldestXmin is used to determine visibility.
*
- * Sets *visibility_cutoff_xid to the highest xmin amongst the visible tuples.
- * It is only valid if the page is all-visible.
+ * *all_frozen is an output parameter indicating to the caller if every tuple
+ * on the page is frozen.
*
* *logging_offnum will have the OffsetNumber of the current tuple being
* processed for vacuum's error callback system.
*
- * This is a stripped down version of lazy_scan_prune(). If you change
- * anything here, make sure that everything stays in sync. Note that an
- * assertion calls us to verify that everybody still agrees. Be sure to avoid
- * introducing new side-effects here.
+ * *visibility_cutoff_xid is an output parameter with the highest xmin amongst the
+ * visible tuples. It is only valid if the page is all-visible.
+ *
+ * Callers looking to verify that the page is already all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This is similar to the logic in heap_prune_record_unchanged_lp_normal(). If
+ * you change anything here, make sure that everything stays in sync. Note
+ * that an assertion calls us to verify that everybody still agrees. Be sure
+ * to avoid introducing new side-effects here.
*/
static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum)
+heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
OffsetNumber offnum,
maxoff;
bool all_visible = true;
+ int matched_dead_count = 0;
*visibility_cutoff_xid = InvalidTransactionId;
*all_frozen = true;
+ Assert(ndeadoffsets == 0 || deadoffsets);
+
+#ifdef USE_ASSERT_CHECKING
+ /* Confirm input deadoffsets[] is strictly sorted */
+ if (ndeadoffsets > 1)
+ {
+ for (int i = 1; i < ndeadoffsets; i++)
+ Assert(deadoffsets[i - 1] < deadoffsets[i]);
+ }
+#endif
+
maxoff = PageGetMaxOffsetNumber(page);
for (offnum = FirstOffsetNumber;
offnum <= maxoff && all_visible;
@@ -3661,9 +3715,15 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
if (ItemIdIsDead(itemid))
{
- all_visible = false;
- *all_frozen = false;
- break;
+ if (!deadoffsets ||
+ matched_dead_count >= ndeadoffsets ||
+ deadoffsets[matched_dead_count] != offnum)
+ {
+ *all_frozen = all_visible = false;
+ break;
+ }
+ matched_dead_count++;
+ continue;
}
Assert(ItemIdIsNormal(itemid));
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..1cb44ca32d3 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -103,7 +103,7 @@ plan_elem_desc(StringInfo buf, void *plan, void *data)
* code, the latter of which is used in frontend (pg_waldump) code.
*/
void
-heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
OffsetNumber **frz_offsets,
int *nredirected, OffsetNumber **redirected,
@@ -287,6 +287,15 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, ", isCatalogRel: %c",
xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
+ if (xlrec->flags & XLHP_VM_ALL_VISIBLE)
+ {
+ uint8 vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+ if (xlrec->flags & XLHP_VM_ALL_FROZEN)
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ appendStringInfo(buf, ", vm_flags: 0x%02X", vmflags);
+ }
+
if (XLogRecHasBlockData(record, 0))
{
Size datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..13934cb7dc6 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -344,6 +344,12 @@ extern void heap_inplace_update_and_unlock(Relation relation,
Buffer buffer);
extern void heap_inplace_unlock(Relation relation,
HeapTuple oldtup, Buffer buffer);
+
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
@@ -388,6 +394,9 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer,
+ uint8 vmflags,
+ bool set_pd_all_vis,
TransactionId conflict_xid,
bool cleanup_lock,
PruneReason reason,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d4c0625b632..6d759c197a1 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -249,7 +249,7 @@ typedef struct xl_heap_update
* Main data section:
*
* xl_heap_prune
- * uint8 flags
+ * uint16 flags
* TransactionId snapshot_conflict_horizon
*
* Block 0 data section:
@@ -284,7 +284,7 @@ typedef struct xl_heap_update
*/
typedef struct xl_heap_prune
{
- uint8 flags;
+ uint16 flags;
/*
* If XLHP_HAS_CONFLICT_HORIZON is set, the conflict horizon XID follows,
@@ -292,10 +292,26 @@ typedef struct xl_heap_prune
*/
} xl_heap_prune;
-#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
+
+/*
+ * The xl_heap_prune record's flags may also contain which VM bits to set. Note
+ * that VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN are defined to
+ * the same values as XLHP_VM_ALL_VISIBLE and XLHP_VM_ALL_FROZEN respectively.
+ * However, xl_heap_prune should always use the XLHP flags and translate them
+ * back to their visibilitymapdefs.h equivalent.
+ */
+#define XLHP_VM_ALL_VISIBLE (1 << 0)
+#define XLHP_VM_ALL_FROZEN (1 << 1)
-/* to handle recovery conflict during logical decoding on standby */
-#define XLHP_IS_CATALOG_REL (1 << 1)
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
+#define XLHP_IS_CATALOG_REL (1 << 2)
/*
* Does replaying the record require a cleanup-lock?
@@ -305,7 +321,7 @@ typedef struct xl_heap_prune
* marks LP_DEAD line pointers as unused without moving any tuple data, an
* ordinary exclusive lock is sufficient.
*/
-#define XLHP_CLEANUP_LOCK (1 << 2)
+#define XLHP_CLEANUP_LOCK (1 << 3)
/*
* If we remove or freeze any entries that contain xids, we need to include a
@@ -313,22 +329,22 @@ typedef struct xl_heap_prune
* there are no queries running for which the removed tuples are still
* visible, or which still consider the frozen XIDs as running.
*/
-#define XLHP_HAS_CONFLICT_HORIZON (1 << 3)
+#define XLHP_HAS_CONFLICT_HORIZON (1 << 4)
/*
* Indicates that an xlhp_freeze_plans sub-record and one or more
* xlhp_freeze_plan sub-records are present.
*/
-#define XLHP_HAS_FREEZE_PLANS (1 << 4)
+#define XLHP_HAS_FREEZE_PLANS (1 << 5)
/*
* XLHP_HAS_REDIRECTIONS, XLHP_HAS_DEAD_ITEMS, and XLHP_HAS_NOW_UNUSED_ITEMS
* indicate that xlhp_prune_items sub-records with redirected, dead, and
* unused item offsets are present.
*/
-#define XLHP_HAS_REDIRECTIONS (1 << 5)
-#define XLHP_HAS_DEAD_ITEMS (1 << 6)
-#define XLHP_HAS_NOW_UNUSED_ITEMS (1 << 7)
+#define XLHP_HAS_REDIRECTIONS (1 << 6)
+#define XLHP_HAS_DEAD_ITEMS (1 << 7)
+#define XLHP_HAS_NOW_UNUSED_ITEMS (1 << 8)
/*
* xlhp_freeze_plan describes how to freeze a group of one or more heap tuples
@@ -497,7 +513,7 @@ extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
uint8 vmflags);
/* in heapdesc.c, so it can be shared between frontend/backend code */
-extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
OffsetNumber **frz_offsets,
int *nredirected, OffsetNumber **redirected,
--
2.43.0
v13-0007-Find-and-fix-VM-corruption-in-heap_page_prune_an.patch (text/x-patch)
From 1d875984902501382636e2d537874f6c4b6e6ea4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:45:59 -0400
Subject: [PATCH v13 07/20] Find and fix VM corruption in
heap_page_prune_and_freeze
Future commits will update the VM in the same critical section and WAL
record as pruning and freezing. For ease of review, this commit makes
one step toward doing this. It moves the VM corruption handling case to
heap_page_prune_and_freeze().
This commit is only really meant for review, as it adds a member to
PruneFreezeResult (vm_corruption) that is removed in later commits.
---
src/backend/access/heap/pruneheap.c | 93 +++++++++++++++++++++++++++-
src/backend/access/heap/vacuumlazy.c | 84 +++----------------------
src/include/access/heapam.h | 4 ++
3 files changed, 102 insertions(+), 79 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 343ab55e527..8f968b47c38 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,7 +21,7 @@
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "commands/vacuum.h"
#include "executor/instrument.h"
#include "miscadmin.h"
@@ -177,6 +177,13 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
static void page_verify_redirects(Page page);
+static bool identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer);
+
/*
* Optionally prune and repair fragmentation in the specified page.
@@ -261,7 +268,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, vistest, 0,
+ heap_page_prune_and_freeze(relation, buffer, false,
+ InvalidBuffer,
+ vistest, 0,
NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
/*
@@ -294,6 +303,70 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
}
}
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer)
+{
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
/*
* Prune and repair fragmentation and potentially freeze tuples on the
@@ -314,6 +387,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
* that also freeze need that information.
*
+ * blk_known_av is the visibility status of the heap block as of the last call
+ * to find_next_unskippable_block(). vmbuffer is the buffer that may already
+ * contain the required block of the visibility map.
+ *
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
* (see heap_prune_satisfies_vacuum).
*
@@ -349,6 +426,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ bool blk_known_av,
+ Buffer vmbuffer,
GlobalVisState *vistest,
int options,
struct VacuumCutoffs *cutoffs,
@@ -897,6 +976,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
+ /*
+ * Clear any VM corruption. This does not need to be done in a critical
+ * section.
+ */
+ presult->vm_corruption = false;
+ if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+ presult->vm_corruption = identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av,
+ prstate.lpdead_items, vmbuffer);
if (prstate.freeze)
{
if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d6818323932..a222c9f9164 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,12 +430,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
-static bool identify_and_fix_vm_corruption(Relation relation,
- BlockNumber heap_blk,
- Buffer heap_buffer, Page heap_page,
- bool heap_blk_known_av,
- int64 nlpdead_items,
- Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1940,72 +1934,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
-/*
- * When updating the visibility map after phase I heap vacuuming, we take the
- * opportunity to identify and fix any VM corruption.
- *
- * heap_blk_known_av is the visibility status of the heap page collected
- * while finding the next unskippable block in heap_vac_scan_next_block().
- */
-static bool
-identify_and_fix_vm_corruption(Relation relation,
- BlockNumber heap_blk,
- Buffer heap_buffer, Page heap_page,
- bool heap_blk_known_av,
- int64 nlpdead_items,
- Buffer vmbuffer)
-{
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
- visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk)));
-
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk)));
-
- PageClearAllVisible(heap_page);
- MarkBufferDirty(heap_buffer);
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- return false;
-}
-
-
/* qsort comparator for sorting OffsetNumbers */
static int
cmpOffsetNumbers(const void *a, const void *b)
@@ -2063,11 +1991,14 @@ lazy_scan_prune(LVRelState *vacrel,
* tuples. Pruning will have determined whether or not the page is
* all-visible.
*/
- prune_options = HEAP_PAGE_PRUNE_FREEZE;
+ prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
+ heap_page_prune_and_freeze(rel, buf,
+ all_visible_according_to_vm,
+ vmbuffer,
+ vacrel->vistest, prune_options,
&vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
@@ -2152,10 +2083,9 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables. Start by looking for any VM corruption.
+ * all_frozen variables.
*/
- if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
- all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+ if (presult.vm_corruption)
{
/* Don't update the VM if we just cleared corruption in it */
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 7ec270feed0..a3b85fd1daf 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
struct TupleTableSlot;
@@ -247,6 +248,7 @@ typedef struct PruneFreezeResult
bool all_visible;
bool all_frozen;
TransactionId vm_conflict_horizon;
+ bool vm_corruption;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
@@ -380,6 +382,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ bool blk_known_av,
+ Buffer vmbuffer,
struct GlobalVisState *vistest,
int options,
struct VacuumCutoffs *cutoffs,
--
2.43.0
v13-0006-Combine-vacuum-phase-I-VM-update-cases.patch
From 02228c3550e5626bddff072059326baad1ba1e1c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:35:36 -0400
Subject: [PATCH v13 06/20] Combine vacuum phase I VM update cases
We update the VM after phase I of vacuum -- either setting both the VM
bits when all bits are currently unset or setting just the frozen bit
when the all-visible bit is already set.
Those two cases shared much of the same code -- leading to unnecessary
duplication. This commit combines them, which is simpler and easier to
understand.
The combined case also happens to fix a longstanding bug: if we were
only setting an already all-visible page all-frozen and
checksums/wal_log_hints were enabled, we failed to mark the heap buffer
dirty before visibilitymap_set() stamped the page LSN.
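Roughly, the single combined branch ends up doing the following (a
condensed, non-compilable sketch using the identifiers from the diff
below):

    if (presult.all_visible &&
        (!all_visible_according_to_vm ||
         (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
    {
        uint8   old_vmbits;
        uint8   flags = VISIBILITYMAP_ALL_VISIBLE;

        if (presult.all_frozen)
            flags |= VISIBILITYMAP_ALL_FROZEN;

        /*
         * Dirty the heap buffer if we are newly setting PD_ALL_VISIBLE, or
         * if wal_log_hints/checksums mean visibilitymap_set() may stamp the
         * page LSN. The second condition is what the frozen-only case used
         * to miss.
         */
        if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
        {
            PageSetAllVisible(page);
            MarkBufferDirty(buf);
        }

        old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
                                       InvalidXLogRecPtr,
                                       vmbuffer, presult.vm_conflict_horizon,
                                       flags);
    }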
---
src/backend/access/heap/vacuumlazy.c | 101 +++++++++------------------
1 file changed, 32 insertions(+), 69 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e9b4e924d22..d6818323932 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2159,11 +2159,26 @@ lazy_scan_prune(LVRelState *vacrel,
{
/* Don't update the VM if we just cleared corruption in it */
}
- else if (!all_visible_according_to_vm && presult.all_visible)
+
+ /*
+ * If the page isn't yet marked all-visible in the VM or it is and needs
+ * to be marked all-frozen, update the VM. Note that all_frozen is only
+ * valid if all_visible is true, so we must check both all_visible and
+ * all_frozen.
+ */
+ else if (presult.all_visible &&
+ (!all_visible_according_to_vm ||
+ (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
+ /*
+ * If the page is all-frozen, we can pass InvalidTransactionId as our
+ * cutoff_xid, since a snapshotConflictHorizon sufficient to make
+ * everything safe for REDO was logged when the page's tuples were
+ * frozen.
+ */
if (presult.all_frozen)
{
Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
@@ -2176,21 +2191,29 @@ lazy_scan_prune(LVRelState *vacrel,
* checksums are not enabled). Regardless, set both bits so that we
* get back in sync.
*
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
+ * If the heap page is all-visible but the VM bit is not set, we don't
+ * need to dirty the heap page. However, if checksums are enabled, we
+ * do need to make sure that the heap page is dirtied before passing
+ * it to visibilitymap_set(), because it may be logged.
*/
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
+ if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+ {
+ PageSetAllVisible(page);
+ MarkBufferDirty(buf);
+ }
+
old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
InvalidXLogRecPtr,
vmbuffer, presult.vm_conflict_horizon,
flags);
/*
+ * Even if we are only setting the all-frozen bit, there is a small
+ * chance that the VM was modified sometime between setting
+ * all_visible_according_to_vm and checking the visibility during
+ * pruning. Check the return value of old_vmbits to ensure the
+ * visibility map counters used for logging are accurate.
+ *
* If the page wasn't already set all-visible and/or all-frozen in the
* VM, count it as newly set for logging.
*/
@@ -2211,66 +2234,6 @@ lazy_scan_prune(LVRelState *vacrel,
}
}
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen. Note that all_frozen is only valid if all_visible is
- * true, so we must check both all_visible and all_frozen.
- */
- else if (all_visible_according_to_vm && presult.all_visible &&
- presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
-
return presult.ndeleted;
}
--
2.43.0
v13-0008-Keep-all_frozen-updated-too-in-heap_page_prune_a.patch
From 12130b4f6fa88c9e748d45da860a5f8b1a7dd289 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 23 Jul 2025 16:01:24 -0400
Subject: [PATCH v13 08/20] Keep all_frozen updated too in
heap_page_prune_and_freeze
We previously relied on all_visible and all_frozen only ever being used
together, but it's best to keep both fields updated.
Future commits will separate usage of these fields, so it is best not to
rely on all_visible for all_frozen's validity.
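Concretely, every place that used to clear only all_visible now clears
both flags, which is what allows the new assertion before the critical
section (tiny sketch, same identifiers as the diff):

    /* wherever a tuple turns out not to be visible to everyone */
    prstate->all_visible = prstate->all_frozen = false;

    /* ... so this now holds before applying the changes */
    Assert(!prstate.all_frozen || prstate.all_visible);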
---
src/backend/access/heap/pruneheap.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8f968b47c38..3be4ae3ae2a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -143,10 +143,6 @@ typedef struct
* whether to freeze the page or not. The all_visible and all_frozen
* values returned to the caller are adjusted to include LP_DEAD items at
* the end.
- *
- * all_frozen should only be considered valid if all_visible is also set;
- * we don't bother to clear the all_frozen flag every time we clear the
- * all_visible flag.
*/
bool all_visible;
bool all_frozen;
@@ -830,6 +826,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
}
+ Assert(!prstate.all_frozen || prstate.all_visible);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -1474,7 +1471,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
if (!HeapTupleHeaderXminCommitted(htup))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1496,7 +1493,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
Assert(prstate->cutoffs);
if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1509,7 +1506,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
case HEAPTUPLE_RECENTLY_DEAD:
prstate->recently_dead_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple will soon become DEAD. Update the hint field so
@@ -1528,7 +1525,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* assumption is a bit shaky, but it is what acquire_sample_rows()
* does, so be consistent.
*/
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* If we wanted to optimize for aborts, we might consider marking
@@ -1546,7 +1543,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* will commit and update the counters after we report.
*/
prstate->live_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple may soon become DEAD. Update the hint field so that
--
2.43.0
v13-0005-Combine-lazy_scan_prune-VM-corruption-cases.patch
From cb4fb780867cd8736d4d7d5b8a49089a6105fee2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 28 May 2025 16:04:03 -0400
Subject: [PATCH v13 05/20] Combine lazy_scan_prune VM corruption cases
lazy_scan_prune() updates the visibility map after phase I of heap
vacuuming. It also checks and fixes corruption in the VM. The corruption
cases were mixed in with the normal visibility map update cases.
Careful study of the ordering of the current logic reveals that the
corruption cases can be reordered and extracted into a separate
function. This should add no overhead compared to the previous code.
This reordering makes it clear which cases are about corruption and
which cases are normal VM updates. Separating them also makes it
possible to combine the normal cases in a future commit. This will make
the logic easier to understand and pave the way for updating the VM in
the same WAL record as pruning and freezing in phase I.
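The resulting shape of the VM-update logic in lazy_scan_prune() is
roughly (sketch only; the real branch bodies are in the diff below):

    if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
                                       all_visible_according_to_vm,
                                       presult.lpdead_items, vmbuffer))
    {
        /* Don't update the VM if we just cleared corruption in it */
    }
    else if (!all_visible_according_to_vm && presult.all_visible)
    {
        /* set all-visible (and possibly all-frozen) for the first time */
    }
    else if (all_visible_according_to_vm && presult.all_visible &&
             presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
    {
        /* page already all-visible in the VM; additionally set all-frozen */
    }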
---
src/backend/access/heap/vacuumlazy.c | 126 +++++++++++++++++----------
1 file changed, 79 insertions(+), 47 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a1cdaaebb57..e9b4e924d22 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,6 +430,12 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
+static bool identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer, bool all_visible_according_to_vm,
@@ -1934,6 +1940,72 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
+/*
+ * When updating the visibility map after phase I heap vacuuming, we take the
+ * opportunity to identify and fix any VM corruption.
+ *
+ * heap_blk_known_av is the visibility status of the heap page collected
+ * while finding the next unskippable block in heap_vac_scan_next_block().
+ */
+static bool
+identify_and_fix_vm_corruption(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buffer, Page heap_page,
+ bool heap_blk_known_av,
+ int64 nlpdead_items,
+ Buffer vmbuffer)
+{
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (heap_blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ if (nlpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
+
+
/* qsort comparator for sorting OffsetNumbers */
static int
cmpOffsetNumbers(const void *a, const void *b)
@@ -2080,9 +2152,14 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
+ * all_frozen variables. Start by looking for any VM corruption.
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if (identify_and_fix_vm_corruption(vacrel->rel, blkno, buf, page,
+ all_visible_according_to_vm, presult.lpdead_items, vmbuffer))
+ {
+ /* Don't update the VM if we just cleared corruption in it */
+ }
+ else if (!all_visible_according_to_vm && presult.all_visible)
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -2134,51 +2211,6 @@ lazy_scan_prune(LVRelState *vacrel,
}
}
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
/*
* If the all-visible page is all-frozen but not marked as such yet, mark
* it as all-frozen. Note that all_frozen is only valid if all_visible is
--
2.43.0
v13-0010-Rename-PruneState.freeze-to-attempt_freeze.patch
From 42817d95ec5bda5fb164167825b585731e4fdc70 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 31 Jul 2025 14:07:51 -0400
Subject: [PATCH v13 10/20] Rename PruneState.freeze to attempt_freeze
This makes it clearer that the flag indicates the caller would like
heap_page_prune_and_freeze() to consider freezing tuples -- not that we
will ultimately end up freezing them.
Also rename the local variable hint_bit_fpi to did_tuple_hint_fpi. This
makes it clear that it refers to tuple hints (not page hints) and that
it records something that happened rather than something that could
happen.
---
src/backend/access/heap/pruneheap.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ae59242a843..44b186a4560 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -43,7 +43,7 @@ typedef struct
/* whether or not dead items can be set LP_UNUSED during pruning */
bool mark_unused_now;
/* whether to attempt freezing tuples */
- bool freeze;
+ bool attempt_freeze;
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -445,13 +445,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_hint;
uint8 vmflags = 0;
uint8 old_vmbits = 0;
- bool hint_bit_fpi;
+ bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
/* Copy parameters to prstate */
prstate.vistest = vistest;
prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
- prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
prstate.cutoffs = cutoffs;
/*
@@ -473,7 +473,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* initialize page freezing working state */
prstate.pagefrz.freeze_required = false;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
Assert(new_relfrozen_xid && new_relmin_mxid);
prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
@@ -520,7 +520,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* function, when we return the value to the caller, so that the caller
* doesn't set the VM bit incorrectly.
*/
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
@@ -634,7 +634,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
* an FPI to be emitted.
*/
- hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+ did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
/*
* Process HOT chains.
@@ -750,7 +750,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* plans we prepared, or not.
*/
do_freeze = false;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
if (prstate.pagefrz.freeze_required)
{
@@ -783,7 +783,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (RelationNeedsWAL(relation))
{
- if (hint_bit_fpi)
+ if (did_tuple_hint_fpi)
do_freeze = true;
else if (do_prune)
{
@@ -1046,7 +1046,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->old_vmbits = old_vmbits;
presult->new_vmbits = vmflags;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
if (presult->nfrozen > 0)
{
@@ -1628,7 +1628,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
}
/* Consider freezing any normal tuples which will not be removed */
- if (prstate->freeze)
+ if (prstate->attempt_freeze)
{
bool totally_frozen;
--
2.43.0
v13-0009-Update-VM-in-pruneheap.c.patch
From 4305a7ac8c4b1ecbb863e4df9b293c8cf1b7a4e8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 2 Jun 2025 11:04:14 -0400
Subject: [PATCH v13 09/20] Update VM in pruneheap.c
As a step toward updating the VM in the same critical section and WAL
record as pruning and freezing (during phase I of vacuuming), first move
the VM update (still in its own critical section and WAL record) into
heap_page_prune_and_freeze(). This makes review easier.
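From the caller's side, lazy_scan_prune() now just passes the VM
information in and reads the before/after bits back out for its logging
counters; roughly (same identifiers as in the diff):

    prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
    if (vacrel->nindexes == 0)
        prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;

    heap_page_prune_and_freeze(rel, buf,
                               all_visible_according_to_vm, vmbuffer,
                               vacrel->vistest, prune_options,
                               &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
                               &vacrel->offnum,
                               &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);

    /* logging counters are now derived from old/new VM bits in the result */
    if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
        (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
        vacrel->vm_new_visible_pages++;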
---
src/backend/access/heap/pruneheap.c | 99 +++++++++++++++++++++++-----
src/backend/access/heap/vacuumlazy.c | 98 +++++----------------------
src/include/access/heapam.h | 15 +++--
3 files changed, 106 insertions(+), 106 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3be4ae3ae2a..ae59242a843 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -366,7 +366,8 @@ identify_and_fix_vm_corruption(Relation relation,
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -442,6 +443,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_freeze;
bool do_prune;
bool do_hint;
+ uint8 vmflags = 0;
+ uint8 old_vmbits = 0;
bool hint_bit_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
@@ -942,7 +945,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*
* Now that freezing has been finalized, unset all_visible if there are
* any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected by our caller.
+ * of the page, as expected for updating the visibility map.
*/
if (prstate.all_visible && prstate.lpdead_items == 0)
{
@@ -958,31 +961,91 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->hastup = prstate.hastup;
/*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
+ * If updating the visibility map, the conflict horizon for that record
+ * must be the newest xmin on the page. However, if the page is
+ * completely frozen, there can be no conflict and the vm_conflict_horizon
+ * should remain InvalidTransactionId. This includes the case that we
+ * just froze all the tuples; the prune-freeze record included the
+ * conflict XID already so the VM update record doesn't need it.
*/
if (presult->all_frozen)
presult->vm_conflict_horizon = InvalidTransactionId;
else
presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
- presult->lpdead_items = prstate.lpdead_items;
- /* the presult->deadoffsets array was already filled in */
-
/*
- * Clear any VM corruption. This does not need to be done in a critical
- * section.
+ * Handle setting visibility map bit based on information from the VM (as
+ * of last heap_vac_scan_next_block() call), and from all_visible and
+ * all_frozen variables.
*/
- presult->vm_corruption = false;
if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
- presult->vm_corruption = identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av,
- prstate.lpdead_items, vmbuffer);
+ {
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av,
+ prstate.lpdead_items, vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /*
+ * If the page isn't yet marked all-visible in the VM or it is and
+ * needs to be marked all-frozen, update the VM. Note that all_frozen
+ * is only valid if all_visible is true, so we must check both
+ * all_visible and all_frozen.
+ */
+ else if (presult->all_visible &&
+ (!blk_known_av ||
+ (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+ {
+ Assert(prstate.lpdead_items == 0);
+ vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+ /*
+ * If the page is all-frozen, we can pass InvalidTransactionId as
+ * our cutoff_xid, since a snapshotConflictHorizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ if (presult->all_frozen)
+ {
+ Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
+ /*
+ * It's possible for the VM bit to be clear and the page-level bit
+ * to be set if checksums are not enabled.
+ *
+ * And even if we are just planning to update the frozen bit in
+ * the VM, we shouldn't rely on all_visible_according_to_vm as a
+ * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
+ * might have become stale.
+ *
+ * If the heap page is all-visible but the VM bit is not set, we
+ * don't need to dirty the heap page. However, if checksums are
+ * enabled, we do need to make sure that the heap page is dirtied
+ * before passing it to visibilitymap_set(), because it may be
+ * logged.
+ */
+ if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
+ {
+ PageSetAllVisible(page);
+ MarkBufferDirty(buffer);
+ }
+
+ old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
+ vmbuffer, presult->vm_conflict_horizon,
+ vmflags);
+ }
+ }
+
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
+
+ presult->old_vmbits = old_vmbits;
+ presult->new_vmbits = vmflags;
+
if (prstate.freeze)
{
if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a222c9f9164..9492423141e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1949,7 +1949,8 @@ cmpOffsetNumbers(const void *a, const void *b)
* vmbuffer is the buffer containing the VM block with visibility information
* for the heap block, blkno. all_visible_according_to_vm is the saved
* visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * won't rely entirely on this status, as it may be out of date. These will be
+ * passed on to heap_page_prune_and_freeze() to use while setting the VM.
*
* *has_lpdead_items is set to true or false depending on whether, upon return
* from this function, any LP_DEAD items are still present on the page.
@@ -1978,6 +1979,7 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
+ * Then, if the page's visibility status has changed, update the VM.
*
* If the relation has no indexes, we can immediately mark would-be dead
* items LP_UNUSED.
@@ -1986,10 +1988,6 @@ lazy_scan_prune(LVRelState *vacrel,
* presult.ndeleted. It should not be confused with presult.lpdead_items;
* presult.lpdead_items's final value can be thought of as the number of
* tuples that were deleted from indexes.
- *
- * We will update the VM after collecting LP_DEAD items and freezing
- * tuples. Pruning will have determined whether or not the page is
- * all-visible.
*/
prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM;
if (vacrel->nindexes == 0)
@@ -2081,88 +2079,26 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(!presult.all_visible || !(*has_lpdead_items));
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * For the purposes of logging, count whether or not the page was newly
+ * set all-visible and, potentially, all-frozen.
*/
- if (presult.vm_corruption)
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- /* Don't update the VM if we just cleared corruption in it */
- }
-
- /*
- * If the page isn't yet marked all-visible in the VM or it is and needs
- * to be marked all-frozen, update the VM. Note that all_frozen is only
- * valid if all_visible is true, so we must check both all_visible and
- * all_frozen.
- */
- else if (presult.all_visible &&
- (!all_visible_according_to_vm ||
- (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))))
- {
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- /*
- * If the page is all-frozen, we can pass InvalidTransactionId as our
- * cutoff_xid, since a snapshotConflictHorizon sufficient to make
- * everything safe for REDO was logged when the page's tuples were
- * frozen.
- */
- if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * If the heap page is all-visible but the VM bit is not set, we don't
- * need to dirty the heap page. However, if checksums are enabled, we
- * do need to make sure that the heap page is dirtied before passing
- * it to visibilitymap_set(), because it may be logged.
- */
- if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * Even if we are only setting the all-frozen bit, there is a small
- * chance that the VM was modified sometime between setting
- * all_visible_according_to_vm and checking the visibility during
- * pruning. Check the return value of old_vmbits to ensure the
- * visibility map counters used for logging are accurate.
- *
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ vacrel->vm_new_visible_pages++;
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- vacrel->vm_new_frozen_pages++;
+ vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ {
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
+ }
return presult.ndeleted;
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a3b85fd1daf..9952ae96b12 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,20 +235,21 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
+ * all_visible and all_frozen indicate the status of the page as reflected
+ * in the visibility map after pruning, freezing, and setting any pages
+ * all-visible in the visibility map.
*
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
+ * vm_conflict_horizon is the newest xmin of live tuples on the page
+ * (older than OldestXmin). It will only be valid if we did not set the
+ * page all-frozen in the VM.
*
* These are only set if the HEAP_PRUNE_FREEZE option is set.
*/
bool all_visible;
bool all_frozen;
TransactionId vm_conflict_horizon;
- bool vm_corruption;
+ uint8 old_vmbits;
+ uint8 new_vmbits;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
--
2.43.0
v13-0011-Eliminate-xl_heap_visible-from-vacuum-phase-I-pr.patch
From c5ae37129bbfcc7245f052976493a2c81e15a25b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:41:00 -0400
Subject: [PATCH v13 11/20] Eliminate xl_heap_visible from vacuum phase I
prune/freeze
Instead of emitting a separate WAL record for every block rendered
all-visible/frozen by vacuum's phase I, include the changes to the VM in
the xl_heap_prune record already emitted.
This is only enabled for vacuum's prune/freeze work, not for on-access
pruning.
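At a high level, the critical section in heap_page_prune_and_freeze()
now looks roughly like this (heavily condensed sketch; argument lists
elided, identifiers as in the diff):

    if (do_set_vm)
        LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);

    START_CRIT_SECTION();

    /* apply planned prune and freeze changes to the heap page, as before */

    if (set_pd_all_visible)
        PageSetAllVisible(page);

    if (do_prune || do_freeze || set_pd_all_visible)
        MarkBufferDirty(buffer);

    if (do_set_vm)
    {
        old_vmbits = visibilitymap_set_vmbits(relation, blockno,
                                              vmbuffer, vmflags);
        if (old_vmbits == vmflags)
        {
            /* VM already in the desired state; don't WAL-log the VM change */
            LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
            do_set_vm = false;
            vmflags = 0;
        }
    }

    /*
     * One record now covers pruning, freezing, and the VM change:
     * log_heap_prune_and_freeze() takes vmbuffer, vmflags and
     * set_pd_all_visible instead of the InvalidBuffer/0/false it got before.
     */
    if (RelationNeedsWAL(relation) && (do_set_vm || do_prune || do_freeze))
        log_heap_prune_and_freeze(relation, buffer, false,
                                  vmbuffer, vmflags, set_pd_all_visible,
                                  conflict_xid, true, reason,
                                  prstate.frozen, prstate.nfrozen,
                                  ...);

    END_CRIT_SECTION();

    if (do_set_vm)
        LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);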
---
src/backend/access/heap/pruneheap.c | 460 ++++++++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 30 --
src/include/access/heapam.h | 15 +-
3 files changed, 283 insertions(+), 222 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 44b186a4560..74f7878c9ac 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,13 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool attempt_freeze;
+
+ /*
+ * Whether or not to consider updating the VM. There is some bookkeeping
+ * that must be maintained if we would like to update the VM.
+ */
+ bool consider_update_vm;
+
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -108,8 +115,9 @@ typedef struct
*
* These fields are not used by pruning itself for the most part, but are
* used to collect information about what was pruned and what state the
- * page is in after pruning, for the benefit of the caller. They are
- * copied to the caller's PruneFreezeResult at the end.
+ * page is in after pruning to use when updating the visibility map and
+ * for the benefit of the caller. They are copied to the caller's
+ * PruneFreezeResult at the end.
* -------------------------------------------------------
*/
@@ -138,11 +146,10 @@ typedef struct
* bits. It is only valid if we froze some tuples, and all_frozen is
* true.
*
- * NOTE: all_visible and all_frozen don't include LP_DEAD items. That's
- * convenient for heap_page_prune_and_freeze(), to use them to decide
- * whether to freeze the page or not. The all_visible and all_frozen
- * values returned to the caller are adjusted to include LP_DEAD items at
- * the end.
+ * NOTE: all_visible and all_frozen don't include LP_DEAD items until
+ * directly before updating the VM. We ignore LP_DEAD items when deciding
+ * whether or not to opportunistically freeze and when determining the
+ * snapshot conflict horizon required when freezing tuples.
*/
bool all_visible;
bool all_frozen;
@@ -377,12 +384,15 @@ identify_and_fix_vm_corruption(Relation relation,
* If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
* required in order to advance relfrozenxid / relminmxid, or if it's
* considered advantageous for overall system performance to do so now. The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing. When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set. They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
+ * 'cutoffs', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments are required
+ * when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set and the visibility status of the page
+ * has changed, we will update the VM at the same time as pruning and freezing
+ * the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping. Note that new and old_vmbits will be
+ * 0 if HEAP_PAGE_PRUNE_UPDATE_VM is not set.
*
* blk_known_av is the visibility status of the heap block as of the last call
* to find_next_unskippable_block(). vmbuffer is the buffer that may already
@@ -398,6 +408,8 @@ identify_and_fix_vm_corruption(Relation relation,
* FREEZE indicates that we will also freeze tuples, and will return
* 'all_visible', 'all_frozen' flags to the caller.
*
+ * UPDATE_VM indicates that we will set the page's status in the VM.
+ *
* cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
* of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
* cutoffs->OldestXmin is also used to determine if dead tuples are
@@ -442,18 +454,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
HeapTupleData tup;
bool do_freeze;
bool do_prune;
- bool do_hint;
+ bool do_hint_full_or_prunable;
+ bool do_set_vm;
uint8 vmflags = 0;
uint8 old_vmbits = 0;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ bool all_frozen_except_lp_dead = false;
+ bool set_pd_all_visible = false;
/* Copy parameters to prstate */
prstate.vistest = vistest;
prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.consider_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate.cutoffs = cutoffs;
+ Assert(!prstate.consider_update_vm || vmbuffer);
+
/*
* Our strategy is to scan the page and make lists of items to change,
* then apply the changes within a critical section. This keeps as much
@@ -498,50 +516,57 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.deadoffsets = presult->deadoffsets;
/*
- * Caller may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Keep track of whether or not the page will be all-visible and
+ * all-frozen for use in opportunistic freezing and to update the VM if
+ * the caller requests it.
+ *
+ * Currently, only VACUUM attempts freezing and setting the VM bits. But
+ * other callers could do either one. The visibility bookkeeping is
+ * required for opportunistic freezing (in addition to setting the VM
+ * bits) because we only consider opportunistically freezing tuples if the
+ * whole page would become all-frozen or if the whole page will be frozen
+ * except for dead tuples that will be removed by vacuum.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * If only updating the VM, we must initialize all_frozen to false, as
+ * heap_prepare_freeze_tuple() will not be called for each tuple on the
+ * page and we will not end up correctly setting it to false later.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not clear
- * all_visible when we see LP_DEAD items. We fix that at the end of the
- * function, when we return the value to the caller, so that the caller
- * doesn't set the VM bit incorrectly.
+ * Dead tuples which will be removed by the end of vacuuming should not
+ * preclude us from opportunistically freezing, so we do not clear
+ * all_visible when we see LP_DEAD items. We fix that after determining
+ * whether or not to freeze but before deciding whether or not to update
+ * the VM so that we don't set the VM bit incorrectly.
+ *
+ * If not freezing or updating the VM, we otherwise avoid the extra
+ * bookkeeping. Initializing all_visible to false allows skipping the work
+ * to update them in heap_prune_record_unchanged_lp_normal().
*/
if (prstate.attempt_freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
}
+ else if (prstate.consider_update_vm)
+ {
+ prstate.all_visible = true;
+ prstate.all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate.all_visible = false;
prstate.all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is unmaintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon.
+ * This is most likely to happen when updating the VM and/or freezing all
+ * live tuples on the page. It is updated before returning to the caller
+ * because vacuum does assert-build only validation on the page using this
+ * field.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -739,10 +764,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* Even if we don't prune anything, if we found a new value for the
- * pd_prune_xid field or the page was marked full, we will update the hint
- * bit.
+ * pd_prune_xid field or the page was marked full, we will update those
+ * hint bits.
*/
- do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ do_hint_full_or_prunable =
+ ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
PageIsFull(page);
/*
@@ -790,7 +816,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (XLogCheckBufferNeedsBackup(buffer))
do_freeze = true;
}
- else if (do_hint)
+ else if (do_hint_full_or_prunable)
{
if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
do_freeze = true;
@@ -829,11 +855,88 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
}
+ /*
+ * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+ * make the choice of whether or not to freeze the page unaffected by the
+ * short-term presence of LP_DEAD items. These LP_DEAD items were
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that freezing has been finalized, unset all_visible if there are
+ * any LP_DEAD items on the page. It needs to reflect the present state of
+ * the page when using it to determine whether or not to update the VM.
+ *
+ * Keep track of whether or not the page was all-frozen except LP_DEAD
+ * items for the purposes of calculating the snapshot conflict horizon,
+ * though.
+ */
+ all_frozen_except_lp_dead = prstate.all_frozen;
+ if (prstate.lpdead_items > 0)
+ {
+ prstate.all_visible = false;
+ prstate.all_frozen = false;
+ }
+
Assert(!prstate.all_frozen || prstate.all_visible);
+
+ /*
+ * Handle setting visibility map bit based on information from the VM (as
+ * of last heap_vac_scan_next_block() call), and from all_visible and
+ * all_frozen variables.
+ */
+ if (prstate.consider_update_vm)
+ {
+ /*
+ * Clear any VM corruption. This does not need to be in a critical
+ * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
+ * we may mark the heap page buffer dirty here and could end up doing
+ * so again later. This is not a correctness issue and is in the path
+ * of VM corruption, so we don't have to worry about the extra
+ * performance overhead.
+ */
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av, prstate.lpdead_items, vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /* Determine if we actually need to set the VM and which bits to set. */
+ else if (prstate.all_visible &&
+ (!blk_known_av ||
+ (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate.all_frozen)
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+ }
+
+ do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
+
+ /* Lock vmbuffer before entering a critical section */
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
+ /*
+ * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+ * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+ * set, we strongly prefer to keep them in sync.
+ *
+ * Prior to Postgres 19, it was possible for the page-level bit to be set
+ * and the VM bit to be clear. This could happen if we crashed after
+ * setting PD_ALL_VISIBLE but before setting bits in the VM.
+ */
+ set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+
+ /* Save these for the caller in case we later zero out vmflags */
+ presult->new_vmbits = vmflags;
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
- if (do_hint)
+ if (do_hint_full_or_prunable)
{
/*
* Update the page's pd_prune_xid field to either zero, or the lowest
@@ -849,15 +952,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
PageClearFull(page);
/*
- * If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * If we are _only_ setting the prune_xid or PD_PAGE_FULL hint, then
+ * this is a non-WAL-logged hint. If we are going to freeze or prune
+ * tuples on the page or set PD_ALL_VISIBLE, we will mark the buffer
+ * dirty and emit WAL below.
*/
- if (!do_freeze && !do_prune)
+ if (!do_prune && !do_freeze && !set_pd_all_visible)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -871,12 +975,47 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
- MarkBufferDirty(buffer);
+ if (set_pd_all_visible)
+ PageSetAllVisible(page);
/*
- * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
+ * We only set PD_ALL_VISIBLE if we also set the VM, and since setting
+ * the VM requires emitting WAL, MarkBufferDirtyHint() isn't
+ * appropriate here.
*/
- if (RelationNeedsWAL(relation))
+ if (do_prune || do_freeze || set_pd_all_visible)
+ MarkBufferDirty(buffer);
+
+ if (do_set_vm)
+ {
+ Assert(PageIsAllVisible(page));
+ old_vmbits = visibilitymap_set_vmbits(relation, blockno,
+ vmbuffer, vmflags);
+
+ if (old_vmbits == vmflags)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+ do_set_vm = false;
+ /* 0 out vmflags so we don't emit WAL to update the VM */
+ vmflags = 0;
+ }
+ }
+
+ /*
+ * It should never be the case that PD_ALL_VISIBLE is not set and the
+ * VM is set. Or, if it were, we should have caught it earlier when
+ * finding and fixing VM corruption. So, if we found out the VM was
+ * already set above, we should have found PD_ALL_VISIBLE set earlier.
+ */
+ Assert(!set_pd_all_visible || do_set_vm);
+
+ /*
+ * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did. If
+ * we were only updating the VM and it turns out it was already set,
+ * we will have unset do_set_vm earlier. As such, check it again
+ * before emitting the record.
+ */
+ if (RelationNeedsWAL(relation) && (do_set_vm || do_prune || do_freeze))
{
/*
* The snapshotConflictHorizon for the whole record should be the
@@ -888,35 +1027,56 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* on the standby with xids older than the youngest tuple this
* record will freeze will conflict.
*/
- TransactionId frz_conflict_horizon = InvalidTransactionId;
- TransactionId conflict_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
+
+ /*
+ * If we are updating the VM, the conflict horizon is almost
+ * always the visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization,
+ * we can use the visibility_cutoff_xid as the conflict horizon if
+ * the page will be all-frozen. This is true even if there are
+ * LP_DEAD line pointers because we ignored those when maintaining
+ * the visibility_cutoff_xid.
+ */
+ if (do_set_vm || (do_freeze && all_frozen_except_lp_dead))
+ conflict_xid = prstate.visibility_cutoff_xid;
/*
- * We can use the visibility_cutoff_xid as our cutoff for
- * conflicts when the whole page is eligible to become all-frozen
- * in the VM once we're done with it. Otherwise we generate a
- * conservative cutoff by stepping back from OldestXmin.
+ * Otherwise, if we are freezing but the page would not be
+ * all-frozen, we have to use the more pessimistic horizon of
+ * OldestXmin, which may be newer than the newest tuple we froze.
+ * We currently don't track the newest tuple we froze.
*/
- if (do_freeze)
+ else if (do_freeze)
{
- if (prstate.all_visible && prstate.all_frozen)
- frz_conflict_horizon = prstate.visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = prstate.cutoffs->OldestXmin;
- TransactionIdRetreat(frz_conflict_horizon);
- }
+ conflict_xid = prstate.cutoffs->OldestXmin;
+ TransactionIdRetreat(conflict_xid);
}
- if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
- conflict_xid = frz_conflict_horizon;
- else
+ /*
+ * If we are removing tuples with a younger xmax than our so far
+ * calculated conflict_xid, we must use this as our horizon.
+ */
+ if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
conflict_xid = prstate.latest_xid_removed;
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning
+ * or freezing any tuples and are setting an already all-visible
+ * page all-frozen in the VM. In this case, all of the tuples on
+ * the page must already be visible to all MVCC snapshots on the
+ * standby.
+ */
+ if (!do_prune && !do_freeze && do_set_vm &&
+ blk_known_av && (vmflags & VISIBILITYMAP_ALL_FROZEN))
+ conflict_xid = InvalidTransactionId;
+
log_heap_prune_and_freeze(relation, buffer,
false,
- InvalidBuffer, 0, false,
+ vmbuffer,
+ vmflags,
+ set_pd_all_visible,
conflict_xid,
true, reason,
prstate.frozen, prstate.nfrozen,
@@ -928,124 +1088,55 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
END_CRIT_SECTION();
- /* Copy information back for caller */
- presult->ndeleted = prstate.ndeleted;
- presult->nnewlpdead = prstate.ndead;
- presult->nfrozen = prstate.nfrozen;
- presult->live_tuples = prstate.live_tuples;
- presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected for updating the visibility map.
- */
- if (prstate.all_visible && prstate.lpdead_items == 0)
- {
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
- }
- else
- {
- presult->all_visible = false;
- presult->all_frozen = false;
- }
-
- presult->hastup = prstate.hastup;
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
- /*
- * If updating the visibility map, the conflict horizon for that record
- * must be the newest xmin on the page. However, if the page is
- * completely frozen, there can be no conflict and the vm_conflict_horizon
- * should remain InvalidTransactionId. This includes the case that we
- * just froze all the tuples; the prune-freeze record included the
- * conflict XID already so the VM update record doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * VACUUM will call heap_page_is_all_visible() during the second pass over
+ * the heap to determine all_visible and all_frozen for the page -- this
+ * is a specialized version of the logic from this function. Now that
+ * we've finished pruning and freezing, make sure that we're in total
+ * agreement with heap_page_is_all_visible() using an assertion. Since the
+ * page will already have been set in the VM by this point, the assertion
+ * can only report, after the fact, that something has gone wrong.
*/
- if (options & HEAP_PAGE_PRUNE_UPDATE_VM)
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
{
- if (identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av,
- prstate.lpdead_items, vmbuffer))
- {
- /* If we fix corruption, don't update the VM further */
- }
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
- /*
- * If the page isn't yet marked all-visible in the VM or it is and
- * needs to be marked all-frozen, update the VM. Note that all_frozen
- * is only valid if all_visible is true, so we must check both
- * all_visible and all_frozen.
- */
- else if (presult->all_visible &&
- (!blk_known_av ||
- (presult->all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
- {
- Assert(prstate.lpdead_items == 0);
- vmflags = VISIBILITYMAP_ALL_VISIBLE;
+ Assert(cutoffs);
- /*
- * If the page is all-frozen, we can pass InvalidTransactionId as
- * our cutoff_xid, since a snapshotConflictHorizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- if (presult->all_frozen)
- {
- Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
- vmflags |= VISIBILITYMAP_ALL_FROZEN;
- }
+ Assert(prstate.lpdead_items == 0);
- /*
- * It's possible for the VM bit to be clear and the page-level bit
- * to be set if checksums are not enabled.
- *
- * And even if we are just planning to update the frozen bit in
- * the VM, we shouldn't rely on all_visible_according_to_vm as a
- * proxy for the page-level PD_ALL_VISIBLE bit being set, since it
- * might have become stale.
- *
- * If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be
- * logged.
- */
- if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buffer);
- }
+ if (!heap_page_is_all_visible(relation, buffer,
+ cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc))
+ Assert(false);
- old_vmbits = visibilitymap_set(relation, blockno, buffer, InvalidXLogRecPtr,
- vmbuffer, presult->vm_conflict_horizon,
- vmflags);
- }
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == prstate.visibility_cutoff_xid);
}
+#endif
+ /* Copy information back for caller */
+ presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+ presult->old_vmbits = old_vmbits;
+ /* new_vmbits was set above */
+ presult->hastup = prstate.hastup;
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
- presult->old_vmbits = old_vmbits;
- presult->new_vmbits = vmflags;
-
if (prstate.attempt_freeze)
{
if (presult->nfrozen > 0)
@@ -1627,7 +1718,12 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
break;
}
- /* Consider freezing any normal tuples which will not be removed */
+ /*
+ * Consider freezing any normal tuples which will not be removed.
+ * Regardless of whether or not we want to freeze the tuples, if we want
+ * to update the VM, we have to call heap_prepare_freeze_tuple() on every
+ * tuple to know whether or not the page will be totally frozen.
+ */
if (prstate->attempt_freeze)
{
bool totally_frozen;
@@ -2190,7 +2286,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* - Reaping: During vacuum phase III, items that are already LP_DEAD are
* marked as unused.
*
- * - VM updates: After vacuum phase III, the heap page may be marked
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
* all-visible and all-frozen.
*
* These changes all happen together, so we use a single WAL record for them
@@ -2248,6 +2344,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+ Assert(do_prune || nfrozen > 0 || vmflags & VISIBILITYMAP_VALID_BITS);
+
regbuf_flags = REGBUF_STANDARD;
if (force_heap_fpi)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9492423141e..75205179b83 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2015,34 +2015,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- /* Note that all_frozen value does not matter when !all_visible */
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- if (!heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum))
- Assert(false);
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -2076,8 +2048,6 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
-
/*
* For the purposes of logging, count whether or not the page was newly
* set all-visible and, potentially, all-frozen.
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9952ae96b12..3679928d43e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,19 +235,12 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate the status of the page as reflected
- * in the visibility map after pruning, freezing, and setting any pages
- * all-visible in the visibility map.
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
*
- * vm_conflict_horizon is the newest xmin of live tuples on the page
- * (older than OldestXmin). It will only be valid if we did not set the
- * page all-frozen in the VM.
- *
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
*/
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
uint8 old_vmbits;
uint8 new_vmbits;
--
2.43.0
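As an aside for reviewers of the patch above: the snapshot conflict horizon
for the combined prune/freeze/VM-set record is effectively chosen in three
steps. The standalone toy program below restates those steps outside the
diff. It is an illustration only, not code from the patch set: the guard
around the freeze branch is elided in the excerpt above, InvalidTransactionId
is modeled as 0, XID wraparound is ignored, and every function and parameter
name here is made up for the example.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define INVALID_XID 0   /* stand-in for InvalidTransactionId */

/*
 * Toy restatement of how the prune/freeze/VM-set record picks its snapshot
 * conflict horizon. The real code compares XIDs with TransactionIdFollows().
 */
static uint32_t
choose_conflict_xid(bool freezing, uint32_t oldest_xmin,
                    uint32_t latest_xid_removed,
                    bool only_marking_av_page_all_frozen)
{
    uint32_t conflict_xid = INVALID_XID;

    /* Freezing: conflicts are possible up to just below OldestXmin. */
    if (freezing)
        conflict_xid = oldest_xmin - 1;

    /* Removing tuples with a newer xmax pushes the horizon forward. */
    if (latest_xid_removed > conflict_xid)
        conflict_xid = latest_xid_removed;

    /*
     * Neither pruning nor freezing, and only marking an already all-visible
     * page all-frozen: every tuple is already visible to all MVCC snapshots
     * on the standby, so no conflict horizon is needed.
     */
    if (only_marking_av_page_all_frozen)
        conflict_xid = INVALID_XID;

    return conflict_xid;
}

int
main(void)
{
    printf("%u\n", choose_conflict_xid(true, 1000, 900, false));  /* 999 */
    printf("%u\n", choose_conflict_xid(true, 1000, 1200, false)); /* 1200 */
    printf("%u\n", choose_conflict_xid(false, 1000, 0, true));    /* 0 */
    return 0;
}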
v13-0012-Remove-xl_heap_visible-entirely.patch (text/x-patch)
From c3dd8565db8e6f0273a497c584f53bd10057b4b9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 18 Jun 2025 12:30:42 -0400
Subject: [PATCH v13 12/20] Remove xl_heap_visible entirely
There are now no users of this, so eliminate it entirely.
---
src/backend/access/common/bufmask.c | 3 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 158 +----------------------
src/backend/access/heap/pruneheap.c | 4 +-
src/backend/access/heap/vacuumlazy.c | 10 +-
src/backend/access/heap/visibilitymap.c | 109 +---------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +-
src/include/access/heapam_xlog.h | 20 ---
src/include/access/visibilitymap.h | 11 +-
src/include/access/visibilitymapdefs.h | 9 --
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 32 insertions(+), 370 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..1fff01383b3 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,7 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c8cd9d22726..dfa9d5a460d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -40,6 +40,7 @@
#include "access/valid.h"
#include "access/visibilitymap.h"
#include "access/xloginsert.h"
+#include "access/xlogutils.h"
#include "catalog/pg_database.h"
#include "catalog/pg_database_d.h"
#include "commands/vacuum.h"
@@ -2524,11 +2525,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
else if (all_frozen_set)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(relation,
- BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set(relation,
+ BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
}
/*
@@ -8799,49 +8800,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
/*
* Perform XLogInsert for a heap-update operation. Caller must already
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 83b39a9102c..84c2924967d 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -247,9 +247,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
* the VM is set.
*
* In recovery, we expect no other writers, so writing to the VM page
- * without holding a lock on the heap page is considered safe enough. It
- * is done this way when replaying xl_heap_visible records (see
- * heap_xlog_visible()).
+ * without holding a lock on the heap page is considered safe enough.
*/
if (vmflags & VISIBILITYMAP_VALID_BITS &&
XLogReadBufferForRedoExtended(record, 1,
@@ -265,7 +263,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- old_vmbits = visibilitymap_set_vmbits(reln, blkno, vmbuffer, vmflags);
+ old_vmbits = visibilitymap_set(reln, blkno, vmbuffer, vmflags);
/* Only set VM page LSN if we modified the page */
if (old_vmbits != vmflags)
@@ -284,142 +282,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -796,9 +658,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
* the VM is set.
*
* In recovery, we expect no other writers, so writing to the VM page
- * without holding a lock on the heap page is considered safe enough. It
- * is done this way when replaying xl_heap_visible records (see
- * heap_xlog_visible()).
+ * without holding a lock on the heap page is considered safe enough.
*/
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -811,15 +671,14 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(reln, blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
/*
* It is not possible that the VM was already set for this heap page,
* so the vmbuffer must have been modified and marked dirty.
*/
+ visibilitymap_set(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(BufferGetPage(vmbuffer), lsn);
FreeFakeRelcacheEntry(reln);
@@ -1400,9 +1259,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 74f7878c9ac..538e82db8e6 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -989,8 +989,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_set_vm)
{
Assert(PageIsAllVisible(page));
- old_vmbits = visibilitymap_set_vmbits(relation, blockno,
- vmbuffer, vmflags);
+ old_vmbits = visibilitymap_set(relation, blockno,
+ vmbuffer, vmflags);
if (old_vmbits == vmflags)
{
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 75205179b83..2dcca071a45 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1888,8 +1888,8 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
PageSetAllVisible(page);
MarkBufferDirty(buf);
- visibilitymap_set_vmbits(vacrel->rel, blkno,
- vmbuffer, new_vmbits);
+ visibilitymap_set(vacrel->rel, blkno,
+ vmbuffer, new_vmbits);
if (RelationNeedsWAL(vacrel->rel))
{
@@ -2757,9 +2757,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
Assert(!PageIsAllVisible(page));
set_pd_all_vis = true;
PageSetAllVisible(page);
- visibilitymap_set_vmbits(vacrel->rel,
- blkno,
- vmbuffer, vmflags);
+ visibilitymap_set(vacrel->rel,
+ blkno,
+ vmbuffer, vmflags);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index aa48a436108..f7bad68ffc5 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set bit(s) in a previously pinned page and log
- * visibilitymap_set_vmbits - set bit(s) in a pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,108 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
- flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
- Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- Assert(!BufferIsValid(heapBuf) || BufferIsExclusiveLocked(heapBuf));
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (XLogRecPtrIsInvalid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
-}
-
/*
* Set flags in the VM block contained in the passed in vmBuf.
*
@@ -343,8 +240,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* is pinned and exclusive locked.
*/
uint8
-visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags)
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 1cb44ca32d3..93505cb8c56 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -460,9 +453,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 6d759c197a1..cdd6acbea1c 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -450,20 +449,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -507,11 +492,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index fc7056a91ea..c5b1e1f7adb 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "utils/relcache.h"
@@ -31,14 +30,8 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags);
+extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a13e8162890..b4c880c083f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4273,7 +4273,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
v13-0015-Inline-TransactionIdFollows-Precedes.patch (text/x-patch)
From 11aaf7bb0e74e846631bd6e82aae6f2ecf19e431 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 09:57:13 -0400
Subject: [PATCH v13 15/20] Inline TransactionIdFollows/Precedes()
Calling these from on-access pruning code had noticeable overhead in a
profile. There does not seem to be a reason not to inline them.
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
---
src/backend/access/transam/transam.c | 64 -------------------------
src/include/access/transam.h | 70 ++++++++++++++++++++++++++--
2 files changed, 66 insertions(+), 68 deletions(-)
diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index 9a39451a29a..553d6756cb3 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -273,70 +273,6 @@ TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
}
-/*
- * TransactionIdPrecedes --- is id1 logically < id2?
- */
-bool
-TransactionIdPrecedes(TransactionId id1, TransactionId id2)
-{
- /*
- * If either ID is a permanent XID then we can just do unsigned
- * comparison. If both are normal, do a modulo-2^32 comparison.
- */
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 < id2);
-
- diff = (int32) (id1 - id2);
- return (diff < 0);
-}
-
-/*
- * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
- */
-bool
-TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 <= id2);
-
- diff = (int32) (id1 - id2);
- return (diff <= 0);
-}
-
-/*
- * TransactionIdFollows --- is id1 logically > id2?
- */
-bool
-TransactionIdFollows(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 > id2);
-
- diff = (int32) (id1 - id2);
- return (diff > 0);
-}
-
-/*
- * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
- */
-bool
-TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 >= id2);
-
- diff = (int32) (id1 - id2);
- return (diff >= 0);
-}
-
/*
* TransactionIdLatest --- get latest XID among a main xact and its children
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 7d82cd2eb56..c9e20418275 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -255,6 +255,72 @@ typedef struct TransamVariablesData
} TransamVariablesData;
+
+/*
+ * TransactionIdPrecedes --- is id1 logically < id2?
+ */
+static inline bool
+TransactionIdPrecedes(TransactionId id1, TransactionId id2)
+{
+ /*
+ * If either ID is a permanent XID then we can just do unsigned
+ * comparison. If both are normal, do a modulo-2^32 comparison.
+ */
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 < id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff < 0);
+}
+
+/*
+ * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
+ */
+static inline bool
+TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 <= id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff <= 0);
+}
+
+/*
+ * TransactionIdFollows --- is id1 logically > id2?
+ */
+static inline bool
+TransactionIdFollows(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 > id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff > 0);
+}
+
+/*
+ * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
+ */
+static inline bool
+TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 >= id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff >= 0);
+}
+
+
/* ----------------
* extern declarations
* ----------------
@@ -274,10 +340,6 @@ extern bool TransactionIdDidAbort(TransactionId transactionId);
extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
-extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
-extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2);
extern TransactionId TransactionIdLatest(TransactionId mainxid,
int nxids, const TransactionId *xids);
extern XLogRecPtr TransactionIdGetCommitLSN(TransactionId xid);
--
2.43.0
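As a side note on the comparison logic moved above: the modulo-2^32 trick
these helpers rely on can be seen in isolation with a tiny standalone
program. This is an illustration only; it uses a plain uint32_t instead of
TransactionId and skips the special-casing of permanent (non-normal) XIDs
that the real functions perform first.

#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for TransactionIdPrecedes() on two normal XIDs. */
static int
xid_precedes(uint32_t id1, uint32_t id2)
{
    int32_t diff = (int32_t) (id1 - id2);

    return diff < 0;
}

int
main(void)
{
    /* Straightforward case: 100 logically precedes 200. */
    printf("%d\n", xid_precedes(100, 200));        /* prints 1 */

    /*
     * Wraparound case: once the XID counter has wrapped, 4294967000 is
     * "older" than 50, and the signed 32-bit difference gets this right.
     */
    printf("%d\n", xid_precedes(4294967000u, 50)); /* prints 1 */

    return 0;
}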
v13-0013-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (text/x-patch)
From 12caa7c46ccd46d6b62efd58aac1eb1166bc141f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v13 13/20] Rename GlobalVisTestIsRemovableXid() to
GlobalVisXidVisibleToAll()
Currently, we only use GlobalVisTestIsRemovableXid() to check whether a
tuple's xmax is visible to all, meaning we can remove it. But future
commits will also use it to test whether a tuple's xmin is visible to
all, in order to determine whether the page can be set all-visible in
the VM. In that case, it makes more sense to call the function
GlobalVisXidVisibleToAll().
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
---
src/backend/access/heap/heapam_visibility.c | 6 +++---
src/backend/access/heap/pruneheap.c | 14 +++++++-------
src/backend/access/spgist/spgvacuum.c | 2 +-
src/backend/storage/ipc/procarray.c | 13 ++++++-------
src/include/utils/snapmgr.h | 4 ++--
5 files changed, 19 insertions(+), 20 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
{
Assert(TransactionIdIsValid(dead_after));
- if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
res = HEAPTUPLE_DEAD;
}
else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
return false;
/* Deleter committed, so tuple is dead if the XID is old enough. */
- return GlobalVisTestIsRemovableXid(vistest,
- HeapTupleHeaderGetRawXmax(tuple));
+ return GlobalVisXidVisibleToAll(vistest,
+ HeapTupleHeaderGetRawXmax(tuple));
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 538e82db8e6..480ada99e22 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -231,7 +231,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
vistest = GlobalVisTestFor(relation);
- if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+ if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
return;
/*
@@ -580,9 +580,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* Determining HTSV only once for each tuple is required for correctness,
* to deal with cases where running HTSV twice could result in different
* results. For example, RECENTLY_DEAD can turn to DEAD if another
- * checked item causes GlobalVisTestIsRemovableFullXid() to update the
- * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
- * transaction aborts.
+ * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+ * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+ * aborts.
*
* It's also good for performance. Most commonly tuples within a page are
* stored at decreasing offsets (while the items are stored at increasing
@@ -1182,11 +1182,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* Determine whether or not the tuple is considered dead when compared
* with the provided GlobalVisState. On-access pruning does not provide
* VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
- * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
- * if the GlobalVisState has been updated since the beginning of vacuuming
+ * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+ * the GlobalVisState has been updated since the beginning of vacuuming
* the relation.
*/
- if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
return HEAPTUPLE_DEAD;
return res;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 8f8a1ad7796..496cca69410 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
*/
if (dt->tupstate == SPGIST_REDIRECT &&
(!TransactionIdIsValid(dt->xid) ||
- GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+ GlobalVisXidVisibleToAll(vistest, dt->xid)))
{
dt->tupstate = SPGIST_PLACEHOLDER;
Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..f67f01c17c2 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
* See comment for GlobalVisState for details.
*/
bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
- FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
{
/*
* If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4223,7 +4222,7 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
* relfrozenxid).
*/
bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
{
FullTransactionId fxid;
@@ -4237,7 +4236,7 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
*/
fxid = FullXidRelativeTo(state->definitely_needed, xid);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableXid(state, xid);
+ return GlobalVisXidVisibleToAll(state, xid);
}
/*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
*/
typedef struct GlobalVisState GlobalVisState;
extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
--
2.43.0
v13-0014-Use-GlobalVisState-to-determine-page-level-visib.patch (text/x-patch)
From 1c10116933e50ae2be845f38f5c09f13382bd0f6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v13 14/20] Use GlobalVisState to determine page level
visibility
During pruning and during vacuum's third phase, we try to determine if
the whole page can be set all-visible in the visibility map. Instead of
using OldestXmin to determine if all the tuples on a page are visible to
everyone, use the GlobalVisState. This allows us to start setting the VM
during on-access pruning in a future commit.
It is possible for the GlobalVisState to change during the course of a
vacuum. In all but extraordinary cases, it moves forward, meaning more
pages could potentially be set in the VM.
Because comparing a transaction ID to the GlobalVisState requires more
operations than comparing it to another single transaction ID, we now
wait until after examining all the tuples on the page; then, if we have
maintained the visibility_cutoff_xid, we compare it to the
GlobalVisState just once per page. This works because if the page is
all-visible and has live, committed tuples on it, the
visibility_cutoff_xid will contain the newest xmin on the page. If
everyone can see that xmin, the page is truly all-visible.
Doing this may mean we examine more tuples' xmins than before, since
previously we would have set all_visible to false as soon as we
encountered a live tuple newer than OldestXmin. However, these extra
comparisons were not found to be significant in a profile.
---
src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++
src/backend/access/heap/pruneheap.c | 48 +++++++++------------
src/backend/access/heap/vacuumlazy.c | 19 ++++----
src/include/access/heapam.h | 4 +-
4 files changed, 60 insertions(+), 39 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+ Buffer buffer)
+{
+ TransactionId dead_after = InvalidTransactionId;
+ HTSV_Result res;
+
+ res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+ if (res == HEAPTUPLE_RECENTLY_DEAD)
+ {
+ Assert(TransactionIdIsValid(dead_after));
+
+ if (GlobalVisXidVisibleToAll(vistest, dead_after))
+ res = HEAPTUPLE_DEAD;
+ }
+ else
+ Assert(!TransactionIdIsValid(dead_after));
+
+ return res;
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 480ada99e22..b8ca1be15a0 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -141,10 +141,9 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon, when setting the VM or when
+ * freezing all the live tuples on the page.
*
* NOTE: all_visible and all_frozen don't include LP_DEAD items until
* directly before updating the VM. We ignore LP_DEAD items when deciding
@@ -559,14 +558,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon.
- * This is most likely to happen when updating the VM and/or freezing all
- * live tuples on the page. It is updated before returning to the caller
- * because vacuum does assert-build only validation on the page using this
- * field.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState.
+ *
+ * If we encounter an uncommitted tuple, this field is unmaintained. If
+ * the page is being set all-visible or when freezing all live tuples on
+ * the page, it is used to calculate the snapshot conflict horizon.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -762,6 +759,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them is not visible to everyone, the page cannot be
+ * all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update those
@@ -1108,12 +1115,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(cutoffs);
-
Assert(prstate.lpdead_items == 0);
if (!heap_page_is_all_visible(relation, buffer,
- cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc))
Assert(false);
@@ -1638,19 +1643,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' when freezing is requested. We
- * could use GlobalVisTestIsRemovableXid instead, if a
- * non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2dcca071a45..4ad05ba4db6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,7 +464,7 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -2717,7 +2717,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
InvalidOffsetNumber);
if (heap_page_would_be_all_visible(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3462,13 +3462,13 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
- return heap_page_would_be_all_visible(rel, buf, OldestXmin,
+ return heap_page_would_be_all_visible(rel, buf, vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -3487,7 +3487,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
*
* *all_frozen is an output parameter indicating to the caller if every tuple
* on the page is frozen.
@@ -3508,7 +3508,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
static bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -3580,8 +3580,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
tuple.t_len = ItemIdGetLength(itemid);
tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
- buf))
+ switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest,
+ buf))
{
case HEAPTUPLE_LIVE:
{
@@ -3600,8 +3600,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin,
- OldestXmin))
+ if (!GlobalVisXidVisibleToAll(vistest, xmin))
{
all_visible = false;
*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 3679928d43e..fcd882cb03b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -342,7 +342,7 @@ extern void heap_inplace_unlock(Relation relation,
HeapTuple oldtup, Buffer buffer);
extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum);
@@ -415,6 +415,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+ GlobalVisState *vistest, Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
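To make the trade-off described in the commit message above concrete, here
is a deliberately simplified, standalone model of the "track the newest
xmin, compare once per page" approach. It is an illustration only: the
GlobalVisState is reduced to a single horizon value, XID wraparound and
uncommitted tuples are ignored, and none of the names below come from
PostgreSQL.

#include <stdint.h>
#include <stdio.h>

/*
 * A "page" is just an array of committed tuple xmins. Instead of comparing
 * each xmin against the horizon (the old per-tuple OldestXmin check), track
 * the newest xmin while walking the page and compare it once at the end.
 */
static int
page_all_visible(const uint32_t *xmins, int ntuples, uint32_t horizon)
{
    uint32_t visibility_cutoff = 0;    /* newest xmin seen so far */

    for (int i = 0; i < ntuples; i++)
    {
        if (xmins[i] > visibility_cutoff)
            visibility_cutoff = xmins[i];
    }

    /* Single horizon comparison for the whole page. */
    return visibility_cutoff < horizon;
}

int
main(void)
{
    uint32_t xmins[] = {90, 120, 150};

    printf("%d\n", page_all_visible(xmins, 3, 200)); /* 1: all old enough */
    printf("%d\n", page_all_visible(xmins, 3, 100)); /* 0: 120, 150 too new */
    return 0;
}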
v13-0017-Allow-on-access-pruning-to-set-pages-all-visible.patch (text/x-patch)
From f7cb1704e5716def42f8b0cdcbb6c390525c4cff Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v13 17/20] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum marked pages as all-visible or all-frozen.
Supporting this requires passing information from the executor down to
the scan descriptor about whether the query modifies the relation.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
---
src/backend/access/heap/heapam.c | 15 ++++-
src/backend/access/heap/heapam_handler.c | 15 ++++-
src/backend/access/heap/pruneheap.c | 67 ++++++++++++++-----
src/backend/access/index/indexam.c | 46 +++++++++++++
src/backend/access/table/tableam.c | 39 +++++++++--
src/backend/executor/execMain.c | 4 ++
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeBitmapHeapscan.c | 7 +-
src/backend/executor/nodeIndexscan.c | 18 +++--
src/backend/executor/nodeSeqscan.c | 24 +++++--
src/include/access/genam.h | 11 +++
src/include/access/heapam.h | 24 ++++++-
src/include/access/relscan.h | 6 ++
src/include/access/tableam.h | 30 ++++++++-
src/include/nodes/execnodes.h | 6 ++
.../t/035_standby_logical_decoding.pl | 3 +-
16 files changed, 277 insertions(+), 40 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dfa9d5a460d..eedc7cb07bf 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -556,6 +556,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -570,7 +571,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1247,6 +1250,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1285,6 +1289,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1317,6 +1327,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..f05b9e4968d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
return &hscan->xs_base;
}
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 4f4a0af1f04..7523b936769 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -198,9 +198,13 @@ static bool identify_and_fix_vm_corruption(Relation relation,
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -264,6 +268,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
{
OffsetNumber dummy_off_loc;
PruneFreezeResult presult;
+ int options = 0;
+
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ options = HEAP_PAGE_PRUNE_UPDATE_VM;
+ }
/*
* For now, pass mark_unused_now as false regardless of whether or
@@ -271,9 +282,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* that during on-access pruning with the current implementation.
*/
heap_page_prune_and_freeze(relation, buffer, false,
- InvalidBuffer,
- vistest, 0,
- NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+ vmbuffer ? *vmbuffer : InvalidBuffer,
+ vistest, options,
+ NULL, &presult, PRUNE_ON_ACCESS,
+ &dummy_off_loc, NULL, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -519,12 +531,17 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* all-frozen for use in opportunistic freezing and to update the VM if
* the caller requests it.
*
- * Currently, only VACUUM attempts freezing and setting the VM bits. But
- * other callers could do either one. The visibility bookkeeping is
- * required for opportunistic freezing (in addition to setting the VM
- * bits) because we only consider opportunistically freezing tuples if the
- * whole page would become all-frozen or if the whole page will be frozen
- * except for dead tuples that will be removed by vacuum.
+ * Currently, only VACUUM attempts freezing. But other callers could. The
+ * visibility bookkeeping is required for opportunistic freezing (in
+ * addition to setting the VM bits) because we only consider
+ * opportunistically freezing tuples if the whole page would become
+ * all-frozen or if the whole page will be frozen except for dead tuples
+ * that will be removed by vacuum. But if consider_update_vm is false,
+ * we'll not set the VM even if the page is discovered to be all-visible.
+ *
+ * If only HEAP_PAGE_PRUNE_UPDATE_VM is passed and not
+ * HEAP_PAGE_PRUNE_FREEZE, prstate.all_frozen must be initialized to false
+ * because we will not call heap_prepare_freeze_tuple() on each tuple.
*
* If only updating the VM, we must initialize all_frozen to false, as
* heap_prepare_freeze_tuple() will not be called for each tuple on the
@@ -536,7 +553,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* whether or not to freeze but before deciding whether or not to update
* the VM so that we don't set the VM bit incorrectly.
*
- * If not freezing or updating the VM, we otherwise avoid the extra
+ * If not freezing and not updating the VM, we avoid the extra
* bookkeeping. Initializing all_visible to false allows skipping the work
* to update them in heap_prune_record_unchanged_lp_normal().
*/
@@ -885,12 +902,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.all_frozen = false;
}
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate.consider_update_vm &&
+ prstate.all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+ {
+ prstate.consider_update_vm = false;
+ prstate.all_visible = prstate.all_frozen = false;
+ }
+
Assert(!prstate.all_frozen || prstate.all_visible);
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * Handle setting visibility map bit based on information from the VM (if
+ * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
+ * call), and from all_visible and all_frozen variables.
*/
if (prstate.consider_update_vm)
{
@@ -2284,8 +2319,8 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* - Reaping: During vacuum phase III, items that are already LP_DEAD are
* marked as unused.
*
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- * all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III and on-access, the heap page
+ * may be marked all-visible and all-frozen.
*
* These changes all happen together, so we use a single WAL record for them
* all.
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 86d11f4ec79..4603ece09bd 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -289,6 +289,32 @@ index_beginscan(Relation heapRelation,
return scan;
}
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan(heapRelation,
+ indexRelation,
+ snapshot,
+ instrument,
+ nkeys, norderbys);
+
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+ return scan;
+}
+
/*
* index_beginscan_bitmap - start a scan of an index with amgetbitmap
*
@@ -620,6 +646,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
return scan;
}
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan_parallel(heaprel, indexrel,
+ instrument,
+ nkeys, norderbys,
+ pscan);
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+ return scan;
+}
+
/* ----------------
* index_getnext_tid - get the next TID from a scan
*
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index a56c5eceb14..67dbf99f5b5 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
char *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
bool synchronize_seqscans = true;
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags);
+
/* ----------------------------------------------------------------------------
* Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
}
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
- SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
pscan, flags);
}
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+ bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
/* ----------------------------------------------------------------------------
* Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ff12e2e1364..2e0474c948a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation is modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ modifies_rel);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..c2ffbd3b08e 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+
+ bool modifies_base_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
*/
- scandesc = index_beginscan(node->ss.ss_currentRelation,
- node->iss_RelationDesc,
- estate->es_snapshot,
- &node->iss_Instrument,
- node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+ node->iss_RelationDesc,
+ estate->es_snapshot,
+ &node->iss_Instrument,
+ node->iss_NumScanKeys,
+ node->iss_NumOrderByKeys,
+ modifies_base_rel);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..fd69275c181 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
- scandesc = table_beginscan(node->ss.ss_currentRelation,
- estate->es_snapshot,
- 0, NULL);
+ scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+ estate->es_snapshot,
+ 0, NULL, modifies_rel);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -366,6 +371,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
ParallelContext *pcxt)
{
EState *estate = node->ss.ps.state;
+ bool modifies_rel;
ParallelTableScanDesc pscan;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -373,8 +379,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+ modifies_rel);
}
/* ----------------------------------------------------------------
@@ -404,8 +413,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+ pscan,
+ modifies_rel);
}
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 5b2ab181b5f..bf272c2c37f 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_heap_rel);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
@@ -206,6 +211,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys,
ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_rel);
extern ItemPointer index_getnext_tid(IndexScanDesc scan,
ScanDirection direction);
struct TupleTableSlot;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index fcd882cb03b..2210a5e0a79 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read the current heap page's
+ * corresponding VM block into this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -116,8 +123,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read the current heap page's corresponding VM block into
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
@@ -374,7 +391,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool blk_known_av,
Buffer vmbuffer,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
typedef struct IndexFetchTableData
{
Relation rel;
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchTableData;
struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index b2ce35e2a34..e31c21cf8eb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -62,6 +62,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* whether or not scan should attempt to set the VM */
+ SO_ALLOW_VM_SET = 1 << 10,
} ScanOptions;
/*
@@ -881,6 +883,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
}
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
/*
* Like table_beginscan(), but for scanning catalog. It'll automatically use a
* snapshot appropriate for scanning catalog relations.
@@ -918,10 +939,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, struct ScanKeyData *key)
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
{
uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
}
@@ -1130,6 +1154,10 @@ extern void table_parallelscan_initialize(Relation rel,
extern TableScanDesc table_beginscan_parallel(Relation relation,
ParallelTableScanDesc pscan);
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+ ParallelTableScanDesc pscan,
+ bool modifies_rel);
+
/*
* Restart a parallel scan. Call this in the leader process. Caller is
* responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index de782014b2d..839c1be1d7c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -678,6 +678,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query either through
+ * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index c9c182892cf..f5c0c65b260 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -745,7 +746,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
--
2.43.0
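For orientation, here is a minimal sketch (not part of the patch; the wrapper function below is hypothetical) of how a read-only caller is expected to drive the new API added in 0017:

/*
 * With modifies_rel = false, table_beginscan_vmset() adds SO_ALLOW_VM_SET to
 * the scan flags; heap_prepare_pagescan() then passes &scan->rs_vmbuffer to
 * heap_page_prune_opt(), which pins the right VM page and may mark the heap
 * block all-visible during on-access pruning.
 */
#include "postgres.h"
#include "access/tableam.h"

TableScanDesc
begin_readonly_scan(Relation rel, Snapshot snapshot)
{
    return table_beginscan_vmset(rel, snapshot, 0, NULL,
                                 false /* modifies_rel */ );
}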
v13-0018-Add-helper-functions-to-heap_page_prune_and_free.patch (text/x-patch)
From b726ee54192582e78c5a3866d2a819993c2c798a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 30 Jul 2025 18:51:43 -0400
Subject: [PATCH v13 18/20] Add helper functions to heap_page_prune_and_freeze
heap_page_prune_and_freeze() has gotten rather long. It has several
stages:
1) setup -- where the PruneState is set up
2) tuple examination -- where tuples and line pointers are examined to
determine what needs to be pruned and what could be frozen
3) evaluation -- where we determine, based on caller-provided options,
heuristics, and state gathered during stage 2, whether or not to
freeze tuples and set the page in the VM
4) execution -- where the page changes are actually made and logged
This commit refactors the evaluation stage into helpers which return
whether or not to freeze and set the VM.
XXX: For the purposes of committing, this likely shouldn't be a separate
commit. But I'm not sure yet whether it makes more sense to do this
refactoring earlier in the set for clarity for the reviewer.
---
src/backend/access/heap/pruneheap.c | 473 +++++++++++++++++-----------
1 file changed, 296 insertions(+), 177 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7523b936769..33a35dc5aab 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -179,6 +179,22 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
static void page_verify_redirects(Page page);
+static bool heap_page_will_freeze(Relation relation, Buffer buffer,
+ bool do_prune,
+ bool do_hint_full_or_prunable,
+ bool did_tuple_hint_fpi,
+ PruneState *prstate,
+ bool *all_frozen_except_lp_dead);
+
+static bool heap_page_will_update_vm(Relation relation,
+ Buffer buffer, BlockNumber blockno, Page page,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ bool blk_known_av,
+ PruneState *prstate,
+ Buffer *vmbuffer, uint8 *vmflags,
+ bool *set_pd_all_visible);
+
static bool identify_and_fix_vm_corruption(Relation relation,
BlockNumber heap_blk,
Buffer heap_buffer, Page heap_page,
@@ -382,6 +398,249 @@ identify_and_fix_vm_corruption(Relation relation,
return false;
}
+
+/*
+ * Determine whether to set the visibility map bits based on information from
+ * the PruneState and blk_known_av, which some callers will provide after
+ * previously examining this heap page's VM bits (e.g. vacuum from the last
+ * heap_vac_scan_next_block() call).
+ *
+ * We pass in blockno and page even though they can be derived from buffer,
+ * to avoid extra BufferGetBlock() and BufferGetBlockNumber() calls.
+ *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
+ * prstate and vmbuffer are input/output fields. vmflags and
+ * set_pd_all_visible are output fields.
+ *
+ * Returns true if the caller should set one or both of the VM bits and false
+ * otherwise.
+ */
+static bool
+heap_page_will_update_vm(Relation relation,
+ Buffer buffer, BlockNumber blockno, Page page,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ bool blk_known_av,
+ PruneState *prstate,
+ Buffer *vmbuffer, uint8 *vmflags,
+ bool *set_pd_all_visible)
+{
+ bool do_set_vm = false;
+
+ /*
+ * If the caller specified not to update the VM, validate everything is in
+ * the right state and exit.
+ */
+ if (!prstate->consider_update_vm)
+ {
+ Assert(!prstate->all_visible && !prstate->all_frozen);
+ /* We don't set only the page level visibility hint */
+ Assert(!(*set_pd_all_visible));
+ Assert(*vmflags == 0);
+ return false;
+ }
+
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->consider_update_vm &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
+ {
+ prstate->consider_update_vm = false;
+ prstate->all_visible = prstate->all_frozen = false;
+ }
+
+ Assert(!prstate->all_frozen || prstate->all_visible);
+
+ /*
+ * Clear any VM corruption. This does not need to be in a critical
+ * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set, we
+ * may mark the heap page buffer dirty here and could end up doing so
+ * again later. This is not a correctness issue and is in the path of VM
+ * corruption, so we don't have to worry about the extra performance
+ * overhead.
+ */
+ if (identify_and_fix_vm_corruption(relation,
+ blockno, buffer, page,
+ blk_known_av, prstate->lpdead_items,
+ *vmbuffer))
+ {
+ /* If we fix corruption, don't update the VM further */
+ }
+
+ /* Determine if we actually need to set the VM and which bits to set. */
+ else if (prstate->all_visible &&
+ (!blk_known_av ||
+ (prstate->all_frozen && !VM_ALL_FROZEN(relation, blockno, vmbuffer))))
+ {
+ *vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate->all_frozen)
+ *vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
+ do_set_vm = *vmflags & VISIBILITYMAP_VALID_BITS;
+
+ /*
+ * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
+ * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
+ * set, we strongly prefer to keep them in sync.
+ *
+ * Prior to Postgres 19, it was possible for the page-level bit to be set
+ * and the VM bit to be clear. This could happen if we crashed after
+ * setting PD_ALL_VISIBLE but before setting bits in the VM.
+ */
+ *set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
+ return do_set_vm;
+}
+
+/*
+ * Decide if we want to go ahead with freezing according to the freeze plans we
+ * prepared for the given buffer or not. If the caller specified we should not
+ * freeze tuples, it exits early.
+ *
+ * do_prune, do_hint_full_or_prunable, and did_tuple_hint_fpi must all have
+ * been decided before calling this function.
+ *
+ * prstate is an input/output parameter. all_frozen_except_lp_dead is set and
+ * used later to determine the snapshot conflict horizon for the record.
+ *
+ * Returns true if we should use our freeze plans and freeze tuples on the page
+ * and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+ bool do_prune,
+ bool do_hint_full_or_prunable,
+ bool did_tuple_hint_fpi,
+ PruneState *prstate,
+ bool *all_frozen_except_lp_dead)
+{
+ bool do_freeze = false;
+
+ /*
+ * If the caller specified we should not attempt to freeze any tuples,
+ * validate that everything is in the right state and exit.
+ */
+ if (!prstate->attempt_freeze)
+ {
+ Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+ Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+ Assert(!(*all_frozen_except_lp_dead));
+ return false;
+ }
+
+ if (prstate->pagefrz.freeze_required)
+ {
+ /*
+ * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+ * before FreezeLimit/MultiXactCutoff is present. Must freeze to
+ * advance relfrozenxid/relminmxid.
+ */
+ do_freeze = true;
+ }
+ else
+ {
+ /*
+ * Opportunistically freeze the page if we are generating an FPI
+ * anyway and if doing so means that we can set the page all-frozen
+ * afterwards (might not happen until VACUUM's final heap pass).
+ *
+ * XXX: Previously, we knew if pruning emitted an FPI by checking
+ * pgWalUsage.wal_fpi before and after pruning. Once the freeze and
+ * prune records were combined, this heuristic couldn't be used
+ * anymore. The opportunistic freeze heuristic must be improved;
+ * however, for now, try to approximate the old logic.
+ */
+ if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+ {
+ /*
+ * Freezing would make the page all-frozen. Have already emitted
+ * an FPI or will do so anyway?
+ */
+ if (RelationNeedsWAL(relation))
+ {
+ if (did_tuple_hint_fpi)
+ do_freeze = true;
+ else if (do_prune)
+ {
+ if (XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ else if (do_hint_full_or_prunable)
+ {
+ if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ }
+ }
+ }
+
+ if (do_freeze)
+ {
+ /*
+ * Validate the tuples we will be freezing before entering the
+ * critical section.
+ */
+ heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+ }
+ else if (prstate->nfrozen > 0)
+ {
+ /*
+ * The page contained some tuples that were not already frozen, and we
+ * chose not to freeze them now. The page won't be all-frozen then.
+ */
+ Assert(!prstate->pagefrz.freeze_required);
+
+ prstate->all_frozen = false;
+ prstate->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+ else
+ {
+ /*
+ * We have no freeze plans to execute. The page might already be
+ * all-frozen (perhaps only following pruning), though. Such pages
+ * can be marked all-frozen in the VM by our caller, even though none
+ * of its tuples were newly frozen here.
+ */
+ }
+
+ /*
+ * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+ * make the choice of whether or not to freeze the page unaffected by the
+ * short-term presence of LP_DEAD items. These LP_DEAD items were
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that freezing has been finalized, unset all_visible if there are
+ * any LP_DEAD items on the page. It needs to reflect the present state of
+ * the page when using it to determine whether or not to update the VM.
+ *
+ * Keep track of whether or not the page was all-frozen except LP_DEAD
+ * items for the purposes of calculating the snapshot conflict horizon,
+ * though.
+ */
+ *all_frozen_except_lp_dead = prstate->all_frozen;
+ if (prstate->lpdead_items > 0)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
+
+ return do_freeze;
+}
+
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
* specified page. If the page's visibility status has changed, update it in
@@ -772,20 +1031,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Clear the offset information once we have processed the given page. */
*off_loc = InvalidOffsetNumber;
- do_prune = prstate.nredirected > 0 ||
- prstate.ndead > 0 ||
- prstate.nunused > 0;
-
/*
* After processing all the live tuples on the page, if the newest xmin
* amongst them is not visible to everyone, the page cannot be
- * all-visible.
+ * all-visible. This must be done before we decide whether or not to
+ * opportunistically freeze below because we do not want to
+ * opportunistically freeze the page if there are live tuples not visible
+ * to everyone, which would prevent setting the page frozen in the VM.
*/
if (prstate.all_visible &&
TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
prstate.all_visible = prstate.all_frozen = false;
+ /*
+ * Now decide based on information collected while examining every tuple
+ * which actions to take. If there are any prunable tuples, we'll prune
+ * them. However, we will decide based on options specified by the caller
+ * and various heuristics whether or not to freeze any tuples and whether
+ * or not the page should be set all-visible/all-frozen in the VM.
+ */
+ do_prune = prstate.nredirected > 0 ||
+ prstate.ndead > 0 ||
+ prstate.nunused > 0;
+
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update those
@@ -796,186 +1065,36 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
PageIsFull(page);
/*
- * Decide if we want to go ahead with freezing according to the freeze
- * plans we prepared, or not.
- */
- do_freeze = false;
- if (prstate.attempt_freeze)
- {
- if (prstate.pagefrz.freeze_required)
- {
- /*
- * heap_prepare_freeze_tuple indicated that at least one XID/MXID
- * from before FreezeLimit/MultiXactCutoff is present. Must
- * freeze to advance relfrozenxid/relminmxid.
- */
- do_freeze = true;
- }
- else
- {
- /*
- * Opportunistically freeze the page if we are generating an FPI
- * anyway and if doing so means that we can set the page
- * all-frozen afterwards (might not happen until VACUUM's final
- * heap pass).
- *
- * XXX: Previously, we knew if pruning emitted an FPI by checking
- * pgWalUsage.wal_fpi before and after pruning. Once the freeze
- * and prune records were combined, this heuristic couldn't be
- * used anymore. The opportunistic freeze heuristic must be
- * improved; however, for now, try to approximate the old logic.
- */
- if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
- {
- /*
- * Freezing would make the page all-frozen. Have already
- * emitted an FPI or will do so anyway?
- */
- if (RelationNeedsWAL(relation))
- {
- if (did_tuple_hint_fpi)
- do_freeze = true;
- else if (do_prune)
- {
- if (XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- else if (do_hint_full_or_prunable)
- {
- if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- }
- }
- }
- }
-
- if (do_freeze)
- {
- /*
- * Validate the tuples we will be freezing before entering the
- * critical section.
- */
- heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
- }
- else if (prstate.nfrozen > 0)
- {
- /*
- * The page contained some tuples that were not already frozen, and we
- * chose not to freeze them now. The page won't be all-frozen then.
- */
- Assert(!prstate.pagefrz.freeze_required);
-
- prstate.all_frozen = false;
- prstate.nfrozen = 0; /* avoid miscounts in instrumentation */
- }
- else
- {
- /*
- * We have no freeze plans to execute. The page might already be
- * all-frozen (perhaps only following pruning), though. Such pages
- * can be marked all-frozen in the VM by our caller, even though none
- * of its tuples were newly frozen here.
- */
- }
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state of
- * the page when using it to determine whether or not to update the VM.
- *
- * Keep track of whether or not the page was all-frozen except LP_DEAD
- * items for the purposes of calculating the snapshot conflict horizon,
- * though.
+ * We must decide whether or not to freeze before deciding if and what to
+ * set in the VM.
*/
- all_frozen_except_lp_dead = prstate.all_frozen;
- if (prstate.lpdead_items > 0)
- {
- prstate.all_visible = false;
- prstate.all_frozen = false;
- }
+ do_freeze = heap_page_will_freeze(relation, buffer,
+ do_prune,
+ do_hint_full_or_prunable,
+ did_tuple_hint_fpi,
+ &prstate,
+ &all_frozen_except_lp_dead);
+
+ do_set_vm = heap_page_will_update_vm(relation,
+ buffer, blockno, page,
+ reason,
+ do_prune, do_freeze,
+ blk_known_av,
+ &prstate,
+ &vmbuffer,
+ &vmflags, &set_pd_all_visible);
- /*
- * If this is an on-access call and we're not actually pruning, avoid
- * setting the visibility map if it would newly dirty the heap page or, if
- * the page is already dirty, if doing so would require including a
- * full-page image (FPI) of the heap page in the WAL. This situation
- * should be rare, as on-access pruning is only attempted when
- * pd_prune_xid is valid.
- */
- if (reason == PRUNE_ON_ACCESS &&
- prstate.consider_update_vm &&
- prstate.all_visible &&
- !do_prune && !do_freeze &&
- (!BufferIsDirty(buffer) || XLogCheckBufferNeedsBackup(buffer)))
- {
- prstate.consider_update_vm = false;
- prstate.all_visible = prstate.all_frozen = false;
- }
-
- Assert(!prstate.all_frozen || prstate.all_visible);
-
- /*
- * Handle setting visibility map bit based on information from the VM (if
- * provided, e.g. by vacuum from the last heap_vac_scan_next_block()
- * call), and from all_visible and all_frozen variables.
- */
- if (prstate.consider_update_vm)
- {
- /*
- * Clear any VM corruption. This does not need to be in a critical
- * section, so we do it first. If PD_ALL_VISIBLE is incorrectly set,
- * we may mark the heap page buffer dirty here and could end up doing
- * so again later. This is not a correctness issue and is in the path
- * of VM corruption, so we don't have to worry about the extra
- * performance overhead.
- */
- if (identify_and_fix_vm_corruption(relation,
- blockno, buffer, page,
- blk_known_av, prstate.lpdead_items, vmbuffer))
- {
- /* If we fix corruption, don't update the VM further */
- }
-
- /* Determine if we actually need to set the VM and which bits to set. */
- else if (prstate.all_visible &&
- (!blk_known_av ||
- (prstate.all_frozen && !VM_ALL_FROZEN(relation, blockno, &vmbuffer))))
- {
- vmflags |= VISIBILITYMAP_ALL_VISIBLE;
- if (prstate.all_frozen)
- vmflags |= VISIBILITYMAP_ALL_FROZEN;
- }
- }
-
- do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
+ /* Save these for the caller in case we later zero out vmflags */
+ presult->new_vmbits = vmflags;
- /* Lock vmbuffer before entering a critical section */
+ /* Lock vmbuffer before entering critical section */
if (do_set_vm)
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
/*
- * Don't set PD_ALL_VISIBLE unless we also plan to set the VM. While it is
- * correct for a heap page to have PD_ALL_VISIBLE even if the VM is not
- * set, we strongly prefer to keep them in sync.
- *
- * Prior to Postgres 19, it was possible for the page-level bit to be set
- * and the VM bit to be clear. This could happen if we crashed after
- * setting PD_ALL_VISIBLE but before setting bits in the VM.
+ * Time to actually make the changes to the page and log them. Any error
+ * while applying the changes is critical.
*/
- set_pd_all_visible = do_set_vm && !PageIsAllVisible(page);
-
- /* Save these for the caller in case we later zero out vmflags */
- presult->new_vmbits = vmflags;
-
- /* Any error while applying the changes is critical */
START_CRIT_SECTION();
if (do_hint_full_or_prunable)
--
2.43.0
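Condensed from the hunk above, the evaluation stage now reduces to these two helper calls; the freeze decision is made first because do_freeze feeds into the VM decision:

    do_freeze = heap_page_will_freeze(relation, buffer,
                                      do_prune,
                                      do_hint_full_or_prunable,
                                      did_tuple_hint_fpi,
                                      &prstate,
                                      &all_frozen_except_lp_dead);

    do_set_vm = heap_page_will_update_vm(relation,
                                         buffer, blockno, page,
                                         reason,
                                         do_prune, do_freeze,
                                         blk_known_av,
                                         &prstate,
                                         &vmbuffer,
                                         &vmflags, &set_pd_all_visible);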
v13-0016-Unset-all-visible-sooner-if-not-freezing.patch (text/x-patch)
From 8de90b1801ac3b59977d90c15aee854b40f3f043 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:35:13 -0400
Subject: [PATCH v13 16/20] Unset all-visible sooner if not freezing
In prune/freeze code, we delay unsetting all-visible/all-frozen in the
presence of dead items to allow opportunistically freezing tuples if the
whole page would be frozen except for those dead items -- which are
removed later in vacuum's third phase.
Future commits will allow on-access pruning to set the VM, which means
all-visible will be initialized to true instead of false and we will do
extra bookkeeping in heap_prune_record_unchanged_lp_normal() to keep track of
whether or not the page is all-visible.
Because on-access pruning will not freeze tuples, it makes sense to
unset all-visible as soon as we encounter an LP_DEAD item and
avoid continued bookkeeping since we know the page is not all-visible
and we won't be able to remove those dead items.
---
src/backend/access/heap/pruneheap.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b8ca1be15a0..4f4a0af1f04 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1503,8 +1503,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible until later during pruning.
- * Removable dead tuples shouldn't preclude freezing the page.
+ * Removable dead tuples shouldn't preclude freezing the page. If we won't
+ * attempt freezing, just unset all-visible now, though.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1762,8 +1765,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible until later, at the end of
* heap_page_prune_and_freeze(). This will allow us to attempt to freeze
* the page after pruning. As long as we unset it before updating the
- * visibility map, this will be correct.
+ * visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all-visible now.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
--
2.43.0
v13-0020-Set-pd_prune_xid-on-insert.patch (text/x-patch)
From e041d4a571d77f9159d914a89dcfc49a9419463d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v13 20/20] Set pd_prune_xid on insert
Now that we can set the VM during read-only queries, it makes sense to
start setting the page prunable hint on insert. This will allow
heap_page_prune_and_freeze() to be called when the page is full or
mostly full.
For years there has been a note in heap_insert() and heap_multi_insert()
pointing out that setting pd_prune_xid would help clean up aborted
inserted tuples that would otherwise not be cleaned up until vacuum.
So, that's another benefit of setting it.
Setting pd_prune_xid on insert causes a page to be pruned and then
written out, which affects the reported number of hits in the
index-killtuples isolation test. This is a quirk of how hits are
tracked that sometimes leads to them being double counted. It should
probably be fixed or changed independently.
ci-os-only:
---
src/backend/access/heap/heapam.c | 25 +++++++++++++------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++-
.../isolation/expected/index-killtuples.out | 6 ++---
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index eedc7cb07bf..442e557aeaf 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2105,6 +2105,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2164,15 +2165,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2182,7 +2187,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2546,8 +2550,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 84c2924967d..eed0619c0ad 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -481,6 +481,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -630,9 +636,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
--
2.43.0
v13-0019-Reorder-heap_page_prune_and_free.patch (text/x-patch)
From 696833a9710180be1d90507a9f267d4436327b2c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 31 Jul 2025 12:08:18 -0400
Subject: [PATCH v13 19/20] Reorder heap_page_prune_and_freeze parameters
Reorder parameters so that all of the output parameters are together at
the end of the parameter list.
---
src/backend/access/heap/pruneheap.c | 38 ++++++++++++++--------------
src/backend/access/heap/vacuumlazy.c | 6 ++---
src/include/access/heapam.h | 4 +--
3 files changed, 24 insertions(+), 24 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 33a35dc5aab..c1c0dae87ba 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -297,10 +297,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, false,
+ heap_page_prune_and_freeze(relation, buffer, options, false,
vmbuffer ? *vmbuffer : InvalidBuffer,
- vistest, options,
- NULL, &presult, PRUNE_ON_ACCESS,
+ vistest,
+ NULL, PRUNE_ON_ACCESS, &presult,
&dummy_off_loc, NULL, NULL);
/*
@@ -651,6 +651,15 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* also need to account for a reduction in the length of the line pointer
* array following array truncation by us.
*
+ * options:
+ * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+ * pruning.
+ *
+ * FREEZE indicates that we will also freeze tuples, and will return
+ * 'all_visible', 'all_frozen' flags to the caller.
+ *
+ * UPDATE_VM indicates that we will set the page's status in the VM.
+ *
* If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
* required in order to advance relfrozenxid / relminmxid, or if it's
* considered advantageous for overall system performance to do so now. The
@@ -669,30 +678,21 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* contain the required block of the visibility map.
*
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
- *
- * options:
- * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- * pruning.
- *
- * FREEZE indicates that we will also freeze tuples, and will return
- * 'all_visible', 'all_frozen' flags to the caller.
- *
- * UPDATE_VM indicates that we will set the page's status in the VM.
+ * (see heap_prune_satisfies_vacuum). It is an input parameter.
*
* cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
* of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
* cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD. It is an input parameter.
+ *
+ * reason indicates why the pruning is performed. It is included in the WAL
+ * record for debugging and analysis purposes, but otherwise has no effect.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
* heap_page_prune_and_freeze() is responsible for initializing it. Required
* by all callers.
*
- * reason indicates why the pruning is performed. It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
* off_loc is the offset location required by the caller to use in error
* callback.
*
@@ -705,13 +705,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ int options,
bool blk_known_av,
Buffer vmbuffer,
GlobalVisState *vistest,
- int options,
struct VacuumCutoffs *cutoffs,
- PruneFreezeResult *presult,
PruneReason reason,
+ PruneFreezeResult *presult,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4ad05ba4db6..4fb915e1d94 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1993,11 +1993,11 @@ lazy_scan_prune(LVRelState *vacrel,
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf,
+ heap_page_prune_and_freeze(rel, buf, prune_options,
all_visible_according_to_vm,
vmbuffer,
- vacrel->vistest, prune_options,
- &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+ vacrel->vistest,
+ &vacrel->cutoffs, PRUNE_VACUUM_SCAN, &presult,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2210a5e0a79..ca3f37c2925 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -394,13 +394,13 @@ struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer,
Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ int options,
bool blk_known_av,
Buffer vmbuffer,
struct GlobalVisState *vistest,
- int options,
struct VacuumCutoffs *cutoffs,
- PruneFreezeResult *presult,
PruneReason reason,
+ PruneFreezeResult *presult,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid);
--
2.43.0
On Tue, Sep 9, 2025 at 7:08 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
Fair. I've introduced new XLHP flags in attached v13. Hopefully it
puts an end to the horror.
I suggest not renumbering all of the existing flags and just adding
these new ones at the end. Less code churn and more likely to break in
an obvious way if you mix up the two sets of flags.
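A minimal sketch of what appending rather than renumbering looks like in heapam_xlog.h; the XLHP_VM_* names and bit positions below are invented for illustration, not the patch's actual identifiers:

/* Existing XLHP_* flags keep their current values (not shown). */

/*
 * New flags appended after the last existing bit, so mixing up the old
 * and new flag sets aliases nothing and fails in an obvious way.
 */
#define XLHP_VM_SET_ALL_VISIBLE     (1 << 7)
#define XLHP_VM_SET_ALL_FROZEN      (1 << 8)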
More on 0002:
+ set_heap_lsn = XLogHintBitIsNeeded() ? true : set_heap_lsn;
Maybe just if (XLogHintBitIsNeeded) set_heap_lsn = true? I don't feel
super-strongly that what you've done is bad but it looks weird to my
eyes.
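Spelled out (with the parentheses), the suggested form is just:

if (XLogHintBitIsNeeded())
    set_heap_lsn = true;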
+ * If we released any space or line pointers or will be setting a page in
+ * the visibility map, measure the page's freespace to later update the
"setting a page in the visibility map" seems a little muddled to me.
You set bits, not pages.
+ * all-visible (or all-frozen, depending on the vacuum mode,) which is
This comma placement is questionable.
/*
+ * Note that the heap relation may have been dropped or truncated, leading
+ * us to skip updating the heap block due to the LSN interlock. However,
+ * even in that case, it's still safe to update the visibility map. Any
+ * WAL record that clears the visibility map bit does so before checking
+ * the page LSN, so any bits that need to be cleared will still be
+ * cleared.
+ *
+ * Note that the lock on the heap page was dropped above. In normal
+ * operation this would never be safe because a concurrent query could
+ * modify the heap page and clear PD_ALL_VISIBLE -- violating the
+ * invariant that PD_ALL_VISIBLE must be set if the corresponding bit in
+ * the VM is set.
+ *
+ * In recovery, we expect no other writers, so writing to the VM page
+ * without holding a lock on the heap page is considered safe enough. It
+ * is done this way when replaying xl_heap_visible records (see
*/
How many copies of this comment do you plan to end up with?
The comment for log_heap_prune_and_freeze seems to be anticipating future work.
0004. It is not clear to me why you need to get
log_heap_prune_and_freeze to do the work here. Why can't
log_newpage_buffer get the job done already?
Well, I need something to emit the changes to the VM. I'm eliminating
all users of xl_heap_visible. Empty pages are the ones that benefit
the least from switching from xl_heap_visible -> xl_heap_prune. But,
if I don't transition them, we have to maintain all the
xl_heap_visible code (including visibilitymap_set() in its long form).
As for log_newpage_buffer(), I could keep it if you think it is too
confusing to change log_heap_prune_and_freeze()'s API (by passing
force_heap_fpi) to handle this case, I can leave log_newpage_buffer()
there and then call log_heap_prune_and_freeze().
I just thought it seemed simple to avoid emitting the new page record
and the VM update record, so why not -- but I don't have strong
feelings.
Yeah, I'm not sure what the right thing to do here is. I think I was
again experiencing brain fade by forgetting that there is a heap page
and a VM page and, of course, log_heap_newpage() probably isn't going
to touch the latter. So that makes sense. On the other hand, we could
only have one type of WAL record for every single operation in the
system if we gave it enough flags, and force_heap_fpi seems
suspiciously like a flag that turns this into a whole different kind
of WAL record.
0005. It looks a little curious that you delete the
identify-corruption logic from the end of the if-nest and add it to
the beginning. Ceteris paribus, you'd expect that to be worse, since
corruption is a rare case.
On master, the two corruption cases are sandwiched between the normal
VM set cases. And I actually think doing it this way is brittle. If
you put the cases which set the VM first, you have to completely
bulletproof the if statements guarding them to foreclose any possible
corruption case from entering, because otherwise you will overwrite the
corruption you then try to detect.
Hmm. In the current code, we first test (!all_visible_according_to_vm
&& presult.all_visible), then (all_visible_according_to_vm &&
!PageIsAllVisible(page) && visibilitymap_get_status(vacrel->rel,
blkno, &vmbuffer) != 0), and then (presult.lpdead_items > 0 &&
PageIsAllVisible(page)). The first and second can never coexist,
because they require opposite values of all_visible_according_to_vm.
The second and third cannot coexist because they require opposite
values of PageIsAllVisible(page). It is not entirely obvious that the
first and third tests couldn't both pass, but you'd have to have
presult.all_visible and presult.lpdead_items > 0, and it's a bit hard
to see how heap_page_prune_and_freeze() could ever allow that.
Consider:
if (prstate.all_visible && prstate.lpdead_items == 0)
{
presult->all_visible = prstate.all_visible;
presult->all_frozen = prstate.all_frozen;
}
else
{
presult->all_visible = false;
presult->all_frozen = false;
}
...
presult->lpdead_items = prstate.lpdead_items;
So I don't really think I'm persuaded that the current way is brittle.
But that having been said, I agree with you that the order of the
checks is kind of random, and I don't think it really matters that
much for performance. What does matter is clarity. I feel like what
I'd ideally like this logic to do is say: do we want the VM bit for
the page to be set to all-frozen, just all-visible, or neither? Then
push the VM bit to the correct state, dragging the page-level bit
along behind. And the current logic sort of does that. It's roughly:
1. Should we go from not-all-visible to either all-visible or
all-frozen? If yes, do so.
2. Should we go from either all-visible or all-frozen to
not-all-visible? If yes, do so.
3. Should we go from either all-visible or all-frozen to
not-all-visible for a different reason? If yes, do so.
4. Should we go from all-visible to all-frozen? If yes, do so.
But what's weird is that all the tests are written differently, and we
have two different reasons for going to not-all-visible, namely
PD_ALL_VISIBLE-not-set and dead-items-on-page, whereas there's only
one test for each of the other state-transitions, because the
decision-making for those cases is fully completed at an earlier
stage. I would kind of like to see this expressed in a way that first
decides which state transition to make (forward-to-all-frozen,
forward-to-all-visible, backward-to-all-visible,
backward-to-not-all-visible, nothing) and then does the corresponding
work. What you're doing instead is splitting half of those functions
off into a helper function while keeping the other half where they are
without cleaning up any of the logic. Now, maybe that's OK: I'm far
from having grokked the whole patch set. But it is not any more clear
than what we have now, IMHO, and perhaps even a bit less so.
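For concreteness, a rough sketch of that decide-then-act shape; the enum and helper are invented for illustration (PostgreSQL's bool assumed), deliberately simplified, and not code from the patch set:

typedef enum VMTransition
{
    VM_DO_NOTHING,
    VM_FORWARD_TO_ALL_VISIBLE,  /* not-all-visible -> all-visible (+ maybe all-frozen) */
    VM_FORWARD_TO_ALL_FROZEN,   /* already all-visible -> also all-frozen */
    VM_BACKWARD_CLEAR_VM_ONLY,  /* VM bit set but PD_ALL_VISIBLE clear: corruption */
    VM_BACKWARD_CLEAR_BOTH      /* LP_DEAD items on a PD_ALL_VISIBLE page: corruption */
} VMTransition;

static VMTransition
decide_vm_transition(bool vm_all_visible, bool vm_all_frozen,
                     bool pd_all_visible, bool page_all_visible,
                     bool page_all_frozen, bool has_lpdead_items)
{
    if (page_all_visible && !vm_all_visible)
        return VM_FORWARD_TO_ALL_VISIBLE;
    if (vm_all_visible && !pd_all_visible)
        return VM_BACKWARD_CLEAR_VM_ONLY;
    if (has_lpdead_items && pd_all_visible)
        return VM_BACKWARD_CLEAR_BOTH;
    if (page_all_visible && page_all_frozen && !vm_all_frozen)
        return VM_FORWARD_TO_ALL_FROZEN;
    return VM_DO_NOTHING;
}

The caller would then switch on the result and do the corresponding setting or clearing work in one place, with the page-level bit dragged along.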
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Sep 10, 2025 at 4:01 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Sep 9, 2025 at 7:08 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
Fair. I've introduced new XLHP flags in attached v13. Hopefully it
puts an end to the horror.
I suggest not renumbering all of the existing flags and just adding
these new ones at the end. Less code churn and more likely to break in
an obvious way if you mix up the two sets of flags.
Makes sense. In my attached v14, I have not renumbered them.
More on 0002:
After an off-list discussion we had about how to make the patches in
the set progressively improve the code instead of just mechanically
refactoring it, I have made some big changes in the intermediate
patches in the set.
Before actually including the VM changes in the vacuum/prune WAL
records, I first include setting PD_ALL_VISIBLE with the other changes
to the heap page so that we can remove the heap page from the VM
setting WAL chain. This happens to fix the bug we discussed where if
you set an all-visible page all-frozen and checksums/wal_log_hints are
enabled, you may end up setting an LSN on a page that was not marked
dirty.
0001 is RFC but waiting on one other reviewer
0002 - 0007 is a bit of cleanup I had later in the patch set but moved
up because I think it made the intermediate patches better
0008 - 0012 removes the heap page from the XLOG_HEAP2_VISIBLE WAL
chain (it makes all callers of visibilitymap_set() set PD_ALL_VISIBLE
in the same WAL record as changes to the heap page)
0013 - 0018 finish the job eliminating XLOG_HEAP2_VISIBLE and set VM
bits in the same WAL record as the heap changes
0019 - 0024 set the VM on-access
/*
+ * Note that the heap relation may have been dropped or truncated, leading
+ * us to skip updating the heap block due to the LSN interlock. However,
+ * even in that case, it's still safe to update the visibility map. Any
+ * WAL record that clears the visibility map bit does so before checking
+ * the page LSN, so any bits that need to be cleared will still be
+ * cleared.
+ *
+ * Note that the lock on the heap page was dropped above. In normal
+ * operation this would never be safe because a concurrent query could
+ * modify the heap page and clear PD_ALL_VISIBLE -- violating the
+ * invariant that PD_ALL_VISIBLE must be set if the corresponding bit in
+ * the VM is set.
+ *
+ * In recovery, we expect no other writers, so writing to the VM page
+ * without holding a lock on the heap page is considered safe enough. It
+ * is done this way when replaying xl_heap_visible records (see
*/
How many copies of this comment do you plan to end up with?
By the end, one for copy freeze replay and one for prune/freeze/vacuum
replay. I felt two wasn't too bad and was easier than meta-explaining
what the other comment was explaining.
0004. It is not clear to me why you need to get
log_heap_prune_and_freeze to do the work here. Why can't
log_newpage_buffer get the job done already?
Well, I need something to emit the changes to the VM. I'm eliminating
all users of xl_heap_visible. Empty pages are the ones that benefit
the least from switching from xl_heap_visible -> xl_heap_prune. But,
if I don't transition them, we have to maintain all the
xl_heap_visible code (including visibilitymap_set() in its long form).
As for log_newpage_buffer(), I could keep it if you think it is too
confusing to change log_heap_prune_and_freeze()'s API (by passing
force_heap_fpi) to handle this case, I can leave log_newpage_buffer()
there and then call log_heap_prune_and_freeze().
I just thought it seemed simple to avoid emitting the new page record
and the VM update record, so why not -- but I don't have strong
feelings.
Yeah, I'm not sure what the right thing to do here is. I think I was
again experiencing brain fade by forgetting that there is a heap page
and a VM page and, of course, log_heap_newpage() probably isn't going
to touch the latter. So that makes sense. On the other hand, we could
only have one type of WAL record for every single operation in the
system if we gave it enough flags, and force_heap_fpi seems
suspiciously like a flag that turns this into a whole different kind
of WAL record.
I've kept log_heap_newpage() and used log_heap_prune_and_freeze() for
setting PD_ALL_VISIBLE and the VM.
0005. It looks a little curious that you delete the
identify-corruption logic from the end of the if-nest and add it to
the beginning. Ceteris paribus, you'd expect that to be worse, since
corruption is a rare case.
On master, the two corruption cases are sandwiched between the normal
VM set cases. And I actually think doing it this way is brittle. If
you put the cases which set the VM first, you have to completely
bulletproof the if statements guarding them to foreclose any possible
corruption case from entering, because otherwise you will overwrite the
corruption you then try to detect.
Hmm. In the current code, we first test (!all_visible_according_to_vm
&& presult.all_visible), then (all_visible_according_to_vm &&
!PageIsAllVisible(page) && visibilitymap_get_status(vacrel->rel,
blkno, &vmbuffer) != 0), and then (presult.lpdead_items > 0 &&
PageIsAllVisible(page)). The first and second can never coexist,
because they require opposite values of all_visible_according_to_vm.
The second and third cannot coexist because they require opposite
values of PageIsAllVisible(page). It is not entirely obvious that the
first and third tests couldn't both pass, but you'd have to have
presult.all_visible and presult.lpdead_items > 0, and it's a bit hard
to see how heap_page_prune_and_freeze() could ever allow that.
Consider:
if (prstate.all_visible && prstate.lpdead_items == 0)
{
presult->all_visible = prstate.all_visible;
presult->all_frozen = prstate.all_frozen;
}
else
{
presult->all_visible = false;
presult->all_frozen = false;
}
...
presult->lpdead_items = prstate.lpdead_items;
So I don't really think I'm persuaded that the current way is brittle.
I meant brittle because it has to be so carefully coded for it to work
out this way. If you ever wanted to change or enhance it, it's quite
hard to know how to make sure all of them are entirely mutually
exclusive.
But that having been said, I agree with you that the order of the
checks is kind of random, and I don't think it really matters that
much for performance. What does matter is clarity. I feel like what
I'd ideally like this logic to do is say: do we want the VM bit for
the page to be set to all-frozen, just all-visible, or neither? Then
push the VM bit to the correct state, dragging the page-level bit
along behind. And the current logic sort of does that. It's roughly:
1. Should we go from not-all-visible to either all-visible or
all-frozen? If yes, do so.
2. Should we go from either all-visible or all-frozen to
not-all-visible? If yes, do so.
3. Should we go from either all-visible or all-frozen to
not-all-visible for a different reason? If yes, do so.
4. Should we go from all-visible to all-frozen? If yes, do so.
I don't necessarily agree that fixing corruption and setting the VM
should be together -- they feel like separate things to me. But, I
don't feel strongly enough about it to push it.
But what's weird is that all the tests are written differently, and we
have two different reasons for going to not-all-visible, namely
PD_ALL_VISIBLE-not-set and dead-items-on-page, whereas there's only
one test for each of the other state-transitions, because the
decision-making for those cases is fully completed at an earlier
stage. I would kind of like to see this expressed in a way that first
decides which state transition to make (forward-to-all-frozen,
forward-to-all-visible, backward-to-all-visible,
backward-to-not-all-visible, nothing) and then does the corresponding
work. What you're doing instead is splitting half of those functions
off into a helper function while keeping the other half where they are
without cleaning up any of the logic. Now, maybe that's OK: I'm far
from having grokked the whole patch set. But it is not any more clear
than what we have now, IMHO, and perhaps even a bit less so.
In terms of my patch set, I do have to change something about this
mixture of fixing corruption and setting the VM because I need to set
the VM bits in the same critical section as making the other changes
to the heap page (pruning, etc) and include the VM set changes in the
same WAL record (note that clearing the VM to fix corruption is not
WAL-logged).
What I've gone with is determining what to set the VM bits to and then
fixing the corruption at the same time. Then, later, when making the
changes to the heap page, I actually set the VM. This is kind of the
opposite of what you suggested above -- determining what to set the
bits to altogether -- corruption and non-corruption cases together. I
don't think we can do that though, because fixing the corruption involves
non-WAL-logged changes to the page and VM, while setting the VM bits is a
WAL-logged change. And, you can't clear bits with visibilitymap_set()
(there's an assertion about that). So you have to call different
functions (not to mention emit distinct error messages). I don't know
that I've come up with the ideal solution, though.
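For orientation before the diffs, a condensed sketch of that flow as it lands in v14-0016/0017 below; this is not the verbatim code -- error handling, FSM work, and the prune/freeze details are elided, and conflict_xid stands in for the horizon computed in the real function:

    /* decide the target VM bits; repair any corruption now (not WAL-logged) */
    do_set_vm = heap_page_will_set_vis(relation, blockno, buffer, vmbuffer,
                                       blk_known_av, &prstate,
                                       &new_vmbits, &do_set_pd_vis);

    /* lock the VM buffer before entering the critical section */
    if (do_set_vm)
        LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);

    START_CRIT_SECTION();
    /* prune, freeze, and PageSetAllVisible() as decided above ... */
    if (do_set_vm)
        old_vmbits = visibilitymap_set_vmbits(relation, blockno,
                                              vmbuffer, new_vmbits);
    /* one record covers the heap page changes and the VM change */
    if (RelationNeedsWAL(relation))
        log_heap_prune_and_freeze(relation, buffer, vmbuffer, new_vmbits,
                                  conflict_xid, true, do_set_pd_vis,
                                  PRUNE_VACUUM_SCAN,
                                  NULL, 0, NULL, 0, NULL, 0, NULL, 0);
    END_CRIT_SECTION();

    if (do_set_vm)
        LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);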
- Melanie
Attachments:
v14-0015-Set-empty-pages-all-visible-in-XLOG_HEAP2_PRUNE_.patch
From ed61f88812f33cb96cebeabc5c9c43a11cdd5a3e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 17 Sep 2025 16:04:18 -0400
Subject: [PATCH v14 15/24] Set empty pages all-visible in
XLOG_HEAP2_PRUNE_VACUUM_SCAN record
As part of a project to eliminate XLOG_HEAP2_VISIBLE records, eliminate
their usage in phase I vacuum of empty pages.
---
src/backend/access/heap/vacuumlazy.c | 55 +++++++++++++++++-----------
1 file changed, 34 insertions(+), 21 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b6c973cd111..e01fc5bb502 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1882,11 +1882,21 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ bool set_pd_all_vis = true;
+
+ /* Lock vmbuffer before entering critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
START_CRIT_SECTION();
PageSetAllVisible(page);
MarkBufferDirty(buf);
+ visibilitymap_set_vmbits(vacrel->rel, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+
if (RelationNeedsWAL(vacrel->rel))
{
/*
@@ -1897,34 +1907,37 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
* all-visible and find that the page isn't initialized, which
* will cause a PANIC. To prevent that, check whether the page
* has been previously WAL-logged, and if not, do that now.
- *
- * Otherwise, just emit WAL for setting PD_ALL_VISIBLE on the
- * heap page. Doing this in a separate record from setting the
- * VM allows us to omit the heap page from the VM WAL chain.
*/
if (PageGetLSN(page) == InvalidXLogRecPtr)
+ {
log_newpage_buffer(buf, true);
- else
- log_heap_prune_and_freeze(vacrel->rel, buf,
- InvalidBuffer,
- 0,
- InvalidTransactionId, /* conflict xid */
- false, /* cleanup lock */
- true, /* set_pd_all_vis */
- PRUNE_VACUUM_SCAN, /* reason */
- NULL, 0,
- NULL, 0,
- NULL, 0,
- NULL, 0);
+ set_pd_all_vis = false;
+ }
+
+ /*
+ * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+ * setting the VM. If we emitted a new page record for the
+ * page above, setting PD_ALL_VISIBLE will already have been
+ * included in that record.
+ */
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ InvalidTransactionId, /* conflict xid */
+ false, /* cleanup lock */
+ set_pd_all_vis,
+ PRUNE_VACUUM_SCAN, /* reason */
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
}
- visibilitymap_set(vacrel->rel, blkno,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
END_CRIT_SECTION();
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
/* Count the newly all-frozen pages for logging */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
--
2.43.0
v14-0016-Set-VM-in-heap_page_prune_and_freeze.patch
From 6d11a7bf77706bc4ddbdb156f25f9c53d4b1e615 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 16 Sep 2025 15:46:40 -0400
Subject: [PATCH v14 16/24] Set VM in heap_page_prune_and_freeze
The determination as to whether or not the page can be set
all-visible/all-frozen has already been done by the end of
heap_page_prune_and_freeze(). Vacuum waited until it returns to
lazy_scan_prune() to actually set the VM, though.
This commit moves setting the VM into heap_page_prune_and_freeze().
There are still two separate WAL records -- one for the changes to the
heap page and one for the changes to the VM. But, this is an incremental
step toward logging setting the VM in the same WAL record as pruning and
freezing.
Note that this is not used by on-access pruning.
---
src/backend/access/heap/pruneheap.c | 221 +++++++++++++++++++++++----
src/backend/access/heap/vacuumlazy.c | 146 ++----------------
src/include/access/heapam.h | 24 +--
3 files changed, 221 insertions(+), 170 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9e00fbf3cd1..e3f9967e26c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,6 +19,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/transam.h"
+#include "access/visibilitymap.h"
#include "access/xlog.h"
#include "access/visibilitymapdefs.h"
#include "access/xloginsert.h"
@@ -257,7 +258,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, PRUNE_ON_ACCESS, 0, NULL,
+ heap_page_prune_and_freeze(relation, buffer,
+ InvalidBuffer, false,
+ PRUNE_ON_ACCESS, 0, NULL,
vistest, &presult, &dummy_off_loc, NULL, NULL);
/*
@@ -423,16 +426,115 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Determine whether to set the visibility map bits based on information from
+ * the PruneState and blk_known_av, which some callers will provide after
+ * previously examining this heap page's VM bits (e.g. vacuum from the last
+ * heap_vac_scan_next_block() call).
+ *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
+ * Returns true if the caller should set one or both of the VM bits and false
+ * otherwise.
+ */
+static bool
+heap_page_will_set_vis(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buf,
+ Buffer vmbuffer,
+ bool blk_known_av,
+ PruneState *prstate,
+ uint8 *vmflags,
+ bool *do_set_pd_vis)
+{
+ Page heap_page = BufferGetPage(heap_buf);
+ bool do_set_vm = false;
+
+ if (prstate->all_visible && !PageIsAllVisible(heap_page))
+ *do_set_pd_vis = true;
+
+ if ((prstate->all_visible && !blk_known_av) ||
+ (prstate->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+ {
+ *vmflags = VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate->all_frozen)
+ *vmflags |= VISIBILITYMAP_ALL_FROZEN;
+
+ do_set_vm = true;
+ }
+
+ /*
+ * Now handle two potential corruption cases:
+ *
+ * These do not need to happen in a critical section and are not
+ * WAL-logged.
+ *
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that in vacuum the bit
+ * got cleared after heap_vac_scan_next_block() was called, so we must
+ * recheck with buffer lock before concluding that the VM is corrupt.
+ */
+ else if (blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (prstate->lpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buf);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ /* We should only set the VM if PD_ALL_VISIBLE is set or will be */
+ Assert(!do_set_vm || PageIsAllVisible(heap_page) || *do_set_pd_vis);
+
+ return do_set_vm;
+}
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
* also need to account for a reduction in the length of the line pointer
* array following array truncation by us.
*
+ * vmbuffer is the buffer that must already contain the required block
+ * of the visibility map if we are to update it. blk_known_av is the
+ * visibility status of the heap block as of the last call to
+ * find_next_unskippable_block().
+ *
* reason indicates why the pruning is performed. It is included in the WAL
* record for debugging and analysis purposes, but otherwise has no effect.
*
@@ -443,15 +545,20 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* FREEZE indicates that we will also freeze tuples, and will return
* 'all_visible', 'all_frozen' flags to the caller.
*
- * If the HEAP_PRUNE_FREEZE option is set, we will freeze tuples if it's
+ * UPDATE_VIS indicates that we will set the page's status in the VM.
+ *
+ * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
* required in order to advance relfrozenxid / relminmxid, or if it's
* considered advantageous for overall system performance to do so now. The
* 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing. When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set. They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
+ * are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is set and the visibility status of the page
+ * has changed, we will update the VM at the same time as pruning and freezing
+ * the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
+ *
*
* cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
* of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
@@ -478,6 +585,7 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer, bool blk_known_av,
PruneReason reason,
int options,
const struct VacuumCutoffs *cutoffs,
@@ -496,10 +604,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_freeze;
bool do_prune;
bool do_hint_prune;
+ bool do_set_vm;
bool do_set_pd_vis;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
TransactionId frz_conflict_horizon = InvalidTransactionId;
+ uint8 new_vmbits = 0;
+ uint8 old_vmbits = 0;
/* Copy parameters to prstate */
prstate.vistest = vistest;
@@ -828,19 +939,27 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
Assert(!prstate.all_frozen || prstate.all_visible);
/*
- * Though callers should set the VM if PD_ALL_VISIBLE is set here, it is
- * allowed for the page-level bit to be set and the VM to be clear.
+ * Determine whether or not to set the page level PD_ALL_VISIBLE and the
+ * visibility map bits based on information from the VM and from
+ * all_visible and all_frozen variables.
+ *
+ * Though callers should set the VM if PD_ALL_VISIBLE is set, it is
+ * allowed for the page-level bit to be set and the VM to be clear. We log
+ * setting PD_ALL_VISIBLE on the heap page in a
+ * XLOG_HEAP2_PRUNE_VACUUM_SCAN record and setting the VM bits in a later
+ * emitted XLOG_HEAP2_VISIBLE record.
+ *
* Setting PD_ALL_VISIBLE when we are making the changes to the page that
* render it all-visible allows us to omit the heap page from the WAL
* chain when later updating the VM -- even when checksums/wal_log_hints
* are enabled.
*/
do_set_pd_vis = false;
+ do_set_vm = false;
if ((options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0)
- {
- if (prstate.all_visible && !PageIsAllVisible(page))
- do_set_pd_vis = true;
- }
+ do_set_vm = heap_page_will_set_vis(relation,
+ blockno, buffer, vmbuffer, blk_known_av,
+ &prstate, &new_vmbits, &do_set_pd_vis);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -928,28 +1047,72 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
END_CRIT_SECTION();
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ /*
+ * VACUUM will call heap_page_would_be_all_visible() during the second
+ * pass over the heap to determine all_visible and all_frozen for the page
+ * -- this is a specialized version of that logic. Now that we've finished
+ * pruning and freezing, make sure that we're in total agreement with
+ * heap_page_would_be_all_visible() using an assertion.
+ */
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
+ {
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
+
+ Assert(prstate.lpdead_items == 0);
+ Assert(prstate.cutoffs);
+
+ if (!heap_page_is_all_visible(relation, buffer,
+ prstate.cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc))
+ Assert(false);
+
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == prstate.visibility_cutoff_xid);
+ }
+#endif
+
+ /* Now set the VM */
+ if (do_set_vm)
+ {
+ TransactionId vm_conflict_horizon;
+
+ Assert((new_vmbits & VISIBILITYMAP_VALID_BITS) != 0);
+
+ /*
+ * The conflict horizon for that record must be the newest xmin on the
+ * page. However, if the page is completely frozen, there can be no
+ * conflict and the vm_conflict_horizon should remain
+ * InvalidTransactionId. This includes the case that we just froze
+ * all the tuples; the prune-freeze record included the conflict XID
+ * already so a snapshotConflictHorizon sufficient to make everything
+ * safe for REDO was logged when the page's tuples were frozen.
+ */
+ if (prstate.all_frozen)
+ vm_conflict_horizon = InvalidTransactionId;
+ else
+ vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ old_vmbits = visibilitymap_set(relation, blockno,
+ InvalidXLogRecPtr,
+ vmbuffer, vm_conflict_horizon,
+ new_vmbits);
+ }
+
/* Copy information back for caller */
presult->ndeleted = prstate.ndeleted;
presult->nnewlpdead = prstate.ndead;
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
-
- /*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ presult->new_vmbits = new_vmbits;
+ presult->old_vmbits = old_vmbits;
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e01fc5bb502..8ec0476a0d4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,11 +463,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
TransactionId OldestXmin,
OffsetNumber *deadoffsets,
@@ -2014,7 +2009,9 @@ lazy_scan_prune(LVRelState *vacrel,
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf, PRUNE_VACUUM_SCAN, prune_options,
+ heap_page_prune_and_freeze(rel, buf,
+ vmbuffer, all_visible_according_to_vm,
+ PRUNE_VACUUM_SCAN, prune_options,
&vacrel->cutoffs,
vacrel->vistest,
&presult,
@@ -2035,33 +2032,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- if (!heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum))
- Assert(false);
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -2095,112 +2065,28 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
- Assert(!presult.all_frozen || presult.all_visible);
-
/*
- * Handle setting visibility map bits based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * For the purposes of logging, count whether or not the page was newly
+ * set all-visible and, potentially, all-frozen.
*/
- if ((presult.all_visible && !all_visible_according_to_vm) ||
- (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer)))
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- /*
- * If the page is all-frozen, we can pass InvalidTransactionId as our
- * cutoff_xid, since a snapshotConflictHorizon sufficient to make
- * everything safe for REDO was logged when the page's tuples were
- * frozen.
- */
- if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- old_vmbits = visibilitymap_set(vacrel->rel, blkno,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * Even if we are only setting the all-frozen bit, there is a small
- * chance that the VM was modified sometime between setting
- * all_visible_according_to_vm and checking the visibility during
- * pruning. Check the return value of old_vmbits to ensure the
- * visibility map counters used for logging are accurate.
- *
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ vacrel->vm_new_visible_pages++;
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- vacrel->vm_new_frozen_pages++;
+ vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
-
- /*
- * Now handle two potential corruption cases:
- *
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
}
-
return presult.ndeleted;
}
@@ -3590,7 +3476,7 @@ dead_items_cleanup(LVRelState *vacrel)
* Wrapper for heap_page_would_be_all_visible() which can be used for
* callers that expect no LP_DEAD on the page.
*/
-static bool
+bool
heap_page_is_all_visible(Relation rel, Buffer buf,
TransactionId OldestXmin,
bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index be66970c9f0..797cd51145d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,19 +235,14 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
*
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
- *
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set.
*/
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
+ uint8 new_vmbits;
+ uint8 old_vmbits;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
@@ -375,6 +370,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer, bool blk_known_av,
PruneReason reason,
int options,
const struct VacuumCutoffs *cutoffs,
@@ -403,6 +399,12 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
+
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
Buffer buffer);
--
2.43.0
v14-0017-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch
From 9904f827846bb2660dbc9ff0ecb1d24dbe9dc3bc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 17 Sep 2025 17:29:59 -0400
Subject: [PATCH v14 17/24] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
prune/freeze
Instead of emitting a separate WAL record for every block rendered
all-visible/frozen by vacuum's phase I, include the changes to the VM in
the XLOG_HEAP2_PRUNE_VACUUM_SCAN record already emitted.
This is only enabled for vacuum's prune/freeze work, not for on-access
pruning.
---
src/backend/access/heap/pruneheap.c | 183 +++++++++++++++++-----------
src/include/access/heapam.h | 3 +-
2 files changed, 112 insertions(+), 74 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e3f9967e26c..a14c793da7e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -662,50 +662,58 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.deadoffsets = presult->deadoffsets;
/*
- * Caller may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Keep track of whether or not the page will be all-visible and
+ * all-frozen for use in opportunistic freezing and to update the VM if
+ * the caller requests it.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * Currently, only VACUUM attempts freezing. But other callers could. The
+ * visibility bookkeeping is required for opportunistic freezing (in
+ * addition to setting the VM bits) because we only consider
+ * opportunistically freezing tuples if the whole page would become
+ * all-frozen or if the whole page will be frozen except for dead tuples
+ * that will be removed by vacuum. But if consider_update_vm is false,
+ * we'll not set the VM even if the page is discovered to be all-visible.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not immediately
- * clear all_visible when we see LP_DEAD items. We fix that after
- * scanning the line pointers, before we return the value to the caller,
- * so that the caller doesn't set the VM bit incorrectly.
+ * If only HEAP_PAGE_PRUNE_UPDATE_VIS is passed and not
+ * HEAP_PAGE_PRUNE_FREEZE, prstate.all_frozen must be initialized to false
+ * because we will not call heap_prepare_freeze_tuple() on each tuple.
+ *
+ * Dead tuples which will be removed by the end of vacuuming should not
+ * preclude us from opportunistically freezing, so we do not clear
+ * all_visible when we see LP_DEAD items. We fix that after determining
+ * whether or not to freeze but before deciding whether or not to update
+ * the VM so that we don't set the VM bit incorrectly.
+ *
+ * If not freezing and not updating the VM, we avoid the extra
+ * bookkeeping. Initializing all_visible to false allows skipping the work
+ * to update them in heap_prune_record_unchanged_lp_normal().
*/
if (prstate.attempt_freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
}
+ else if ((options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0)
+ {
+ prstate.all_visible = true;
+ prstate.all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate.all_visible = false;
prstate.all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is unmaintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon.
+ * This is most likely to happen when updating the VM and/or freezing all
+ * live tuples on the page. It is updated before returning to the caller
+ * because vacuum does assert-build only validation on the page using this
+ * field.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -943,16 +951,15 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* visibility map bits based on information from the VM and from
* all_visible and all_frozen variables.
*
- * Though callers should set the VM if PD_ALL_VISIBLE is set, it is
- * allowed for the page-level bit to be set and the VM to be clear. We log
- * setting PD_ALL_VISIBLE on the heap page in a
- * XLOG_HEAP2_PRUNE_VACUUM_SCAN record and setting the VM bits in a later
- * emitted XLOG_HEAP2_VISIBLE record.
+ * It is allowed for the page-level bit to be set and the VM to be clear,
+ * however, we have a strong preference for keeping them in sync.
*
- * Setting PD_ALL_VISIBLE when we are making the changes to the page that
- * render it all-visible allows us to omit the heap page from the WAL
- * chain when later updating the VM -- even when checksums/wal_log_hints
- * are enabled.
+ * Prior to Postgres 19, it was possible for the page-level bit to be set
+ * and the VM bit to be clear. This could happen if we crashed after
+ * setting PD_ALL_VISIBLE but before setting bits in the VM.
+ *
+ * As such, it is possible to only update the VM when PD_ALL_VISIBLE is
+ * already set.
*/
do_set_pd_vis = false;
do_set_vm = false;
@@ -961,6 +968,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
blockno, buffer, vmbuffer, blk_known_av,
&prstate, &new_vmbits, &do_set_pd_vis);
+ /* Lock vmbuffer before entering a critical section */
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -991,7 +1002,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze || do_set_pd_vis)
+ if (do_prune || do_freeze || do_set_pd_vis || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -1008,12 +1019,31 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_set_pd_vis)
PageSetAllVisible(page);
- MarkBufferDirty(buffer);
+ if (do_prune || do_freeze || do_set_pd_vis)
+ MarkBufferDirty(buffer);
+
+ if (do_set_vm)
+ {
+ Assert(PageIsAllVisible(page));
+
+ old_vmbits = visibilitymap_set_vmbits(relation, blockno,
+ vmbuffer, new_vmbits);
+ if (old_vmbits == new_vmbits)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+ /* Unset so we don't emit WAL since no change occurred */
+ do_set_vm = false;
+ }
+ }
/*
- * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
+ * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
+ * only updating the VM and it turns out it was already set, we will
+ * have unset do_set_vm earlier. As such, check it again before
+ * emitting the record.
*/
- if (RelationNeedsWAL(relation))
+ if (RelationNeedsWAL(relation) &&
+ (do_prune || do_freeze || do_set_pd_vis || do_set_vm))
{
/*
* The snapshotConflictHorizon for the whole record should be the
@@ -1025,15 +1055,45 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* on the standby with xids older than the youngest tuple this
* record will freeze will conflict.
*/
- TransactionId conflict_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
- if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
+ /*
+ * If we are updating the VM, the conflict horizon is almost
+ * always the visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization,
+ * we can use the visibility_cutoff_xid as the conflict horizon if
+ * the page will be all-frozen. This is true even if there are
+ * LP_DEAD line pointers because we ignored those when maintaining
+ * the visibility_cutoff_xid. This will have been calculated
+ * earlier as the frz_conflict_horizon when we determined we would
+ * freeze.
+ */
+ if (do_set_vm)
+ conflict_xid = prstate.visibility_cutoff_xid;
+ else if (do_freeze)
conflict_xid = frz_conflict_horizon;
- else
+
+ /*
+ * If we are removing tuples with a younger xmax than our so far
+ * calculated conflict_xid, we must use this as our horizon.
+ */
+ if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
conflict_xid = prstate.latest_xid_removed;
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning
+ * or freezing any tuples and are setting an already all-visible
+ * page all-frozen in the VM. In this case, all of the tuples on
+ * the page must already be visible to all MVCC snapshots on the
+ * standby.
+ */
+ if (!do_prune && !do_freeze && do_set_vm &&
+ blk_known_av && (new_vmbits & VISIBILITYMAP_ALL_FROZEN))
+ conflict_xid = InvalidTransactionId;
+
log_heap_prune_and_freeze(relation, buffer,
- InvalidBuffer, 0,
+ vmbuffer, new_vmbits,
conflict_xid,
true,
do_set_pd_vis,
@@ -1047,6 +1107,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
END_CRIT_SECTION();
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
/*
@@ -1078,32 +1141,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
}
#endif
- /* Now set the VM */
- if (do_set_vm)
- {
- TransactionId vm_conflict_horizon;
-
- Assert((new_vmbits & VISIBILITYMAP_VALID_BITS) != 0);
-
- /*
- * The conflict horizon for that record must be the newest xmin on the
- * page. However, if the page is completely frozen, there can be no
- * conflict and the vm_conflict_horizon should remain
- * InvalidTransactionId. This includes the case that we just froze
- * all the tuples; the prune-freeze record included the conflict XID
- * already so a snapshotConflictHorizon sufficient to make everything
- * safe for REDO was logged when the page's tuples were frozen.
- */
- if (prstate.all_frozen)
- vm_conflict_horizon = InvalidTransactionId;
- else
- vm_conflict_horizon = prstate.visibility_cutoff_xid;
- old_vmbits = visibilitymap_set(relation, blockno,
- InvalidXLogRecPtr,
- vmbuffer, vm_conflict_horizon,
- new_vmbits);
- }
-
/* Copy information back for caller */
presult->ndeleted = prstate.ndeleted;
presult->nnewlpdead = prstate.ndead;
@@ -2261,7 +2298,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* - Reaping: During vacuum phase III, items that are already LP_DEAD are
* marked as unused.
*
- * - VM updates: After vacuum phase III, the heap page may be marked
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
* all-visible and all-frozen.
*
* These changes all happen together, so we use a single WAL record for them
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 797cd51145d..cac7a4c2899 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -239,7 +239,8 @@ typedef struct PruneFreezeResult
* visibility map before updating it during phase I of vacuuming.
* new_vmbits are the state of those bits after phase I of vacuuming.
*
- * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set.
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set and
+ * we have actually updated the VM.
*/
uint8 new_vmbits;
uint8 old_vmbits;
--
2.43.0
v14-0018-Remove-XLOG_HEAP2_VISIBLE-entirely.patch
From 6f94908b0649956e1d1abbbd5c362a57282c2c26 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 17 Sep 2025 17:42:54 -0400
Subject: [PATCH v14 18/24] Remove XLOG_HEAP2_VISIBLE entirely
There are now no users of this, so eliminate it entirely.
This includes the xl_heap_visible struct as well as all of the functions
used to emit and replay XLOG_HEAP2_VISIBLE records.
ci-os-only:
---
src/backend/access/common/bufmask.c | 4 +-
src/backend/access/heap/heapam.c | 40 ++--------
src/backend/access/heap/heapam_xlog.c | 96 +++---------------------
src/backend/access/heap/pruneheap.c | 4 +-
src/backend/access/heap/vacuumlazy.c | 14 ++--
src/backend/access/heap/visibilitymap.c | 83 +-------------------
src/backend/access/rmgrdesc/heapdesc.c | 10 ---
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +--
src/include/access/heapam_xlog.h | 19 -----
src/include/access/visibilitymap.h | 11 +--
src/include/access/visibilitymapdefs.h | 9 ---
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 36 insertions(+), 268 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
+ * for more details.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0323e2df409..ab514ce65ec 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2524,11 +2524,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
else if (all_frozen_set)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(relation,
- BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set(relation,
+ BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
}
/*
@@ -8799,36 +8799,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
-
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
- XLogRegisterBuffer(0, vm_buffer, 0);
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
-
/*
* Perform XLogInsert for a heap-update operation. Caller must already
* have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index c1f332f7a9a..a8908373067 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -251,8 +251,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
*
* In recovery, we expect no other writers, so writing to the VM page
* without holding a lock on the heap page is considered safe enough. It
- * is done this way when replaying xl_heap_visible records (see
- * heap_xlog_visible()).
+ * is also done this way when replaying COPY FREEZE records (see
+ * heap_xlog_multi_insert()).
*/
if (vmflags & VISIBILITYMAP_VALID_BITS &&
XLogReadBufferForRedoExtended(record, 1,
@@ -268,7 +268,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- old_vmbits = visibilitymap_set_vmbits(reln, blkno, vmbuffer, vmflags);
+ old_vmbits = visibilitymap_set(reln, blkno, vmbuffer, vmflags);
/* Only set VM page LSN if we modified the page */
if (old_vmbits != vmflags)
@@ -287,81 +287,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * It is imperative that the previously emitted record set PD_ALL_VISIBLE on
- * the heap page. We must never end up with a situation where the visibility
- * map bit is set, and the page-level PD_ALL_VISIBLE bit is clear. If that
- * were to occur, then a subsequent page modification would fail to clear the
- * visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- RelFileLocator rlocator;
- BlockNumber blkno;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Even if the heap relation was dropped or truncated and the previously
- * emitted record skipped the heap page update due to this LSN interlock,
- * it's still safe to update the visibility map. Any WAL record that
- * clears the visibility map bit does so before checking the page LSN, so
- * any bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
-
- visibilitymap_set(reln, blkno, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -739,8 +664,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
*
* In recovery, we expect no other writers, so writing to the VM page
* without holding a lock on the heap page is considered safe enough. It
- * is done this way when replaying xl_heap_visible records (see
- * heap_xlog_visible()).
+ * is done this way when replaying xl_heap_prune records (see
+ * heap_xlog_prune_and_freeze()).
*/
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -753,10 +678,10 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(reln, blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
/*
* It is not possible that the VM was already set for this heap page,
@@ -1342,9 +1267,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a14c793da7e..39d59a43ff7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1026,8 +1026,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
{
Assert(PageIsAllVisible(page));
- old_vmbits = visibilitymap_set_vmbits(relation, blockno,
- vmbuffer, new_vmbits);
+ old_vmbits = visibilitymap_set(relation, blockno,
+ vmbuffer, new_vmbits);
if (old_vmbits == new_vmbits)
{
LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8ec0476a0d4..28436389d63 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1887,10 +1887,10 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
PageSetAllVisible(page);
MarkBufferDirty(buf);
- visibilitymap_set_vmbits(vacrel->rel, blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set(vacrel->rel, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
if (RelationNeedsWAL(vacrel->rel))
{
@@ -2775,9 +2775,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(vacrel->rel,
- blkno,
- vmbuffer, vmflags);
+ visibilitymap_set(vacrel->rel,
+ blkno,
+ vmbuffer, vmflags);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 75fcb3f067a..38d3131e56b 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set bit(s) in a previously pinned page and log
- * visibilitymap_set_vmbits - set bit(s) in a pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,82 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
- flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (XLogRecPtrIsInvalid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, vmBuf, cutoff_xid, flags);
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
-}
/*
* Set flags in the VM block contained in the passed in vmBuf.
@@ -318,8 +241,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk,
* is pinned and exclusive locked.
*/
uint8
-visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags)
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 1cb44ca32d3..93505cb8c56 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -460,9 +453,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 833114e0a6e..61ceaf2a98b 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -451,19 +450,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -507,11 +493,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 302adf4856a..c5b1e1f7adb 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "utils/relcache.h"
@@ -31,14 +30,8 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags);
+extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e90af5b2ad3..32c0f4719c3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4268,7 +4268,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
v14-0019-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (text/x-patch; charset=US-ASCII)
From cbfb5ee8a412651c604307cd0bd611f187ed348a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v14 19/24] Rename GlobalVisTestIsRemovableXid() to
GlobalVisXidVisibleToAll()
Currently, we only use GlobalVisTestIsRemovableXid() to check whether a
tuple's xmax is visible to all, meaning we can remove the tuple. But future
commits will also use it to test whether a tuple's xmin is visible to all,
for the purpose of determining whether the page can be set all-visible in
the VM. Given both uses, it makes more sense to call the function
GlobalVisXidVisibleToAll().
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
---
src/backend/access/heap/heapam_visibility.c | 6 +++---
src/backend/access/heap/pruneheap.c | 14 +++++++-------
src/backend/access/spgist/spgvacuum.c | 2 +-
src/backend/storage/ipc/procarray.c | 13 ++++++-------
src/include/utils/snapmgr.h | 4 ++--
5 files changed, 19 insertions(+), 20 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
{
Assert(TransactionIdIsValid(dead_after));
- if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
res = HEAPTUPLE_DEAD;
}
else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
return false;
/* Deleter committed, so tuple is dead if the XID is old enough. */
- return GlobalVisTestIsRemovableXid(vistest,
- HeapTupleHeaderGetRawXmax(tuple));
+ return GlobalVisXidVisibleToAll(vistest,
+ HeapTupleHeaderGetRawXmax(tuple));
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 39d59a43ff7..471151fae2e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -218,7 +218,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
vistest = GlobalVisTestFor(relation);
- if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+ if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
return;
/*
@@ -727,9 +727,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* Determining HTSV only once for each tuple is required for correctness,
* to deal with cases where running HTSV twice could result in different
* results. For example, RECENTLY_DEAD can turn to DEAD if another
- * checked item causes GlobalVisTestIsRemovableFullXid() to update the
- * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
- * transaction aborts.
+ * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+ * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+ * aborts.
*
* It's also good for performance. Most commonly tuples within a page are
* stored at decreasing offsets (while the items are stored at increasing
@@ -1199,11 +1199,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* Determine whether or not the tuple is considered dead when compared
* with the provided GlobalVisState. On-access pruning does not provide
* VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
- * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
- * if the GlobalVisState has been updated since the beginning of vacuuming
+ * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+ * the GlobalVisState has been updated since the beginning of vacuuming
* the relation.
*/
- if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
return HEAPTUPLE_DEAD;
return res;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 8f8a1ad7796..496cca69410 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
*/
if (dt->tupstate == SPGIST_REDIRECT &&
(!TransactionIdIsValid(dt->xid) ||
- GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+ GlobalVisXidVisibleToAll(vistest, dt->xid)))
{
dt->tupstate = SPGIST_PLACEHOLDER;
Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..f67f01c17c2 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
* See comment for GlobalVisState for details.
*/
bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
- FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
{
/*
* If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4223,7 +4222,7 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
* relfrozenxid).
*/
bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
{
FullTransactionId fxid;
@@ -4237,7 +4236,7 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
*/
fxid = FullXidRelativeTo(state->definitely_needed, xid);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableXid(state, xid);
+ return GlobalVisXidVisibleToAll(state, xid);
}
/*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
*/
typedef struct GlobalVisState GlobalVisState;
extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
--
2.43.0
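To illustrate the intent of the rename in 0019, here is a minimal sketch
that is not part of the patch set: the example_* helpers are invented for
this email, GlobalVisTestFor() is an existing function, and
GlobalVisXidVisibleToAll() is the new name introduced by the patch. The
same "is this xid visible to all?" question reads naturally for both the
existing xmax/removal callers and the planned xmin/all-visible callers.

#include "postgres.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"

/* xmax is dead to everyone -> the tuple can be removed */
static bool
example_xmax_is_removable(Relation rel, TransactionId xmax)
{
	GlobalVisState *vistest = GlobalVisTestFor(rel);

	return GlobalVisXidVisibleToAll(vistest, xmax);
}

/* xmin is visible to everyone -> the tuple counts toward all-visible */
static bool
example_xmin_allows_all_visible(Relation rel, TransactionId xmin)
{
	GlobalVisState *vistest = GlobalVisTestFor(rel);

	return GlobalVisXidVisibleToAll(vistest, xmin);
}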
v14-0020-Use-GlobalVisState-to-determine-page-level-visib.patch (text/x-patch; charset=US-ASCII)
From aeb0c7ed54566dfd8b67d4ad50d46938b1ccf95d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v14 20/24] Use GlobalVisState to determine page level
visibility
During pruning and during vacuum's third phase, we try to determine if
the whole page can be set all-visible in the visibility map. Instead of
using OldestXmin to determine if all the tuples on a page are visible to
everyone, use the GlobalVisState. This allows us to start setting the VM
during on-access pruning in a future commit.
It is possible for the GlobalVisState to change during the course of a
vacuum. In all but extraordinary cases, it moves forward, meaning more
pages could potentially be set in the VM.
Because comparing a transaction ID to the GlobalVisState requires more
operations than comparing it to another single transaction ID, we now
wait until after examining all the tuples on the page and, if we have
maintained the visibility_cutoff_xid, compare it to the GlobalVisState
just once per page. This works because if the page is all-visible and has
live, committed tuples on it, the visibility_cutoff_xid will contain the
newest xmin on the page. If everyone can see that xmin, the page is truly
all-visible.
Doing this may mean we examine more tuples' xmins than before, since we
previously set all_visible to false as soon as we encountered a live
tuple with an xmin newer than OldestXmin. However, these extra
comparisons were found not to be significant in a profile.
---
src/backend/access/heap/heapam_visibility.c | 28 +++++++++++++
src/backend/access/heap/pruneheap.c | 46 +++++++++------------
src/backend/access/heap/vacuumlazy.c | 20 ++++-----
src/include/access/heapam.h | 4 +-
4 files changed, 60 insertions(+), 38 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+ Buffer buffer)
+{
+ TransactionId dead_after = InvalidTransactionId;
+ HTSV_Result res;
+
+ res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+ if (res == HEAPTUPLE_RECENTLY_DEAD)
+ {
+ Assert(TransactionIdIsValid(dead_after));
+
+ if (GlobalVisXidVisibleToAll(vistest, dead_after))
+ res = HEAPTUPLE_DEAD;
+ }
+ else
+ Assert(!TransactionIdIsValid(dead_after));
+
+ return res;
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 471151fae2e..bb7a1357a89 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -134,10 +134,9 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon, when setting the VM or when
+ * freezing all the live tuples on the page.
*
* NOTE: all_visible and all_frozen don't include LP_DEAD items. That's
* convenient for heap_page_prune_and_freeze(), to use them to decide
@@ -706,14 +705,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon.
- * This is most likely to happen when updating the VM and/or freezing all
- * live tuples on the page. It is updated before returning to the caller
- * because vacuum does assert-build only validation on the page using this
- * field.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState.
+ *
+ * If we encounter an uncommitted tuple, this field is unmaintained. If
+ * the page is being set all-visible or when freezing all live tuples on
+ * the page, it is used to calculate the snapshot conflict horizon.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -909,6 +906,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them is not visible to everyone, the page cannot be
+ * all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update the hint
@@ -1129,7 +1136,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
Assert(prstate.cutoffs);
if (!heap_page_is_all_visible(relation, buffer,
- prstate.cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc))
Assert(false);
@@ -1655,19 +1662,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' when freezing is requested. We
- * could use GlobalVisTestIsRemovableXid instead, if a
- * non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 28436389d63..341115dbbbe 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,7 +464,7 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -2733,7 +2733,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
InvalidOffsetNumber);
if (heap_page_would_be_all_visible(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3478,14 +3478,13 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
- return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
+ return heap_page_would_be_all_visible(rel, buf, vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -3504,7 +3503,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
*
* *all_frozen is an output parameter indicating to the caller if every tuple
* on the page is frozen.
@@ -3525,7 +3524,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
static bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -3597,8 +3596,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
tuple.t_len = ItemIdGetLength(itemid);
tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
- buf))
+ switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest,
+ buf))
{
case HEAPTUPLE_LIVE:
{
@@ -3617,8 +3616,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin,
- OldestXmin))
+ if (!GlobalVisXidVisibleToAll(vistest, xmin))
{
all_visible = false;
*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index cac7a4c2899..35a25cf0b04 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -401,7 +401,7 @@ extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum);
@@ -413,6 +413,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+ GlobalVisState *vistest, Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
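As a quick illustration of the once-per-page check described in 0020:
this is a minimal sketch, not code taken from the patch. The helper name
and its standalone form are invented for the example, while
GlobalVisXidVisibleToAll() and TransactionIdIsNormal() are the existing
primitives it relies on.

#include "postgres.h"
#include "access/transam.h"
#include "utils/snapmgr.h"

/*
 * After scanning every tuple on the page, 'all_visible' says whether any
 * tuple already disqualified the page, and 'visibility_cutoff_xid' holds
 * the newest xmin among the live, committed tuples that were seen.
 */
static bool
example_page_can_be_set_all_visible(GlobalVisState *vistest,
									bool all_visible,
									TransactionId visibility_cutoff_xid)
{
	if (!all_visible)
		return false;

	/* No normal xmin tracked (e.g. only frozen tuples): nothing to wait out */
	if (!TransactionIdIsNormal(visibility_cutoff_xid))
		return true;

	/* A single GlobalVisState comparison per page, not one per tuple */
	return GlobalVisXidVisibleToAll(vistest, visibility_cutoff_xid);
}

This mirrors the new check in heap_page_prune_and_freeze(); the real code
also clears all_frozen when the test fails.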
v14-0003-Reorder-heap_page_prune_and_freeze-parameters.patch (text/x-patch; charset=US-ASCII)
From da4f0d141c8fa673a4651c42efd8bc48cd88c485 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 15 Sep 2025 12:06:19 -0400
Subject: [PATCH v14 03/24] Reorder heap_page_prune_and_freeze parameters
Move read-only parameters to the beginning of the function, making it
clearer which parameters are inputs and which are input/outputs or
outputs. Also const-qualify VacuumCutoffs, which is not modified in
heap_page_prune_and_freeze().
---
src/backend/access/heap/pruneheap.c | 40 ++++++++++++++--------------
src/backend/access/heap/vacuumlazy.c | 6 +++--
src/include/access/heapam.h | 6 ++---
3 files changed, 27 insertions(+), 25 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d8ea0c78f77..28bd6a56749 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -43,7 +43,7 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool freeze;
- struct VacuumCutoffs *cutoffs;
+ const struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
* Fields describing what to do to the page
@@ -260,8 +260,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, vistest, 0,
- NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+ heap_page_prune_and_freeze(relation, buffer, PRUNE_ON_ACCESS, 0, NULL,
+ vistest, &presult, &dummy_off_loc, NULL, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -303,7 +303,17 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* also need to account for a reduction in the length of the line pointer
* array following array truncation by us.
*
- * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
+ * reason indicates why the pruning is performed. It is included in the WAL
+ * record for debugging and analysis purposes, but otherwise has no effect.
+ *
+ * options:
+ * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+ * pruning.
+ *
+ * FREEZE indicates that we will also freeze tuples, and will return
+ * 'all_visible', 'all_frozen' flags to the caller.
+ *
+ * If the HEAP_PRUNE_FREEZE option is set, we will freeze tuples if it's
* required in order to advance relfrozenxid / relminmxid, or if it's
* considered advantageous for overall system performance to do so now. The
* 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
@@ -313,29 +323,19 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
* that also freeze need that information.
*
- * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
- *
- * options:
- * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- * pruning.
- *
- * FREEZE indicates that we will also freeze tuples, and will return
- * 'all_visible', 'all_frozen' flags to the caller.
- *
* cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
* of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
* cutoffs->OldestXmin is also used to determine if dead tuples are
* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
*
+ * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
+ * (see heap_prune_satisfies_vacuum).
+ *
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
* heap_page_prune_and_freeze() is responsible for initializing it. Required
* by all callers.
*
- * reason indicates why the pruning is performed. It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
* off_loc is the offset location required by the caller to use in error
* callback.
*
@@ -348,11 +348,11 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
- GlobalVisState *vistest,
+ PruneReason reason,
int options,
- struct VacuumCutoffs *cutoffs,
+ const struct VacuumCutoffs *cutoffs,
+ GlobalVisState *vistest,
PruneFreezeResult *presult,
- PruneReason reason,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 981d9380a92..ddc9677694c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1974,8 +1974,10 @@ lazy_scan_prune(LVRelState *vacrel,
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
- &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+ heap_page_prune_and_freeze(rel, buf, PRUNE_VACUUM_SCAN, prune_options,
+ &vacrel->cutoffs,
+ vacrel->vistest,
+ &presult,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..34206a6a7d5 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -374,11 +374,11 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
- struct GlobalVisState *vistest,
+ PruneReason reason,
int options,
- struct VacuumCutoffs *cutoffs,
+ const struct VacuumCutoffs *cutoffs,
+ struct GlobalVisState *vistest,
PruneFreezeResult *presult,
- PruneReason reason,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid);
--
2.43.0
v14-0005-Rename-PruneState.freeze-to-attempt_freeze.patch (text/x-patch; charset=US-ASCII)
From 94b4e946cd498470e9a0fac0b15299feaccfeefc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 31 Jul 2025 14:07:51 -0400
Subject: [PATCH v14 05/24] Rename PruneState.freeze to attempt_freeze
This makes it clearer that the flag indicates the caller would like
heap_page_prune_and_freeze() to consider freezing tuples -- not that we
will ultimately end up freezing them.
Also rename the local variable hint_bit_fpi to did_tuple_hint_fpi. This
makes it clear that it refers to tuple hints rather than page hints, and
that it records something that happened rather than something that could
happen.
And rename the local variable do_hint to do_hint_prune. This
distinguishes the prunable and page-full hints (used to decide whether to
on-access prune a page) from other page-level and tuple hint bits.
---
src/backend/access/heap/pruneheap.c | 28 ++++++++++++++--------------
1 file changed, 14 insertions(+), 14 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ea8216e0632..740aa07cd83 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -42,7 +42,7 @@ typedef struct
/* whether or not dead items can be set LP_UNUSED during pruning */
bool mark_unused_now;
/* whether to attempt freezing tuples */
- bool freeze;
+ bool attempt_freeze;
const struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -361,14 +361,14 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
HeapTupleData tup;
bool do_freeze;
bool do_prune;
- bool do_hint;
- bool hint_bit_fpi;
+ bool do_hint_prune;
+ bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
/* Copy parameters to prstate */
prstate.vistest = vistest;
prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
- prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
prstate.cutoffs = cutoffs;
/*
@@ -390,7 +390,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* initialize page freezing working state */
prstate.pagefrz.freeze_required = false;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
Assert(new_relfrozen_xid && new_relmin_mxid);
prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
@@ -437,7 +437,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* function, when we return the value to the caller, so that the caller
* doesn't set the VM bit incorrectly.
*/
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
@@ -551,7 +551,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
* an FPI to be emitted.
*/
- hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+ did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
/*
* Process HOT chains.
@@ -659,7 +659,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* pd_prune_xid field or the page was marked full, we will update the hint
* bit.
*/
- do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
PageIsFull(page);
/*
@@ -667,7 +667,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* plans we prepared, or not.
*/
do_freeze = false;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
if (prstate.pagefrz.freeze_required)
{
@@ -702,14 +702,14 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (RelationNeedsWAL(relation))
{
- if (hint_bit_fpi)
+ if (did_tuple_hint_fpi)
do_freeze = true;
else if (do_prune)
{
if (XLogCheckBufferNeedsBackup(buffer))
do_freeze = true;
}
- else if (do_hint)
+ else if (do_hint_prune)
{
if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
do_freeze = true;
@@ -752,7 +752,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
- if (do_hint)
+ if (do_hint_prune)
{
/*
* Update the page's pd_prune_xid field to either zero, or the lowest
@@ -893,7 +893,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
if (presult->nfrozen > 0)
{
@@ -1475,7 +1475,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
}
/* Consider freezing any normal tuples which will not be removed */
- if (prstate->freeze)
+ if (prstate->attempt_freeze)
{
bool totally_frozen;
--
2.43.0
v14-0002-Correct-prune-WAL-record-opcode-mention-in-comme.patch (text/x-patch; charset=US-ASCII)
From d89c39061d008ccfe306c9c39e7b74f9555a4ac2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 17 Sep 2025 14:54:42 -0400
Subject: [PATCH v14 02/24] Correct prune WAL record opcode mention in comment
f83d709760d8 incorrectly refers to an XLOG_HEAP2_PRUNE_FREEZE WAL record
opcode. No such opcode exists. The relevant opcodes are
XLOG_HEAP2_PRUNE_ON_ACCESS, XLOG_HEAP2_PRUNE_VACUUM_SCAN, and
XLOG_HEAP2_PRUNE_VACUUM_CLEANUP. Correct it.
---
src/backend/access/heap/pruneheap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7ebd22f00a3..d8ea0c78f77 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -794,7 +794,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
MarkBufferDirty(buffer);
/*
- * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
+ * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
*/
if (RelationNeedsWAL(relation))
{
@@ -2026,7 +2026,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
}
/*
- * Write an XLOG_HEAP2_PRUNE_FREEZE WAL record
+ * Write an XLOG_HEAP2_PRUNE* WAL record
*
* This is used for several different page maintenance operations:
*
--
2.43.0
v14-0004-Keep-all_frozen-updated-in-heap_page_prune_and_f.patch (text/x-patch; charset=US-ASCII)
From 51729486db735989377d18bfc855d0d3d7f32114 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 23 Jul 2025 16:01:24 -0400
Subject: [PATCH v14 04/24] Keep all_frozen updated in
heap_page_prune_and_freeze
We previously relied on all_visible and all_frozen only being used
together, but it's better to keep both updated.
Future commits will use these fields separately, so all_frozen's validity
should not depend on all_visible.
---
src/backend/access/heap/pruneheap.c | 21 ++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 9 ++++-----
2 files changed, 14 insertions(+), 16 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 28bd6a56749..ea8216e0632 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -142,10 +142,6 @@ typedef struct
* whether to freeze the page or not. The all_visible and all_frozen
* values returned to the caller are adjusted to include LP_DEAD items at
* the end.
- *
- * all_frozen should only be considered valid if all_visible is also set;
- * we don't bother to clear the all_frozen flag every time we clear the
- * all_visible flag.
*/
bool all_visible;
bool all_frozen;
@@ -696,8 +692,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* used anymore. The opportunistic freeze heuristic must be
* improved; however, for now, try to approximate the old logic.
*/
- if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
+ if (prstate.all_frozen && prstate.nfrozen > 0)
{
+ Assert(prstate.all_visible);
+
/*
* Freezing would make the page all-frozen. Have already
* emitted an FPI or will do so anyway?
@@ -750,6 +748,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
}
+ Assert(!prstate.all_frozen || prstate.all_visible);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -819,7 +818,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (do_freeze)
{
- if (prstate.all_visible && prstate.all_frozen)
+ if (prstate.all_frozen)
frz_conflict_horizon = prstate.visibility_cutoff_xid;
else
{
@@ -1382,7 +1381,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
if (!HeapTupleHeaderXminCommitted(htup))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1404,7 +1403,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
Assert(prstate->cutoffs);
if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1417,7 +1416,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
case HEAPTUPLE_RECENTLY_DEAD:
prstate->recently_dead_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple will soon become DEAD. Update the hint field so
@@ -1436,7 +1435,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* assumption is a bit shaky, but it is what acquire_sample_rows()
* does, so be consistent.
*/
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* If we wanted to optimize for aborts, we might consider marking
@@ -1454,7 +1453,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* will commit and update the counters after we report.
*/
prstate->live_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple may soon become DEAD. Update the hint field so that
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ddc9677694c..50cc898087f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2003,7 +2003,6 @@ lazy_scan_prune(LVRelState *vacrel,
* agreement with heap_page_is_all_visible() using an assertion.
*/
#ifdef USE_ASSERT_CHECKING
- /* Note that all_frozen value does not matter when !all_visible */
if (presult.all_visible)
{
TransactionId debug_cutoff;
@@ -2056,6 +2055,7 @@ lazy_scan_prune(LVRelState *vacrel,
*has_lpdead_items = (presult.lpdead_items > 0);
Assert(!presult.all_visible || !(*has_lpdead_items));
+ Assert(!presult.all_frozen || presult.all_visible);
/*
* Handle setting visibility map bit based on information from the VM (as
@@ -2161,11 +2161,10 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen. Note that all_frozen is only valid if all_visible is
- * true, so we must check both all_visible and all_frozen.
+ * it as all-frozen.
*/
- else if (all_visible_according_to_vm && presult.all_visible &&
- presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ else if (all_visible_according_to_vm && presult.all_frozen &&
+ !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
uint8 old_vmbits;
--
2.43.0
v14-0001-Eliminate-COPY-FREEZE-use-of-XLOG_HEAP2_VISIBLE.patch (text/x-patch; charset=US-ASCII)
From cacff6c95e38d370b87148bc48cf6ac5f086ed07 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v14 01/24] Eliminate COPY FREEZE use of XLOG_HEAP2_VISIBLE
Instead of emitting a separate WAL XLOG_HEAP2_VISIBLE record for setting
bits in the VM, specify the changes to make to the VM block in the
XLOG_HEAP2_MULTI_INSERT record.
This halves the number of WAL records emitted by COPY FREEZE.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam.c | 44 ++++++++++------
src/backend/access/heap/heapam_xlog.c | 54 +++++++++++++++++++-
src/backend/access/heap/visibilitymap.c | 67 ++++++++++++++++++++++++-
src/backend/access/rmgrdesc/heapdesc.c | 5 ++
src/include/access/visibilitymap.h | 2 +
5 files changed, 154 insertions(+), 18 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4c5ae205a7a..c8cd9d22726 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2466,7 +2466,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
starting_with_empty_page = PageGetMaxOffsetNumber(page) == 0;
if (starting_with_empty_page && (options & HEAP_INSERT_FROZEN))
+ {
all_frozen_set = true;
+ /* Lock the vmbuffer before entering the critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ }
/* NO EREPORT(ERROR) from here till changes are logged */
START_CRIT_SECTION();
@@ -2506,7 +2510,8 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
* going to add further frozen rows to it.
*
* If we're only adding already frozen rows to a previously empty
- * page, mark it as all-visible.
+ * page, mark it as all-frozen and update the visibility map. We're
+ * already holding a pin on the vmbuffer.
*/
if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
@@ -2517,7 +2522,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
else if (all_frozen_set)
+ {
PageSetAllVisible(page);
+ visibilitymap_set_vmbits(relation,
+ BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+ }
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2565,6 +2577,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
xlrec->flags = 0;
if (all_visible_cleared)
xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+ /*
+ * We don't have to worry about including a conflict xid in the
+ * WAL record as HEAP_INSERT_FROZEN intentionally violates
+ * visibility rules.
+ */
if (all_frozen_set)
xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
@@ -2627,7 +2645,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
XLogBeginInsert();
XLogRegisterData(xlrec, tupledata - scratch.data);
+
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+ if (all_frozen_set)
+ XLogRegisterBuffer(1, vmbuffer, 0);
XLogRegisterBufData(0, tupledata, totaldatalen);
@@ -2637,26 +2658,17 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
recptr = XLogInsert(RM_HEAP2_ID, info);
PageSetLSN(page, recptr);
+ if (all_frozen_set)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
+ }
}
END_CRIT_SECTION();
- /*
- * If we've frozen everything on the page, update the visibilitymap.
- * We're already holding pin on the vmbuffer.
- */
if (all_frozen_set)
- {
- /*
- * It's fine to use InvalidTransactionId here - this is only used
- * when HEAP_INSERT_FROZEN is specified, which intentionally
- * violates visibility rules.
- */
- visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
- InvalidXLogRecPtr, vmbuffer,
- InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
- }
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
UnlockReleaseBuffer(buffer);
ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index cf843277938..faa7c561a8a 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -551,6 +551,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
int i;
bool isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
/*
* Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -571,11 +572,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(rlocator);
- Buffer vmbuffer = InvalidBuffer;
visibilitymap_pin(reln, blkno, &vmbuffer);
visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
ReleaseBuffer(vmbuffer);
+ vmbuffer = InvalidBuffer;
FreeFakeRelcacheEntry(reln);
}
@@ -662,6 +663,57 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (BufferIsValid(buffer))
UnlockReleaseBuffer(buffer);
+ buffer = InvalidBuffer;
+
+ /*
+ * Now read and update the VM block.
+ *
+	 * Note that we may have skipped updating the heap block above, either
+	 * because the relation was dropped or truncated or because of the LSN
+	 * interlock. Even so, it's still safe to update the visibility map. Any
+ * WAL record that clears the visibility map bit does so before checking
+ * the page LSN, so any bits that need to be cleared will still be
+ * cleared.
+ *
+ * Note that the lock on the heap page was dropped above. In normal
+ * operation this would never be safe because a concurrent query could
+ * modify the heap page and clear PD_ALL_VISIBLE -- violating the
+ * invariant that PD_ALL_VISIBLE must be set if the corresponding bit in
+ * the VM is set.
+ *
+ * In recovery, we expect no other writers, so writing to the VM page
+ * without holding a lock on the heap page is considered safe enough. It
+ * is done this way when replaying xl_heap_visible records (see
+ * heap_xlog_visible()).
+ */
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+ XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ Page vmpage = BufferGetPage(vmbuffer);
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
+
+ /* initialize the page if it was read as zeros */
+ if (PageIsNew(vmpage))
+ PageInit(vmpage, BLCKSZ, 0);
+
+ visibilitymap_set_vmbits(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+
+ /*
+ * It is not possible that the VM was already set for this heap page,
+ * so the vmbuffer must have been modified and marked dirty.
+ */
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
+ FreeFakeRelcacheEntry(reln);
+ }
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
+
/*
* If the page is running low on free space, update the FSM as well.
* Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 7306c16f05c..aa48a436108 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,7 +14,8 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set a bit in a previously pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page and log
+ * visibilitymap_set_vmbits - set bit(s) in a pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -321,6 +322,70 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
return status;
}
+/*
+ * Set flags in the VM block contained in the passed in vmBuf.
+ *
+ * This function is for callers which include the VM changes in the same WAL
+ * record as the modifications of the heap page which rendered it all-visible.
+ * Callers separately logging the VM changes should invoke visibilitymap_set()
+ * instead.
+ *
+ * Caller must have pinned and exclusive locked the correct block of the VM in
+ * vmBuf. This block should contain the VM bits for the given heapBlk.
+ *
+ * During normal operation (i.e. not recovery), this should be called in a
+ * critical section which also makes any necessary changes to the heap page
+ * and, if relevant, emits WAL.
+ *
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page. This includes
+ * maintaining any invariants such as ensuring the buffer containing heapBlk
+ * is pinned and exclusive locked.
+ */
+uint8
+visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags)
+{
+ BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+ uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+ uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+ Page page;
+ uint8 *map;
+ uint8 status;
+
+#ifdef TRACE_VISIBILITYMAP
+ elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
+ flags, RelationGetRelationName(rel), heapBlk);
+#endif
+
+ /* Call in same critical section where WAL is emitted. */
+ Assert(InRecovery || CritSectionCount > 0);
+
+ /* Flags should be valid. Also never clear bits with this function */
+ Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+ /* Must never set all_frozen bit without also setting all_visible bit */
+ Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+ /* Check that we have the right VM page pinned */
+ if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+ elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
+
+ Assert(BufferIsExclusiveLocked(vmBuf));
+
+ page = BufferGetPage(vmBuf);
+ map = (uint8 *) PageGetContents(page);
+
+ status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+ if (flags != status)
+ {
+ map[mapByte] |= (flags << mapOffset);
+ MarkBufferDirty(vmBuf);
+ }
+
+ return status;
+}
+
/*
* visibilitymap_get_status - get status of bits
*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
#include "access/heapam_xlog.h"
#include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
#include "storage/standbydefs.h"
/*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
xlrec->flags);
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+ appendStringInfo(buf, ", vm_flags: 0x%02X",
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
if (XLogRecHasBlockData(record, 0) && !isinit)
{
appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..fc7056a91ea 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,8 @@ extern uint8 visibilitymap_set(Relation rel,
Buffer vmBuf,
TransactionId cutoff_xid,
uint8 flags);
+extern uint8 visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
--
2.43.0
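(Not part of the patch: to make the visibilitymap_set_vmbits() hunk above easier
to follow, here is a rough, self-contained sketch of the block/byte/offset
arithmetic the VM macros perform, assuming the usual 8 kB block size, a 24-byte
page header, and 2 VM bits per heap block. The constant names mirror
visibilitymap.c, but the program is purely illustrative.)

#include <stdint.h>
#include <stdio.h>

/* Assumptions: 8 kB pages, 24-byte page header, 2 VM bits per heap block */
#define BLCKSZ               8192
#define PAGE_HEADER_SIZE     24
#define BITS_PER_HEAPBLOCK   2
#define HEAPBLOCKS_PER_BYTE  (8 / BITS_PER_HEAPBLOCK)
#define MAPSIZE              (BLCKSZ - PAGE_HEADER_SIZE)
#define HEAPBLOCKS_PER_PAGE  (MAPSIZE * HEAPBLOCKS_PER_BYTE)

int
main(void)
{
    uint32_t    heapBlk = 123456;   /* arbitrary heap block number */
    uint32_t    mapBlock = heapBlk / HEAPBLOCKS_PER_PAGE;
    uint32_t    mapByte = (heapBlk % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE;
    uint32_t    mapOffset = (heapBlk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;
    uint8_t     flags = 0x01 | 0x02;    /* ALL_VISIBLE | ALL_FROZEN, as for COPY FREEZE */
    uint8_t     vmbyte = 0;

    /* the same OR that visibilitymap_set_vmbits() applies to map[mapByte] */
    vmbyte |= (uint8_t) (flags << mapOffset);

    printf("heap block %u -> VM block %u, byte %u, bit offset %u, new byte 0x%02X\n",
           heapBlk, mapBlock, mapByte, mapOffset, vmbyte);
    return 0;
}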
Attachment: v14-0007-Update-PruneState.all_-visible-frozen-sooner-in-.patch (text/x-patch)
From de93f7eaffb009436cae2f80571ba0148f99db7a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 15 Sep 2025 16:25:44 -0400
Subject: [PATCH v14 07/24] Update PruneState.all_[visible|frozen] sooner in
pruning
We don't clear PruneState.all_visible and all_frozen during pruning when
we see LP_DEAD items because we want to still opportunistically freeze a
page if it would become frozen after vacuum's third phase.
Currently, this is fine because heap_page_prune_and_freeze() doesn't set
PD_ALL_VISIBLE or set bits in the VM. If we want to do that in the
future, we need all_visible and all_frozen to be accurate earlier in
heap_page_prune_and_freeze(). To do this, we must also move up
determination of the freeze conflict horizon. We use the visibility
cutoff xid even if the whole page won't be frozen until after vacuum's
third phase.
---
src/backend/access/heap/pruneheap.c | 95 ++++++++++++++---------------
1 file changed, 45 insertions(+), 50 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 4ed74de6f27..5e536bd0d4d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -296,7 +296,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* pre-freeze checks.
*
* do_prune, do_hint_full_or_prunable, and did_tuple_hint_fpi must all have
- * been decided before calling this function.
+ * been decided before calling this function. *frz_conflict_horizon is set to
+ * the snapshot conflict horizon to use for the WAL record should we decide to freeze
+ * tuples.
*
* prstate is an input/output parameter.
*
@@ -308,7 +310,8 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi,
bool do_prune,
bool do_hint_prune,
- PruneState *prstate)
+ PruneState *prstate,
+ TransactionId *frz_conflict_horizon)
{
bool do_freeze = false;
@@ -378,6 +381,22 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* critical section.
*/
heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+
+ /*
+ * Calculate what the snapshot conflict horizon should be for a record
+ * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
+ * for conflicts when the whole page is eligible to become all-frozen
+ * in the VM once we're done with it. Otherwise we generate a
+ * conservative cutoff by stepping back from OldestXmin.
+ */
+ if (prstate->all_frozen)
+ *frz_conflict_horizon = prstate->visibility_cutoff_xid;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ *frz_conflict_horizon = prstate->cutoffs->OldestXmin;
+ TransactionIdRetreat(*frz_conflict_horizon);
+ }
}
else if (prstate->nfrozen > 0)
{
@@ -478,6 +497,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_hint_prune;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ TransactionId frz_conflict_horizon = InvalidTransactionId;
/* Copy parameters to prstate */
prstate.vistest = vistest;
@@ -546,10 +566,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* are tuples present that are not visible to everyone or if there are
* dead tuples which are not yet removable. However, dead tuples which
* will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not clear
- * all_visible when we see LP_DEAD items. We fix that at the end of the
- * function, when we return the value to the caller, so that the caller
- * doesn't set the VM bit incorrectly.
+ * opportunistically freezing. Because of that, we do not immediately
+ * clear all_visible when we see LP_DEAD items. We fix that after
+ * scanning the line pointers, before we return the value to the caller,
+ * so that the caller doesn't set the VM bit incorrectly.
*/
if (prstate.attempt_freeze)
{
@@ -784,8 +804,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
did_tuple_hint_fpi,
do_prune,
do_hint_prune,
- &prstate);
+ &prstate,
+ &frz_conflict_horizon);
+ /*
+ * While scanning the line pointers, we did not clear
+ * all_visible/all_frozen when encountering LP_DEAD items because we
+ * wanted the decision whether or not to freeze the page to be unaffected
+ * by the short-term presence of LP_DEAD items. These LP_DEAD items are
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that we finished determining whether or not to freeze the page,
+ * update all_visible and all_frozen so that they reflect the true state
+ * of the page for setting PD_ALL_VISIBLE and VM bits.
+ */
+ if (prstate.lpdead_items > 0)
+ prstate.all_visible = prstate.all_frozen = false;
Assert(!prstate.all_frozen || prstate.all_visible);
/* Any error while applying the changes is critical */
@@ -846,27 +882,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* on the standby with xids older than the youngest tuple this
* record will freeze will conflict.
*/
- TransactionId frz_conflict_horizon = InvalidTransactionId;
TransactionId conflict_xid;
- /*
- * We can use the visibility_cutoff_xid as our cutoff for
- * conflicts when the whole page is eligible to become all-frozen
- * in the VM once we're done with it. Otherwise we generate a
- * conservative cutoff by stepping back from OldestXmin.
- */
- if (do_freeze)
- {
- if (prstate.all_frozen)
- frz_conflict_horizon = prstate.visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = prstate.cutoffs->OldestXmin;
- TransactionIdRetreat(frz_conflict_horizon);
- }
- }
-
if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
conflict_xid = frz_conflict_horizon;
else
@@ -890,30 +907,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected by our caller.
- */
- if (prstate.all_visible && prstate.lpdead_items == 0)
- {
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
- }
- else
- {
- presult->all_visible = false;
- presult->all_frozen = false;
- }
-
+ presult->all_visible = prstate.all_visible;
+ presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
/*
--
2.43.0
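(Not part of the patch: the conflict-horizon choice that 0007 moves into
heap_page_will_freeze() boils down to a small decision, sketched standalone
below with plain 32-bit XIDs. The real code uses TransactionIdRetreat(), which
also skips the reserved special XIDs when stepping back from OldestXmin; the
function below is only a simplified stand-in.)

#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

#define FirstNormalTransactionId   3    /* XIDs below this are reserved */

/*
 * Simplified version of the choice in heap_page_will_freeze(): when freezing
 * leaves the whole page all-frozen, the page's visibility cutoff xid is a
 * precise conflict horizon; otherwise fall back to OldestXmin minus one.
 */
static TransactionId
freeze_conflict_horizon(int page_will_be_all_frozen,
                        TransactionId visibility_cutoff_xid,
                        TransactionId oldest_xmin)
{
    if (page_will_be_all_frozen)
        return visibility_cutoff_xid;

    /* rough stand-in for TransactionIdRetreat(): step back, skip reserved XIDs */
    do
    {
        oldest_xmin--;
    } while (oldest_xmin < FirstNormalTransactionId);

    return oldest_xmin;
}

int
main(void)
{
    printf("all-frozen page:       horizon %u\n", freeze_conflict_horizon(1, 1000, 1234));
    printf("partially frozen page: horizon %u\n", freeze_conflict_horizon(0, 1000, 1234));
    return 0;
}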
Attachment: v14-0006-Add-helper-for-freeze-determination-to-heap_page.patch (text/x-patch)
From aee92ee8a07beade81a82200fbbfe605d499ac4c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 16 Sep 2025 14:22:10 -0400
Subject: [PATCH v14 06/24] Add helper for freeze determination to
heap_page_prune_and_freeze
After scanning through the line pointers on the heap page during
vacuum's first phase, we use several statuses and information we
collected to determine whether or not we will use the freeze plans we
assembled.
Do this in a helper for better readability.
---
src/backend/access/heap/pruneheap.c | 199 +++++++++++++++++-----------
1 file changed, 119 insertions(+), 80 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 740aa07cd83..4ed74de6f27 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -289,6 +289,120 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
}
}
+/*
+ * Decide if we want to go ahead with freezing according to the freeze plans
+ * we prepared for the given heap buffer or not. If the caller specified we
+ * should not freeze tuples, it exits early. Otherwise, it does a few
+ * pre-freeze checks.
+ *
+ * do_prune, do_hint_full_or_prunable, and did_tuple_hint_fpi must all have
+ * been decided before calling this function.
+ *
+ * prstate is an input/output parameter.
+ *
+ * Returns true if we should use our freeze plans and freeze tuples on the page
+ * and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+ bool did_tuple_hint_fpi,
+ bool do_prune,
+ bool do_hint_prune,
+ PruneState *prstate)
+{
+ bool do_freeze = false;
+
+ /*
+ * If the caller specified we should not attempt to freeze any tuples,
+ * validate that everything is in the right state and exit.
+ */
+ if (!prstate->attempt_freeze)
+ {
+ Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+ Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+ return false;
+ }
+
+ if (prstate->pagefrz.freeze_required)
+ {
+ /*
+ * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+ * before FreezeLimit/MultiXactCutoff is present. Must freeze to
+ * advance relfrozenxid/relminmxid.
+ */
+ do_freeze = true;
+ }
+ else
+ {
+ /*
+ * Opportunistically freeze the page if we are generating an FPI
+ * anyway and if doing so means that we can set the page all-frozen
+ * afterwards (might not happen until VACUUM's final heap pass).
+ *
+ * XXX: Previously, we knew if pruning emitted an FPI by checking
+ * pgWalUsage.wal_fpi before and after pruning. Once the freeze and
+ * prune records were combined, this heuristic couldn't be used
+ * anymore. The opportunistic freeze heuristic must be improved;
+ * however, for now, try to approximate the old logic.
+ */
+ if (prstate->all_frozen && prstate->nfrozen > 0)
+ {
+ Assert(prstate->all_visible);
+
+ /*
+ * Freezing would make the page all-frozen. Have already emitted
+ * an FPI or will do so anyway?
+ */
+ if (RelationNeedsWAL(relation))
+ {
+ if (did_tuple_hint_fpi)
+ do_freeze = true;
+ else if (do_prune)
+ {
+ if (XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ else if (do_hint_prune)
+ {
+ if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ }
+ }
+ }
+
+ if (do_freeze)
+ {
+ /*
+ * Validate the tuples we will be freezing before entering the
+ * critical section.
+ */
+ heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+ }
+ else if (prstate->nfrozen > 0)
+ {
+ /*
+ * The page contained some tuples that were not already frozen, and we
+ * chose not to freeze them now. The page won't be all-frozen then.
+ */
+ Assert(!prstate->pagefrz.freeze_required);
+
+ prstate->all_frozen = false;
+ prstate->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+ else
+ {
+ /*
+ * We have no freeze plans to execute. The page might already be
+ * all-frozen (perhaps only following pruning), though. Such pages
+ * can be marked all-frozen in the VM by our caller, even though none
+ * of its tuples were newly frozen here.
+ */
+ }
+
+ return do_freeze;
+}
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
@@ -666,87 +780,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* Decide if we want to go ahead with freezing according to the freeze
* plans we prepared, or not.
*/
- do_freeze = false;
- if (prstate.attempt_freeze)
- {
- if (prstate.pagefrz.freeze_required)
- {
- /*
- * heap_prepare_freeze_tuple indicated that at least one XID/MXID
- * from before FreezeLimit/MultiXactCutoff is present. Must
- * freeze to advance relfrozenxid/relminmxid.
- */
- do_freeze = true;
- }
- else
- {
- /*
- * Opportunistically freeze the page if we are generating an FPI
- * anyway and if doing so means that we can set the page
- * all-frozen afterwards (might not happen until VACUUM's final
- * heap pass).
- *
- * XXX: Previously, we knew if pruning emitted an FPI by checking
- * pgWalUsage.wal_fpi before and after pruning. Once the freeze
- * and prune records were combined, this heuristic couldn't be
- * used anymore. The opportunistic freeze heuristic must be
- * improved; however, for now, try to approximate the old logic.
- */
- if (prstate.all_frozen && prstate.nfrozen > 0)
- {
- Assert(prstate.all_visible);
+ do_freeze = heap_page_will_freeze(relation, buffer,
+ did_tuple_hint_fpi,
+ do_prune,
+ do_hint_prune,
+ &prstate);
- /*
- * Freezing would make the page all-frozen. Have already
- * emitted an FPI or will do so anyway?
- */
- if (RelationNeedsWAL(relation))
- {
- if (did_tuple_hint_fpi)
- do_freeze = true;
- else if (do_prune)
- {
- if (XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- else if (do_hint_prune)
- {
- if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- }
- }
- }
- }
-
- if (do_freeze)
- {
- /*
- * Validate the tuples we will be freezing before entering the
- * critical section.
- */
- heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
- }
- else if (prstate.nfrozen > 0)
- {
- /*
- * The page contained some tuples that were not already frozen, and we
- * chose not to freeze them now. The page won't be all-frozen then.
- */
- Assert(!prstate.pagefrz.freeze_required);
-
- prstate.all_frozen = false;
- prstate.nfrozen = 0; /* avoid miscounts in instrumentation */
- }
- else
- {
- /*
- * We have no freeze plans to execute. The page might already be
- * all-frozen (perhaps only following pruning), though. Such pages
- * can be marked all-frozen in the VM by our caller, even though none
- * of its tuples were newly frozen here.
- */
- }
Assert(!prstate.all_frozen || prstate.all_visible);
/* Any error while applying the changes is critical */
--
2.43.0
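(Not part of the patch: condensed, the decision heap_page_will_freeze() makes
can be read as the standalone sketch below. The WAL-level checks --
did_tuple_hint_fpi, XLogCheckBufferNeedsBackup(), XLogHintBitIsNeeded() -- are
collapsed here into plain booleans, so this only shows the shape of the
heuristic, not its exact conditions.)

#include <stdbool.h>
#include <stdio.h>

/*
 * Shape of the freeze decision in heap_page_will_freeze(): freeze when
 * required to advance relfrozenxid/relminmxid, otherwise only when freezing
 * makes the page all-frozen and an FPI of the page was (or will be) emitted
 * anyway.
 */
static bool
will_freeze(bool freeze_required,
            bool page_would_be_all_frozen, int nfrozen,
            bool fpi_already_emitted, bool fpi_expected_anyway)
{
    if (freeze_required)
        return true;
    if (!(page_would_be_all_frozen && nfrozen > 0))
        return false;
    return fpi_already_emitted || fpi_expected_anyway;
}

int
main(void)
{
    printf("forced:        %d\n", will_freeze(true, false, 0, false, false));
    printf("opportunistic: %d\n", will_freeze(false, true, 3, false, true));
    printf("not worth it:  %d\n", will_freeze(false, true, 3, false, false));
    return 0;
}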
Attachment: v14-0008-Set-PD_ALL_VISIBLE-in-heap_page_prune_and_freeze.patch (text/x-patch)
From 7ae7f9d9f1c05cf66d7fee964db801cbcf52a324 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 15 Sep 2025 16:32:35 -0400
Subject: [PATCH v14 08/24] Set PD_ALL_VISIBLE in heap_page_prune_and_freeze
After phase I of vacuum, if the heap page was rendered all-visible, we
can set it as such in the VM. We also must set the page-level
PD_ALL_VISIBLE bit. By setting PD_ALL_VISIBLE while making the other
changes to the heap page instead of while updating the VM, we can omit
the heap page from the WAL chain during the VM update. The result is
that xl_heap_prune records include updates to PD_ALL_VISIBLE.
This commit doesn't yet remove the heap page from the WAL chain because
it does not change other users of visibilitymap_set().
Note that this is carefully coded such that if the only modification to
the page during heap_page_prune_and_freeze() is setting PD_ALL_VISIBLE
and checksums/wal_log_hints are disabled, we will never emit a full page
image of the heap page.
This also fixes a longstanding issue where, when checksums/wal_log_hints
are enabled, an all-visible page being set all-frozen may not mark the
buffer dirty before visibilitymap_set() stamps it with the
xl_heap_visible LSN.
It is noteworthy that the checks for page corruption and an inconsistent
state between the heap page and the VM in lazy_scan_prune() now happen
after having set PD_ALL_VISIBLE. That is not a functional change because
the corruption cases are mutually exclusive with cases where we would
set PD_ALL_VISIBLE.
---
src/backend/access/heap/heapam_xlog.c | 63 +++++++++++++++++++----
src/backend/access/heap/pruneheap.c | 72 ++++++++++++++++++++++++---
src/backend/access/heap/vacuumlazy.c | 29 +----------
src/include/access/heapam.h | 2 +
src/include/access/heapam_xlog.h | 2 +
5 files changed, 125 insertions(+), 43 deletions(-)
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index faa7c561a8a..a54238f2b59 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -90,6 +90,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
xlhp_freeze_plan *plans;
OffsetNumber *frz_offsets;
char *dataptr = XLogRecGetBlockData(record, 0, &datalen);
+ bool do_prune;
heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
&nplans, &plans, &frz_offsets,
@@ -97,11 +98,13 @@ heap_xlog_prune_freeze(XLogReaderState *record)
&ndead, &nowdead,
&nunused, &nowunused);
+ do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+
/*
* Update all line pointers per the record, and repair fragmentation
* if needed.
*/
- if (nredirected > 0 || ndead > 0 || nunused > 0)
+ if (do_prune)
heap_page_prune_execute(buffer,
(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
redirected, nredirected,
@@ -138,17 +141,52 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* There should be no more data */
Assert((char *) frz_offsets == dataptr + datalen);
+ /*
+ * The critical integrity requirement here is that we must never end
+ * up with a situation where the visibility map bit is set, and the
+ * page-level PD_ALL_VISIBLE bit is clear. If that were to occur,
+ * then a subsequent page modification would fail to clear the
+ * visibility map bit.
+ */
+ if (xlrec.flags & XLHP_SET_PD_ALL_VIS)
+ PageSetAllVisible(page);
+
/*
* Note: we don't worry about updating the page's prunability hints.
* At worst this will cause an extra prune cycle to occur soon.
*/
-
- PageSetLSN(page, lsn);
MarkBufferDirty(buffer);
+
+ /*
+ * We always emit a WAL record when setting PD_ALL_VISIBLE, but we are
+ * careful not to emit a full page image unless
+ * checksums/wal_log_hints are enabled. We only set the heap page LSN
+ * if full page images were an option when emitting WAL. Otherwise,
+ * subsequent modifications of the page may incorrectly skip emitting
+ * a full page image.
+ */
+ if (do_prune || nplans > 0 ||
+ (xlrec.flags & XLHP_SET_PD_ALL_VIS && XLogHintBitIsNeeded()))
+ PageSetLSN(page, lsn);
}
/*
- * If we released any space or line pointers, update the free space map.
+ * If we released any space or line pointers, or set PD_ALL_VISIBLE, update
+ * the free space map.
+ *
+ * Even if we are just setting PD_ALL_VISIBLE (and thus not freeing up any
+ * space), we'll still update the FSM for this page. Since the FSM is not
+ * WAL-logged and only updated heuristically, it easily becomes stale in
+ * standbys. If the standby is later promoted and runs VACUUM, it will
+ * skip updating individual free space figures for pages that became
+ * all-visible (or all-frozen, depending on the vacuum mode,) which is
+ * troublesome when FreeSpaceMapVacuum propagates too optimistic free
+ * space values to upper FSM layers; later inserters try to use such pages
+ * only to find out that they are unusable. This can cause long stalls
+ * when there are many such pages.
+ *
+ * Forestall those problems by updating FSM's idea about a page that is
+ * becoming all-visible or all-frozen.
*
* Do this regardless of a full-page image being applied, since the FSM
* data is not in the page anyway.
@@ -157,10 +195,16 @@ heap_xlog_prune_freeze(XLogReaderState *record)
{
if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
XLHP_HAS_DEAD_ITEMS |
- XLHP_HAS_NOW_UNUSED_ITEMS))
+ XLHP_HAS_NOW_UNUSED_ITEMS |
+ XLHP_SET_PD_ALL_VIS))
{
Size freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+ /*
+ * We want to avoid holding an exclusive lock on the heap buffer
+ * while doing IO, so we'll release the lock on the heap buffer
+ * first.
+ */
UnlockReleaseBuffer(buffer);
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
@@ -173,10 +217,11 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/*
* Replay XLOG_HEAP2_VISIBLE records.
*
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
+ * It is imperative that the previously emitted record set PD_ALL_VISIBLE on
+ * the heap page. We must never end up with a situation where the visibility
+ * map bit is set, and the page-level PD_ALL_VISIBLE bit is clear. If that
+ * were to occur, then a subsequent page modification would fail to clear the
+ * visibility map bit.
*/
static void
heap_xlog_visible(XLogReaderState *record)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5e536bd0d4d..9b25131543b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -495,6 +495,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_freeze;
bool do_prune;
bool do_hint_prune;
+ bool do_set_pd_vis;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
TransactionId frz_conflict_horizon = InvalidTransactionId;
@@ -824,6 +825,22 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.all_visible = prstate.all_frozen = false;
Assert(!prstate.all_frozen || prstate.all_visible);
+
+ /*
+ * Though callers should set the VM if PD_ALL_VISIBLE is set here, it is
+ * allowed for the page-level bit to be set and the VM to be clear.
+ * Setting PD_ALL_VISIBLE when we are making the changes to the page that
+ * render it all-visible allows us to omit the heap page from the WAL
+ * chain when later updating the VM -- even when checksums/wal_log_hints
+ * are enabled.
+ */
+ do_set_pd_vis = false;
+ if ((options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0)
+ {
+ if (prstate.all_visible && !PageIsAllVisible(page))
+ do_set_pd_vis = true;
+ }
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -844,14 +861,17 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * hint. If we are going to freeze or prune the page or set
+ * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+ *
+ * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+ * for the VM to be set and PD_ALL_VISIBLE to be clear.
*/
- if (!do_freeze && !do_prune)
+ if (!do_freeze && !do_prune && !do_set_pd_vis)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_pd_vis)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -865,6 +885,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
+ if (do_set_pd_vis)
+ PageSetAllVisible(page);
+
MarkBufferDirty(buffer);
/*
@@ -891,7 +914,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
log_heap_prune_and_freeze(relation, buffer,
conflict_xid,
- true, reason,
+ true,
+ do_set_pd_vis,
+ reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
@@ -2078,6 +2103,10 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* replaying 'unused' items depends on whether they were all previously marked
* as dead.
*
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
* Note: This function scribbles on the 'frozen' array.
*
* Note: This is called in a critical section, so careful what you do here.
@@ -2086,6 +2115,7 @@ void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
TransactionId conflict_xid,
bool cleanup_lock,
+ bool set_pd_all_vis,
PruneReason reason,
HeapTupleFreeze *frozen, int nfrozen,
OffsetNumber *redirected, int nredirected,
@@ -2095,6 +2125,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xl_heap_prune xlrec;
XLogRecPtr recptr;
uint8 info;
+ uint8 regbuf_flags;
/* The following local variables hold data registered in the WAL record: */
xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2103,8 +2134,21 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xlhp_prune_items dead_items;
xlhp_prune_items unused_items;
OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+ bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
xlrec.flags = 0;
+ regbuf_flags = REGBUF_STANDARD;
+
+ /*
+ * We can avoid an FPI if the only modification we are making to the heap
+ * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+ * Note that if we explicitly skip an FPI, we must not set the heap page
+ * LSN later.
+ */
+ if (!do_prune &&
+ nfrozen == 0 &&
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
+ regbuf_flags |= REGBUF_NO_IMAGE;
/*
* Prepare data for the buffer. The arrays are not actually in the
@@ -2112,7 +2156,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* page image, the arrays can be omitted.
*/
XLogBeginInsert();
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+ XLogRegisterBuffer(0, buffer, regbuf_flags);
if (nfrozen > 0)
{
int nplans;
@@ -2169,6 +2213,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* Prepare the main xl_heap_prune record. We already set the XLHP_HAS_*
* flag above.
*/
+ if (set_pd_all_vis)
+ xlrec.flags |= XLHP_SET_PD_ALL_VIS;
if (RelationIsAccessibleInLogicalDecoding(relation))
xlrec.flags |= XLHP_IS_CATALOG_REL;
if (TransactionIdIsValid(conflict_xid))
@@ -2201,5 +2247,17 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
}
recptr = XLogInsert(RM_HEAP2_ID, info);
- PageSetLSN(BufferGetPage(buffer), recptr);
+ /*
+ * We must bump the page LSN if pruning or freezing. If we are only
+ * updating PD_ALL_VISIBLE, though, we can skip doing this unless
+ * wal_log_hints/checksums are enabled. Torn pages are possible if we
+ * update PD_ALL_VISIBLE without bumping the LSN, but this is deemed okay
+ * for page hint updates.
+ */
+ if (do_prune || nfrozen > 0 ||
+ (set_pd_all_vis && XLogHintBitIsNeeded()))
+ {
+ Assert(BufferIsDirty(buffer));
+ PageSetLSN(BufferGetPage(buffer), recptr);
+ }
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 50cc898087f..308abff16ca 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1970,7 +1970,7 @@ lazy_scan_prune(LVRelState *vacrel,
* tuples. Pruning will have determined whether or not the page is
* all-visible.
*/
- prune_options = HEAP_PAGE_PRUNE_FREEZE;
+ prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VIS;
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
@@ -2073,21 +2073,6 @@ lazy_scan_prune(LVRelState *vacrel,
flags |= VISIBILITYMAP_ALL_FROZEN;
}
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
InvalidXLogRecPtr,
vmbuffer, presult.vm_conflict_horizon,
@@ -2168,17 +2153,6 @@ lazy_scan_prune(LVRelState *vacrel,
{
uint8 old_vmbits;
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
/*
* Set the page all-frozen (and all-visible) in the VM.
*
@@ -2891,6 +2865,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
log_heap_prune_and_freeze(vacrel->rel, buffer,
InvalidTransactionId,
false, /* no cleanup lock required */
+ false,
PRUNE_VACUUM_CLEANUP,
NULL, 0, /* frozen */
NULL, 0, /* redirected */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 34206a6a7d5..2f77d8dbcd6 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VIS (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
struct TupleTableSlot;
@@ -390,6 +391,7 @@ extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
TransactionId conflict_xid,
bool cleanup_lock,
+ bool set_pd_all_vis,
PruneReason reason,
HeapTupleFreeze *frozen, int nfrozen,
OffsetNumber *redirected, int nredirected,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d4c0625b632..7d3fb75dda7 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -294,6 +294,8 @@ typedef struct xl_heap_prune
#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+#define XLHP_SET_PD_ALL_VIS (1 << 0)
+
/* to handle recovery conflict during logical decoding on standby */
#define XLHP_IS_CATALOG_REL (1 << 1)
--
2.43.0
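(Not part of the patch: 0008 keeps two decisions in lock-step -- whether the
heap buffer is registered with REGBUF_NO_IMAGE and whether the page LSN is
stamped after XLogInsert(). The standalone sketch below mirrors that shared
condition; if the two ever diverged, we could set an LSN on a page for which
no full-page image was possible, losing torn-page protection. Names here are
illustrative only.)

#include <stdbool.h>
#include <stdio.h>

/*
 * Mirror of the condition used twice in log_heap_prune_and_freeze() (0008):
 * an FPI of the heap page may be skipped only when the sole change is
 * PD_ALL_VISIBLE and checksums/wal_log_hints are off, and the page LSN is
 * bumped exactly when an FPI was permitted.
 */
static bool
fpi_permitted(bool do_prune, int nfrozen, bool set_pd_all_vis, bool wal_hints)
{
    if (do_prune || nfrozen > 0)
        return true;
    return set_pd_all_vis && wal_hints;
}

int
main(void)
{
    struct
    {
        bool        prune;
        int         nfrozen;
        bool        pd_vis;
        bool        hints;
    }           cases[] = {
        {true, 0, false, false},    /* pruning: image allowed, stamp LSN */
        {false, 2, false, false},   /* freezing: image allowed, stamp LSN */
        {false, 0, true, false},    /* only PD_ALL_VISIBLE, no checksums: hint-style */
        {false, 0, true, true},     /* only PD_ALL_VISIBLE, checksums on */
    };

    for (int i = 0; i < 4; i++)
    {
        bool        ok = fpi_permitted(cases[i].prune, cases[i].nfrozen,
                                       cases[i].pd_vis, cases[i].hints);

        printf("case %d: %s, %s\n", i,
               ok ? "image allowed" : "REGBUF_NO_IMAGE",
               ok ? "PageSetLSN" : "skip PageSetLSN");
    }
    return 0;
}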
Attachment: v14-0009-Combine-vacuum-phase-I-VM-update-cases.patch (text/x-patch)
From a88a7f88097755d430d030753c4080aa4092ef7b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 15 Sep 2025 17:48:38 -0400
Subject: [PATCH v14 09/24] Combine vacuum phase I VM update cases
We update the VM after phase I of vacuum -- either setting both the VM
bits when all bits are currently unset or setting just the frozen bit
when the all-visible bit is already set.
Those two cases shared much of the same code -- leading to unnecessary
duplication. This commit combines them, which is simpler and easier to
understand.
---
src/backend/access/heap/vacuumlazy.c | 68 ++++++++--------------------
1 file changed, 18 insertions(+), 50 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 308abff16ca..5a6bbbd97f2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2058,15 +2058,22 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(!presult.all_frozen || presult.all_visible);
/*
- * Handle setting visibility map bit based on information from the VM (as
+ * Handle setting visibility map bits based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
+ * all_frozen variables.
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if ((presult.all_visible && !all_visible_according_to_vm) ||
+ (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer)))
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
+ /*
+ * If the page is all-frozen, we can pass InvalidTransactionId as our
+ * cutoff_xid, since a snapshotConflictHorizon sufficient to make
+ * everything safe for REDO was logged when the page's tuples were
+ * frozen.
+ */
if (presult.all_frozen)
{
Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
@@ -2079,6 +2086,12 @@ lazy_scan_prune(LVRelState *vacrel,
flags);
/*
+ * Even if we are only setting the all-frozen bit, there is a small
+ * chance that the VM was modified sometime between setting
+ * all_visible_according_to_vm and checking the visibility during
+ * pruning. Check the return value of old_vmbits to ensure the
+ * visibility map counters used for logging are accurate.
+ *
* If the page wasn't already set all-visible and/or all-frozen in the
* VM, count it as newly set for logging.
*/
@@ -2100,6 +2113,8 @@ lazy_scan_prune(LVRelState *vacrel,
}
/*
+ * Now handle two potential corruption cases:
+ *
* As of PostgreSQL 9.2, the visibility map bit should never be set if the
* page-level bit is clear. However, it's possible that the bit got
* cleared after heap_vac_scan_next_block() was called, so we must recheck
@@ -2144,53 +2159,6 @@ lazy_scan_prune(LVRelState *vacrel,
VISIBILITYMAP_VALID_BITS);
}
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen.
- */
- else if (all_visible_according_to_vm && presult.all_frozen &&
- !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
return presult.ndeleted;
}
--
2.43.0
Attachment: v14-0010-Vacuum-phase-III-set-PD_ALL_VISIBLE-in-vacuum-WA.patch (text/x-patch)
From aafd0b18341a03d4b48574f28694d04891555c5e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 16 Sep 2025 10:39:31 -0400
Subject: [PATCH v14 10/24] Vacuum phase III set PD_ALL_VISIBLE in vacuum WAL
record
Instead of setting PD_ALL_VISIBLE on the heap page when setting bits in
the VM, set it when flipping the line pointers on the page to LP_UNUSED.
This will allow us to omit the heap page from the VM WAL chain.
To do this, we must check whether the page will be all-visible once the line
pointers are flipped, before we actually flip them.
One functional change is that a single critical section surrounds both
the VM update and the heap update. Previously they were each in a
critical section, so we could crash and have set PD_ALL_VISIBLE but not
set bits in the VM.
---
src/backend/access/heap/vacuumlazy.c | 140 ++++++++++++++++++++-------
1 file changed, 105 insertions(+), 35 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5a6bbbd97f2..9bfcd67a61b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -465,6 +465,11 @@ static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2793,6 +2798,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
TransactionId visibility_cutoff_xid;
bool all_frozen;
LVSavedErrInfo saved_err_info;
+ uint8 vmflags = 0;
Assert(vacrel->do_index_vacuuming);
@@ -2803,6 +2809,18 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
InvalidOffsetNumber);
+ if (heap_page_would_be_all_visible(vacrel, buffer,
+ deadoffsets, num_offsets,
+ &all_frozen, &visibility_cutoff_xid))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (all_frozen)
+ {
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ }
+ }
+
START_CRIT_SECTION();
for (int i = 0; i < num_offsets; i++)
@@ -2822,6 +2840,13 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
/* Attempt to truncate line pointer array now */
PageTruncateLinePointerArray(page);
+ /*
+ * The page will never have PD_ALL_VISIBLE already set, so if we are
+ * setting the VM, we must set PD_ALL_VISIBLE as well.
+ */
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+ PageSetAllVisible(page);
+
/*
* Mark buffer dirty before we write WAL.
*/
@@ -2833,7 +2858,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
log_heap_prune_and_freeze(vacrel->rel, buffer,
InvalidTransactionId,
false, /* no cleanup lock required */
- false,
+ (vmflags & VISIBILITYMAP_VALID_BITS) != 0,
PRUNE_VACUUM_CLEANUP,
NULL, 0, /* frozen */
NULL, 0, /* redirected */
@@ -2842,36 +2867,26 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
}
/*
- * End critical section, so we safely can do visibility tests (which
- * possibly need to perform IO and allocate memory!). If we crash now the
- * page (including the corresponding vm bit) might not be marked all
- * visible, but that's fine. A later vacuum will fix that.
+ * Note that we don't end the critical section until after emitting the VM
+ * record. This ensures PD_ALL_VISIBLE and the VM bits are either both set
+ * or both unset in the event of a crash. While it is correct for PD_ALL_VISIBLE
+ * to be set and the VM to be clear, we should do our best to keep these
+ * in sync. This does mean that we will take a lock on the VM buffer
+ * inside of a critical section, which is generally discouraged. There is
+ * precedent for this in other callers of visibilitymap_set(), though.
*/
- END_CRIT_SECTION();
/*
- * Now that we have removed the LP_DEAD items from the page, once again
- * check if the page has become all-visible. The page is already marked
- * dirty, exclusively locked, and, if needed, a full page image has been
- * emitted.
+ * Now that we have removed the LP_DEAD items from the page, set the
+ * visibility map if the page became all-visible/all-frozen. Changes to
+ * the heap page have already been logged.
*/
- Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
- &all_frozen))
+ if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (all_frozen)
- {
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- PageSetAllVisible(page);
visibilitymap_set(vacrel->rel, blkno, buffer,
InvalidXLogRecPtr,
vmbuffer, visibility_cutoff_xid,
- flags);
+ vmflags);
/* Count the newly set VM page for logging */
vacrel->vm_new_visible_pages++;
@@ -2879,6 +2894,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
vacrel->vm_new_visible_frozen_pages++;
}
+ END_CRIT_SECTION();
+
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
}
@@ -3540,30 +3557,77 @@ dead_items_cleanup(LVRelState *vacrel)
}
/*
- * Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples. Set *all_frozen to true if every tuple
- * on this page is frozen.
- *
- * This is a stripped down version of lazy_scan_prune(). If you change
- * anything here, make sure that everything stays in sync. Note that an
- * assertion calls us to verify that everybody still agrees. Be sure to avoid
- * introducing new side-effects here.
+ * Wrapper for heap_page_would_be_all_visible() which can be used for
+ * callers that expect no LP_DEAD on the page.
*/
static bool
heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid,
bool *all_frozen)
{
+
+ return heap_page_would_be_all_visible(vacrel, buf,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid);
+}
+
+/*
+ * Determines whether or not the heap page in buf is all-visible other than
+ * the dead line pointers referred to by the provided deadoffsets array.
+ *
+ * deadoffsets are the LP_DEAD offsets the caller already knows about and for
+ * which it has removed the associated index entries. Vacuum will call this
+ * before setting those line pointers LP_UNUSED. So, if there are no other
+ * LP_DEAD items, then the page
+ * can be set all-visible in the VM by the caller.
+ *
+ * Returns true if the page is all-visible other than the provided
+ * deadoffsets and false otherwise.
+ *
+ * vacrel->cutoffs.OldestXmin is used to determine visibility.
+ *
+ * *all_frozen is an output parameter indicating to the caller if every tuple
+ * on the page is frozen.
+ *
+ * *visibility_cutoff_xid is an output parameter with the highest xmin amongst the
+ * visible tuples. It is only valid if the page is all-visible.
+ *
+ * Callers looking to verify that the page is already all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * The logic here is similar to that in heap_prune_record_unchanged_lp_normal(). If
+ * you change anything here, make sure that everything stays in sync. Note
+ * that an assertion calls us to verify that everybody still agrees. Be sure
+ * to avoid introducing new side-effects here.
+ */
+static bool
+heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid)
+{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
OffsetNumber offnum,
maxoff;
bool all_visible = true;
+ int matched_dead_count = 0;
*visibility_cutoff_xid = InvalidTransactionId;
*all_frozen = true;
+ Assert(ndeadoffsets == 0 || deadoffsets);
+
+#ifdef USE_ASSERT_CHECKING
+ /* Confirm input deadoffsets[] is strictly sorted */
+ if (ndeadoffsets > 1)
+ {
+ for (int i = 1; i < ndeadoffsets; i++)
+ Assert(deadoffsets[i - 1] < deadoffsets[i]);
+ }
+#endif
+
maxoff = PageGetMaxOffsetNumber(page);
for (offnum = FirstOffsetNumber;
offnum <= maxoff && all_visible;
@@ -3591,9 +3655,15 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
*/
if (ItemIdIsDead(itemid))
{
- all_visible = false;
- *all_frozen = false;
- break;
+ if (!deadoffsets ||
+ matched_dead_count >= ndeadoffsets ||
+ deadoffsets[matched_dead_count] != offnum)
+ {
+ *all_frozen = all_visible = false;
+ break;
+ }
+ matched_dead_count++;
+ continue;
}
Assert(ItemIdIsNormal(itemid));
--
2.43.0
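(Not part of the patch: the new heap_page_would_be_all_visible() in 0010
tolerates an LP_DEAD item only if it is the next entry in the caller's sorted
deadoffsets[] array -- i.e. an item whose index entries have already been
removed and which is about to become LP_UNUSED. The standalone sketch below
shows just that merge-style match, with hypothetical arrays standing in for
the page scan.)

#include <stdbool.h>
#include <stdio.h>

typedef unsigned short OffsetNumber;

/*
 * Return true if every LP_DEAD offset encountered on the page (in ascending
 * offset order) matches, in order, an entry of the sorted deadoffsets[]
 * array; any other dead item means the page cannot be called all-visible.
 */
static bool
dead_items_all_known(const OffsetNumber *dead_on_page, int n_dead_on_page,
                     const OffsetNumber *deadoffsets, int ndeadoffsets)
{
    int         matched = 0;

    for (int i = 0; i < n_dead_on_page; i++)
    {
        if (matched >= ndeadoffsets || deadoffsets[matched] != dead_on_page[i])
            return false;       /* an LP_DEAD item we were not told about */
        matched++;
    }
    return true;
}

int
main(void)
{
    OffsetNumber found[] = {4, 9, 17};  /* LP_DEAD items seen while scanning the page */
    OffsetNumber known[] = {4, 9, 17};  /* offsets vacuum is about to set LP_UNUSED */
    OffsetNumber known_short[] = {4, 17};

    printf("all dead items accounted for: %d\n",
           dead_items_all_known(found, 3, known, 3));
    printf("unexpected dead item present: %d\n",
           dead_items_all_known(found, 3, known_short, 2));
    return 0;
}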
Attachment: v14-0011-Log-setting-empty-pages-PD_ALL_VISIBLE-with-XLOG.patch (text/x-patch)
From d774b80288042d9a31cbc6477c2f0151f1c9dc2e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 17 Sep 2025 18:11:49 -0400
Subject: [PATCH v14 11/24] Log setting empty pages PD_ALL_VISIBLE with
XLOG_HEAP2_VACUUM_SCAN
Though not a big win for this particular case, if we use the
XLOG_HEAP2_VACUUM_SCAN record to log setting PD_ALL_VISIBLE on the heap
page, we can omit the heap page from the WAL chain when setting the
visibility map. A follow-on commit will actually remove the heap page
from the VM set WAL chain.
---
src/backend/access/heap/vacuumlazy.c | 43 +++++++++++++++++++---------
1 file changed, 29 insertions(+), 14 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9bfcd67a61b..c016f8f7c25 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1879,23 +1879,38 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
{
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ PageSetAllVisible(page);
MarkBufferDirty(buf);
- /*
- * It's possible that another backend has extended the heap,
- * initialized the page, and then failed to WAL-log the page due
- * to an ERROR. Since heap extension is not WAL-logged, recovery
- * might try to replay our record setting the page all-visible and
- * find that the page isn't initialized, which will cause a PANIC.
- * To prevent that, check whether the page has been previously
- * WAL-logged, and if not, do that now.
- */
- if (RelationNeedsWAL(vacrel->rel) &&
- PageGetLSN(page) == InvalidXLogRecPtr)
- log_newpage_buffer(buf, true);
+ if (RelationNeedsWAL(vacrel->rel))
+ {
+ /*
+ * It's possible that another backend has extended the heap,
+ * initialized the page, and then failed to WAL-log the page
+ * due to an ERROR. Since heap extension is not WAL-logged,
+ * recovery might try to replay our record setting the page
+ * all-visible and find that the page isn't initialized, which
+ * will cause a PANIC. To prevent that, check whether the page
+ * has been previously WAL-logged, and if not, do that now.
+ *
+ * Otherwise, just emit WAL for setting PD_ALL_VISIBLE on the
+ * heap page. Doing this in a separate record from setting the
+ * VM allows us to omit the heap page from the VM WAL chain.
+ */
+ if (PageGetLSN(page) == InvalidXLogRecPtr)
+ log_newpage_buffer(buf, true);
+ else
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ InvalidTransactionId, /* conflict xid */
+ false, /* cleanup lock */
+ true, /* set_pd_all_vis */
+ PRUNE_VACUUM_SCAN, /* reason */
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+ }
- PageSetAllVisible(page);
visibilitymap_set(vacrel->rel, blkno, buf,
InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId,
--
2.43.0
Attachment: v14-0012-Remove-heap-buffer-from-XLOG_HEAP2_VISIBLE-WAL-c.patch (text/x-patch)
From a63eed81ff73217a12cbb84b2a7f4def3366871a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 16 Sep 2025 11:05:30 -0400
Subject: [PATCH v14 12/24] Remove heap buffer from XLOG_HEAP2_VISIBLE WAL
chain
Now that all users of visibilitymap_set() include setting PD_ALL_VISIBLE
in the WAL record capturing other changes to the heap page, we no longer
need to include the heap buffer in the WAL chain for setting the VM.
---
src/backend/access/heap/heapam.c | 16 +-----
src/backend/access/heap/heapam_xlog.c | 76 +++----------------------
src/backend/access/heap/vacuumlazy.c | 6 +-
src/backend/access/heap/visibilitymap.c | 31 +---------
src/include/access/heapam_xlog.h | 3 +-
src/include/access/visibilitymap.h | 2 +-
6 files changed, 16 insertions(+), 118 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c8cd9d22726..0323e2df409 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8807,21 +8807,14 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
*
* snapshotConflictHorizon comes from the largest xmin on the page being
* marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
*/
XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
+log_heap_visible(Relation rel, Buffer vm_buffer,
TransactionId snapshotConflictHorizon, uint8 vmflags)
{
xl_heap_visible xlrec;
XLogRecPtr recptr;
- uint8 flags;
- Assert(BufferIsValid(heap_buffer));
Assert(BufferIsValid(vm_buffer));
xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
@@ -8830,14 +8823,7 @@ log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
XLogBeginInsert();
XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
return recptr;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index a54238f2b59..68b41f39e69 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -229,15 +229,12 @@ heap_xlog_visible(XLogReaderState *record)
XLogRecPtr lsn = record->EndRecPtr;
xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
RelFileLocator rlocator;
BlockNumber blkno;
- XLogRedoAction action;
Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
+ XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
/*
* If there are any Hot Standby transactions running that have an xmin
@@ -254,70 +251,11 @@ heap_xlog_visible(XLogReaderState *record)
rlocator);
/*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
+ * Even if the heap relation was dropped or truncated and the previously
+ * emitted record skipped the heap page update due to this LSN interlock,
+ * it's still safe to update the visibility map. Any WAL record that
+ * clears the visibility map bit does so before checking the page LSN, so
+ * any bits that need to be cleared will still be cleared.
*/
if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
&vmbuffer) == BLK_NEEDS_REDO)
@@ -341,7 +279,7 @@ heap_xlog_visible(XLogReaderState *record)
reln = CreateFakeRelcacheEntry(rlocator);
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
+ visibilitymap_set(reln, blkno, lsn, vmbuffer,
xlrec->snapshotConflictHorizon, vmbits);
ReleaseBuffer(vmbuffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c016f8f7c25..735f1e7501e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1911,7 +1911,7 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
NULL, 0);
}
- visibilitymap_set(vacrel->rel, blkno, buf,
+ visibilitymap_set(vacrel->rel, blkno,
InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE |
@@ -2100,7 +2100,7 @@ lazy_scan_prune(LVRelState *vacrel,
flags |= VISIBILITYMAP_ALL_FROZEN;
}
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
+ old_vmbits = visibilitymap_set(vacrel->rel, blkno,
InvalidXLogRecPtr,
vmbuffer, presult.vm_conflict_horizon,
flags);
@@ -2898,7 +2898,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
*/
if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- visibilitymap_set(vacrel->rel, blkno, buffer,
+ visibilitymap_set(vacrel->rel, blkno,
InvalidXLogRecPtr,
vmbuffer, visibility_cutoff_xid,
vmflags);
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index aa48a436108..75fcb3f067a 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -233,9 +233,7 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
* when a page that is already all-visible is being marked all-frozen.
*
* Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
+ * this function.
*
* You must pass a buffer containing the correct map page to this function.
* Call visibilitymap_pin first to pin the right one. This function doesn't do
@@ -244,7 +242,7 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
* Returns the state of the page's VM bits before setting flags.
*/
uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
uint8 flags)
{
@@ -261,18 +259,11 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
#endif
Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
- Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
/* Must never set all_frozen bit without also setting all_visible bit */
Assert(flags != VISIBILITYMAP_ALL_FROZEN);
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- Assert(!BufferIsValid(heapBuf) || BufferIsExclusiveLocked(heapBuf));
-
/* Check that we have the right VM page pinned */
if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
@@ -294,23 +285,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
if (XLogRecPtrIsInvalid(recptr))
{
Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
+ recptr = log_heap_visible(rel, vmBuf, cutoff_xid, flags);
}
PageSetLSN(page, recptr);
}
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 7d3fb75dda7..82b8f7f2bbc 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -440,7 +440,6 @@ typedef struct xl_heap_inplace
* This is what we need to know about setting a visibility map bit
*
* Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
*/
typedef struct xl_heap_visible
{
@@ -493,7 +492,7 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
+extern XLogRecPtr log_heap_visible(Relation rel,
Buffer vm_buffer,
TransactionId snapshotConflictHorizon,
uint8 vmflags);
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index fc7056a91ea..302adf4856a 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -32,7 +32,7 @@ extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
+ BlockNumber heapBlk,
XLogRecPtr recptr,
Buffer vmBuf,
TransactionId cutoff_xid,
--
2.43.0
v14-0014-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch
From 1fc1a338e5d6621f89df46fe29d08c799267b39d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 17 Sep 2025 15:52:18 -0400
Subject: [PATCH v14 14/24] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase III
Instead of emitting a separate XLOG_HEAP2_VISIBLE record for each page
that is rendered all-visible by vacuum's third phase, include the
updates to the VM in the already emitted XLOG_HEAP2_PRUNE_VACUUM_CLEANUP
record.
The visibilitymap bits are stored in the flags member of the
xl_heap_prune struct.
This can decrease the number of WAL records vacuum phase III emits by
as much as half.
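For illustration only, a tiny standalone sketch (not part of the patch) of the
flag translation the redo side does. The XLHP_VM_* values mirror the
definitions added below; the VISIBILITYMAP_* values are assumed to be the
usual visibilitymapdefs.h ones:

#include <stdint.h>
#include <stdio.h>

/* mirrors of the patch's flag values, for illustration only */
#define XLHP_VM_ALL_VISIBLE       (1 << 8)
#define XLHP_VM_ALL_FROZEN        (1 << 9)
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/* translate xl_heap_prune flags into VM bits, as the redo routine does */
static uint8_t
vmflags_from_prune_flags(uint16_t flags)
{
    uint8_t vmflags = 0;

    if (flags & XLHP_VM_ALL_VISIBLE)
    {
        vmflags = VISIBILITYMAP_ALL_VISIBLE;
        if (flags & XLHP_VM_ALL_FROZEN)
            vmflags |= VISIBILITYMAP_ALL_FROZEN;
    }
    return vmflags;
}

int
main(void)
{
    uint16_t flags = XLHP_VM_ALL_VISIBLE | XLHP_VM_ALL_FROZEN;

    printf("vm_flags: 0x%02X\n", vmflags_from_prune_flags(flags)); /* 0x03 */
    return 0;
}

The widened uint16 flags word simply carries the VM request alongside the
existing pruning/freezing flags; the VM page itself is still registered as a
separate block in the same record.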
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam_xlog.c | 147 ++++++++++++++++++-------
src/backend/access/heap/pruneheap.c | 37 ++++++-
src/backend/access/heap/vacuumlazy.c | 38 +++----
src/backend/access/rmgrdesc/heapdesc.c | 11 +-
src/include/access/heapam.h | 1 +
src/include/access/heapam_xlog.h | 25 ++++-
6 files changed, 190 insertions(+), 69 deletions(-)
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 68b41f39e69..c1f332f7a9a 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Buffer buffer;
RelFileLocator rlocator;
BlockNumber blkno;
- XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
+ uint8 vmflags = 0;
+ Size freespace = 0;
XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -50,11 +52,22 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Assert((xlrec.flags & XLHP_CLEANUP_LOCK) != 0 ||
(xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
+ if (xlrec.flags & XLHP_VM_ALL_VISIBLE)
+ {
+ vmflags = VISIBILITYMAP_ALL_VISIBLE;
+ if (xlrec.flags & XLHP_VM_ALL_FROZEN)
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
/*
- * We are about to remove and/or freeze tuples. In Hot Standby mode,
- * ensure that there are no queries running for which the removed tuples
- * are still visible or which still consider the frozen xids as running.
- * The conflict horizon XID comes after xl_heap_prune.
+ * After xl_heap_prune is the optional snapshot conflict horizon.
+ *
+ * In Hot Standby mode, we must ensure that there are no running queries
+ * which would conflict with the changes in this record. That means we
+ * can't replay this record if it removes tuples that are still visible to
+ * transactions on the standby, freeze tuples with xids that are still
+ * considered running on the standby, or set a page as all-visible in the
+ * VM if it isn't all-visible to all transactions on the standby.
*/
if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
{
@@ -71,12 +84,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
}
/*
- * If we have a full-page image, restore it and we're done.
+ * If we have a full-page image of the heap block, restore it and we're
+ * done with the heap block.
*/
- action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
- (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
- &buffer);
- if (action == BLK_NEEDS_REDO)
+ if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+ (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+ &buffer) == BLK_NEEDS_REDO)
{
Page page = BufferGetPage(buffer);
OffsetNumber *redirected;
@@ -100,6 +113,11 @@ heap_xlog_prune_freeze(XLogReaderState *record)
do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+ /* Ensure the record does something */
+ Assert(do_prune || nplans > 0 ||
+ vmflags & VISIBILITYMAP_VALID_BITS ||
+ xlrec.flags & XLHP_SET_PD_ALL_VIS);
+
/*
* Update all line pointers per the record, and repair fragmentation
* if needed.
@@ -147,15 +165,23 @@ heap_xlog_prune_freeze(XLogReaderState *record)
* page-level PD_ALL_VISIBLE bit is clear. If that were to occur,
* then a subsequent page modification would fail to clear the
* visibility map bit.
+ *
+ * Note: we don't worry about updating the page's prunability hints.
+ * At worst this will cause an extra prune cycle to occur soon.
*/
if (xlrec.flags & XLHP_SET_PD_ALL_VIS)
PageSetAllVisible(page);
/*
- * Note: we don't worry about updating the page's prunability hints.
- * At worst this will cause an extra prune cycle to occur soon.
+ * We must never end up with the VM bit set and the page-level
+ * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+ * modification would fail to clear the VM bit.
*/
- MarkBufferDirty(buffer);
+ Assert(!(vmflags & VISIBILITYMAP_VALID_BITS) || PageIsAllVisible(page));
+
+ /* If this record only sets the VM, no need to dirty the heap page */
+ if (do_prune || nplans > 0 || xlrec.flags & XLHP_SET_PD_ALL_VIS)
+ MarkBufferDirty(buffer);
/*
* We always emit a WAL record when setting PD_ALL_VISIBLE, but we are
@@ -171,47 +197,94 @@ heap_xlog_prune_freeze(XLogReaderState *record)
}
/*
- * If we released any space or line pointers or set PD_ALL_VISIBLE update
- * the freespace map.
+ * If we released any space or line pointers or set PD_ALL_VISIBLE or the
+ * VM, update the freespace map.
*
- * Even if we are just setting PD_ALL_VISIBLE (and thus not freeing up any
- * space), we'll still update the FSM for this page. Since the FSM is not
- * WAL-logged and only updated heuristically, it easily becomes stale in
- * standbys. If the standby is later promoted and runs VACUUM, it will
- * skip updating individual free space figures for pages that became
- * all-visible (or all-frozen, depending on the vacuum mode,) which is
- * troublesome when FreeSpaceMapVacuum propagates too optimistic free
- * space values to upper FSM layers; later inserters try to use such pages
- * only to find out that they are unusable. This can cause long stalls
- * when there are many such pages.
+ * Even if we are just setting PD_ALL_VISIBLE or updating the VM (and thus
+ * not freeing up any space), we'll still update the FSM for this page.
+ * Since the FSM is not WAL-logged and only updated heuristically, it
+ * easily becomes stale in standbys. If the standby is later promoted and
+ * runs VACUUM, it will skip updating individual free space figures for
+ * pages that became all-visible (or all-frozen, depending on the vacuum
+ * mode,) which is troublesome when FreeSpaceMapVacuum propagates too
+ * optimistic free space values to upper FSM layers; later inserters try
+ * to use such pages only to find out that they are unusable. This can
+ * cause long stalls when there are many such pages.
*
* Forestall those problems by updating FSM's idea about a page that is
* becoming all-visible or all-frozen.
*
* Do this regardless of a full-page image being applied, since the FSM
* data is not in the page anyway.
+ *
+ * We want to avoid holding an exclusive lock on the heap buffer while
+ * doing IO (either of the FSM or the VM), so we'll release the lock on
+ * the heap buffer before doing either.
*/
if (BufferIsValid(buffer))
{
if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
XLHP_HAS_DEAD_ITEMS |
XLHP_HAS_NOW_UNUSED_ITEMS |
- XLHP_SET_PD_ALL_VIS))
- {
- Size freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+ XLHP_SET_PD_ALL_VIS |
+ (vmflags & VISIBILITYMAP_VALID_BITS)))
+ freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
- /*
- * We want to avoid holding an exclusive lock on the heap buffer
- * while doing IO, so we'll release the lock on the heap buffer
- * first.
- */
- UnlockReleaseBuffer(buffer);
+ UnlockReleaseBuffer(buffer);
+ }
- XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
+ /*
+ * Now read and update the VM block.
+ *
+ * Note that the heap relation may have been dropped or truncated, leading
+ * us to skip updating the heap block due to the LSN interlock. However,
+ * even in that case, it's still safe to update the visibility map. Any
+ * WAL record that clears the visibility map bit does so before checking
+ * the page LSN, so any bits that need to be cleared will still be
+ * cleared.
+ *
+ * Note that the lock on the heap page was dropped above. In normal
+ * operation this would never be safe because a concurrent query could
+ * modify the heap page and clear PD_ALL_VISIBLE -- violating the
+ * invariant that PD_ALL_VISIBLE must be set if the corresponding bit in
+ * the VM is set.
+ *
+ * In recovery, we expect no other writers, so writing to the VM page
+ * without holding a lock on the heap page is considered safe enough. It
+ * is done this way when replaying xl_heap_visible records (see
+ * heap_xlog_visible()).
+ */
+ if (vmflags & VISIBILITYMAP_VALID_BITS &&
+ XLogReadBufferForRedoExtended(record, 1,
+ RBM_ZERO_ON_ERROR,
+ false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ Page vmpage = BufferGetPage(vmbuffer);
+ uint8 old_vmbits = 0;
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
+
+ /* initialize the page if it was read as zeros */
+ if (PageIsNew(vmpage))
+ PageInit(vmpage, BLCKSZ, 0);
+
+ old_vmbits = visibilitymap_set_vmbits(reln, blkno, vmbuffer, vmflags);
+
+ /* Only set VM page LSN if we modified the page */
+ if (old_vmbits != vmflags)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
}
- else
- UnlockReleaseBuffer(buffer);
+
+ FreeFakeRelcacheEntry(reln);
}
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
+
+ if (freespace > 0)
+ XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9b25131543b..9e00fbf3cd1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -20,6 +20,7 @@
#include "access/multixact.h"
#include "access/transam.h"
#include "access/xlog.h"
+#include "access/visibilitymapdefs.h"
#include "access/xloginsert.h"
#include "commands/vacuum.h"
#include "executor/instrument.h"
@@ -913,6 +914,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
+ InvalidBuffer, 0,
conflict_xid,
true,
do_set_pd_vis,
@@ -2088,14 +2090,18 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
*
* This is used for several different page maintenance operations:
*
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
* redirected, some marked dead, and some removed altogether.
*
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'.
*
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ * marked as unused.
*
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phase III, the heap page may be marked
+ * all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
* all.
*
* If replaying the record requires a cleanup lock, pass cleanup_lock = true.
@@ -2103,6 +2109,10 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* replaying 'unused' items depends on whether they were all previously marked
* as dead.
*
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
* set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
* the page LSN when checksums/wal_log_hints are enabled even if we did not
* prune or freeze tuples on the page.
@@ -2113,6 +2123,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer, uint8 vmflags,
TransactionId conflict_xid,
bool cleanup_lock,
bool set_pd_all_vis,
@@ -2139,6 +2150,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xlrec.flags = 0;
regbuf_flags = REGBUF_STANDARD;
+ Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+
/*
* We can avoid an FPI if the only modification we are making to the heap
* page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
@@ -2157,6 +2170,10 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
*/
XLogBeginInsert();
XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ XLogRegisterBuffer(1, vmbuffer, 0);
+
if (nfrozen > 0)
{
int nplans;
@@ -2213,6 +2230,12 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* Prepare the main xl_heap_prune record. We already set the XLHP_HAS_*
* flag above.
*/
+ if (vmflags & VISIBILITYMAP_ALL_VISIBLE)
+ {
+ xlrec.flags |= XLHP_VM_ALL_VISIBLE;
+ if (vmflags & VISIBILITYMAP_ALL_FROZEN)
+ xlrec.flags |= XLHP_VM_ALL_FROZEN;
+ }
if (set_pd_all_vis)
xlrec.flags |= XLHP_SET_PD_ALL_VIS;
if (RelationIsAccessibleInLogicalDecoding(relation))
@@ -2247,6 +2270,12 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
}
recptr = XLogInsert(RM_HEAP2_ID, info);
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
+ }
+
/*
* We must bump the page LSN if pruning or freezing. If we are only
* updating PD_ALL_VISIBLE, though, we can skip doing this unless
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a0f3984e37f..b6c973cd111 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1906,6 +1906,8 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
else
log_heap_prune_and_freeze(vacrel->rel, buf,
+ InvalidBuffer,
+ 0,
InvalidTransactionId, /* conflict xid */
false, /* cleanup lock */
true, /* set_pd_all_vis */
@@ -2817,6 +2819,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
OffsetNumber unused[MaxHeapTuplesPerPage];
int nunused = 0;
TransactionId visibility_cutoff_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
bool all_frozen;
LVSavedErrInfo saved_err_info;
uint8 vmflags = 0;
@@ -2842,6 +2845,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
vmflags |= VISIBILITYMAP_ALL_FROZEN;
Assert(!TransactionIdIsValid(visibility_cutoff_xid));
}
+
+ /* Take the lock on the vmbuffer before entering a critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
}
START_CRIT_SECTION();
@@ -2868,7 +2874,13 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* setting the VM, we must set PD_ALL_VISIBLE as well.
*/
if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+ {
PageSetAllVisible(page);
+ visibilitymap_set_vmbits(vacrel->rel,
+ blkno,
+ vmbuffer, vmflags);
+ conflict_xid = visibility_cutoff_xid;
+ }
/*
* Mark buffer dirty before we write WAL.
@@ -2879,7 +2891,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
- InvalidTransactionId,
+ vmbuffer, vmflags,
+ conflict_xid,
false, /* no cleanup lock required */
(vmflags & VISIBILITYMAP_VALID_BITS) != 0,
PRUNE_VACUUM_CLEANUP,
@@ -2889,36 +2902,17 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
unused, nunused);
}
- /*
- * Note that we don't end the critical section until after emitting the VM
- * record. This ensures both PD_ALL_VISIBLE and the VM bits are set or
- * unset in the event of a crash. While it is correct for PD_ALL_VISIBLE
- * to be set and the VM to be clear, we should do our best to keep these
- * in sync. This does mean that we will take a lock on the VM buffer
- * inside of a critical section, which is generally discouraged. There is
- * precedent for this in other callers of visibilitymap_set(), though.
- */
+ END_CRIT_SECTION();
- /*
- * Now that we have removed the LP_DEAD items from the page, set the
- * visibility map if the page became all-visible/all-frozen. Changes to
- * the heap page have already been logged.
- */
if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- visibilitymap_set(vacrel->rel, blkno,
- InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
- vmflags);
-
/* Count the newly set VM page for logging */
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
vacrel->vm_new_visible_pages++;
if (all_frozen)
vacrel->vm_new_visible_frozen_pages++;
}
- END_CRIT_SECTION();
-
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
}
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..1cb44ca32d3 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -103,7 +103,7 @@ plan_elem_desc(StringInfo buf, void *plan, void *data)
* code, the latter of which is used in frontend (pg_waldump) code.
*/
void
-heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
OffsetNumber **frz_offsets,
int *nredirected, OffsetNumber **redirected,
@@ -287,6 +287,15 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, ", isCatalogRel: %c",
xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
+ if (xlrec->flags & XLHP_VM_ALL_VISIBLE)
+ {
+ uint8 vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+ if (xlrec->flags & XLHP_VM_ALL_FROZEN)
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ appendStringInfo(buf, ", vm_flags: 0x%02X", vmflags);
+ }
+
if (XLogRecHasBlockData(record, 0))
{
Size datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2f77d8dbcd6..be66970c9f0 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -389,6 +389,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer, uint8 vmflags,
TransactionId conflict_xid,
bool cleanup_lock,
bool set_pd_all_vis,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 82b8f7f2bbc..833114e0a6e 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -249,7 +249,7 @@ typedef struct xl_heap_update
* Main data section:
*
* xl_heap_prune
- * uint8 flags
+ * uint16 flags
* TransactionId snapshot_conflict_horizon
*
* Block 0 data section:
@@ -284,7 +284,7 @@ typedef struct xl_heap_update
*/
typedef struct xl_heap_prune
{
- uint8 flags;
+ uint16 flags;
/*
* If XLHP_HAS_CONFLICT_HORIZON is set, the conflict horizon XID follows,
@@ -292,11 +292,17 @@ typedef struct xl_heap_prune
*/
} xl_heap_prune;
-#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
#define XLHP_SET_PD_ALL_VIS (1 << 0)
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
#define XLHP_IS_CATALOG_REL (1 << 1)
/*
@@ -332,6 +338,15 @@ typedef struct xl_heap_prune
#define XLHP_HAS_DEAD_ITEMS (1 << 6)
#define XLHP_HAS_NOW_UNUSED_ITEMS (1 << 7)
+/*
+ * The xl_heap_prune record's flags may also contain which VM bits to set.
+ * xl_heap_prune should always use the XLHP_VM_ALL_VISIBLE and
+ * XLHP_VM_ALL_FROZEN flags and translate them to their visibilitymapdefs.h
+ * equivalents, VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN.
+ */
+#define XLHP_VM_ALL_VISIBLE (1 << 8)
+#define XLHP_VM_ALL_FROZEN (1 << 9)
+
/*
* xlhp_freeze_plan describes how to freeze a group of one or more heap tuples
* (appears in xl_heap_prune's xlhp_freeze_plans sub-record)
@@ -498,7 +513,7 @@ extern XLogRecPtr log_heap_visible(Relation rel,
uint8 vmflags);
/* in heapdesc.c, so it can be shared between frontend/backend code */
-extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
OffsetNumber **frz_offsets,
int *nredirected, OffsetNumber **redirected,
--
2.43.0
v14-0013-Make-heap_page_is_all_visible-independent-of-LVR.patch
From 2f820f93bfe273ed9b9867d3ddc9f4c67dd94296 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 16 Sep 2025 15:39:31 -0400
Subject: [PATCH v14 13/24] Make heap_page_is_all_visible independent of
LVRelState
Future commits will use this function inside pruneheap.c, where we do
not have access to the LVRelState. We only need a few parameters from
the LVRelState, so just pass those in explicitly.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 57 ++++++++++++++++++----------
1 file changed, 37 insertions(+), 20 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 735f1e7501e..a0f3984e37f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,13 +463,18 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
- TransactionId *visibility_cutoff_xid, bool *all_frozen);
-static bool heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
+static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
- TransactionId *visibility_cutoff_xid);
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2030,8 +2035,9 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(presult.lpdead_items == 0);
- if (!heap_page_is_all_visible(vacrel, buf,
- &debug_cutoff, &debug_all_frozen))
+ if (!heap_page_is_all_visible(vacrel->rel, buf,
+ vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+ &debug_cutoff, &vacrel->offnum))
Assert(false);
Assert(presult.all_frozen == debug_all_frozen);
@@ -2824,9 +2830,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
InvalidOffsetNumber);
- if (heap_page_would_be_all_visible(vacrel, buffer,
+ if (heap_page_would_be_all_visible(vacrel->rel, buffer,
+ vacrel->cutoffs.OldestXmin,
deadoffsets, num_offsets,
- &all_frozen, &visibility_cutoff_xid))
+ &all_frozen, &visibility_cutoff_xid,
+ &vacrel->offnum))
{
vmflags |= VISIBILITYMAP_ALL_VISIBLE;
if (all_frozen)
@@ -3576,15 +3584,19 @@ dead_items_cleanup(LVRelState *vacrel)
* callers that expect no LP_DEAD on the page.
*/
static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
TransactionId *visibility_cutoff_xid,
- bool *all_frozen)
+ OffsetNumber *logging_offnum)
{
- return heap_page_would_be_all_visible(vacrel, buf,
+ return heap_page_would_be_all_visible(rel, buf,
+ OldestXmin,
NULL, 0,
all_frozen,
- visibility_cutoff_xid);
+ visibility_cutoff_xid,
+ logging_offnum);
}
/*
@@ -3599,7 +3611,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * vacrel->cutoffs.OldestXmin is used to determine visibility.
+ * OldestXmin is used to determine visibility.
*
* *all_frozen is an output parameter indicating to the caller if every tuple
* on the page is frozen.
@@ -3607,6 +3619,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* *visibility_cutoff_xid is an output parameter with the highest xmin amongst the
* visible tuples. It is only valid if the page is all-visible.
*
+ * *logging_offnum will have the OffsetNumber of the current tuple being
+ * processed for vacuum's error callback system.
+ *
* Callers looking to verify that the page is already all-visible can call
* heap_page_is_all_visible().
*
@@ -3616,11 +3631,13 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* to avoid introducing new side-effects here.
*/
static bool
-heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
- TransactionId *visibility_cutoff_xid)
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3655,7 +3672,7 @@ heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
* Set the offset number so that we can display it along with any
* error that occurred while processing this tuple.
*/
- vacrel->offnum = offnum;
+ *logging_offnum = offnum;
itemid = PageGetItemId(page, offnum);
/* Unused or redirect line pointers are of no interest */
@@ -3685,9 +3702,9 @@ heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
tuple.t_len = ItemIdGetLength(itemid);
- tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+ tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
+ switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
buf))
{
case HEAPTUPLE_LIVE:
@@ -3708,7 +3725,7 @@ heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
+ OldestXmin))
{
all_visible = false;
*all_frozen = false;
@@ -3743,7 +3760,7 @@ heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
} /* scan along page */
/* Clear the offset information once we have processed the given page. */
- vacrel->offnum = InvalidOffsetNumber;
+ *logging_offnum = InvalidOffsetNumber;
return all_visible;
}
--
2.43.0
v14-0021-Inline-TransactionIdFollows-Precedes.patch
From 7ea26725c69aba6f269692387a6e923614181cc4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 09:57:13 -0400
Subject: [PATCH v14 21/24] Inline TransactionIdFollows/Precedes()
Calling these from on-access pruning code had noticeable overhead in a
profile. There does not seem to be a reason not to inline them.
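For anyone skimming: the semantics are unchanged; these are the usual
modulo-2^32 comparisons. A throwaway standalone example (plain stdint, not
the backend code, with the permanent-XID special case omitted) of what that
ordering means across wraparound:

#include <stdint.h>
#include <stdio.h>

/* same modulo-2^32 rule as TransactionIdPrecedes(), normal XIDs only */
static int
xid_precedes(uint32_t id1, uint32_t id2)
{
    int32_t diff = (int32_t) (id1 - id2);

    return diff < 0;
}

int
main(void)
{
    /* 4294967290 logically precedes 10 once the XID counter has wrapped */
    printf("%d\n", xid_precedes(4294967290u, 10)); /* 1 */
    printf("%d\n", xid_precedes(10, 4294967290u)); /* 0 */
    return 0;
}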
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
---
src/backend/access/transam/transam.c | 64 -------------------------
src/include/access/transam.h | 70 ++++++++++++++++++++++++++--
2 files changed, 66 insertions(+), 68 deletions(-)
diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index 9a39451a29a..553d6756cb3 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -273,70 +273,6 @@ TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
}
-/*
- * TransactionIdPrecedes --- is id1 logically < id2?
- */
-bool
-TransactionIdPrecedes(TransactionId id1, TransactionId id2)
-{
- /*
- * If either ID is a permanent XID then we can just do unsigned
- * comparison. If both are normal, do a modulo-2^32 comparison.
- */
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 < id2);
-
- diff = (int32) (id1 - id2);
- return (diff < 0);
-}
-
-/*
- * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
- */
-bool
-TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 <= id2);
-
- diff = (int32) (id1 - id2);
- return (diff <= 0);
-}
-
-/*
- * TransactionIdFollows --- is id1 logically > id2?
- */
-bool
-TransactionIdFollows(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 > id2);
-
- diff = (int32) (id1 - id2);
- return (diff > 0);
-}
-
-/*
- * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
- */
-bool
-TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 >= id2);
-
- diff = (int32) (id1 - id2);
- return (diff >= 0);
-}
-
/*
* TransactionIdLatest --- get latest XID among a main xact and its children
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 7d82cd2eb56..c9e20418275 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -255,6 +255,72 @@ typedef struct TransamVariablesData
} TransamVariablesData;
+
+/*
+ * TransactionIdPrecedes --- is id1 logically < id2?
+ */
+static inline bool
+TransactionIdPrecedes(TransactionId id1, TransactionId id2)
+{
+ /*
+ * If either ID is a permanent XID then we can just do unsigned
+ * comparison. If both are normal, do a modulo-2^32 comparison.
+ */
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 < id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff < 0);
+}
+
+/*
+ * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
+ */
+static inline bool
+TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 <= id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff <= 0);
+}
+
+/*
+ * TransactionIdFollows --- is id1 logically > id2?
+ */
+static inline bool
+TransactionIdFollows(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 > id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff > 0);
+}
+
+/*
+ * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
+ */
+static inline bool
+TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 >= id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff >= 0);
+}
+
+
/* ----------------
* extern declarations
* ----------------
@@ -274,10 +340,6 @@ extern bool TransactionIdDidAbort(TransactionId transactionId);
extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
-extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
-extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2);
extern TransactionId TransactionIdLatest(TransactionId mainxid,
int nxids, const TransactionId *xids);
extern XLogRecPtr TransactionIdGetCommitLSN(TransactionId xid);
--
2.43.0
v14-0022-Unset-all-visible-sooner-if-not-freezing.patch
From eea3df3f0660f868df56fa0043c182b2fb3c0258 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:35:13 -0400
Subject: [PATCH v14 22/24] Unset all-visible sooner if not freezing
In prune/freeze code, we delay unsetting all-visible/all-frozen in the
presence of dead items to allow opportunistically freezing tuples if the
whole page would be frozen except for those dead items -- which are
removed later in vacuum's third phase.
Future commits will allow on-access pruning to set the VM, which means
all-visible will be initialized to true instead of false, and we will do
extra bookkeeping in heap_prune_unchanged_lp_normal() to keep track of
whether the page is all-visible.
Because on-access pruning will not freeze tuples, it makes sense to
unset all-visible as soon as we encounter an LP_DEAD item and avoid
continued bookkeeping, since we know the page is not all-visible and
we won't be able to remove those dead items.
---
src/backend/access/heap/pruneheap.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index bb7a1357a89..c29f47ab151 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1522,8 +1522,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible until later during pruning.
- * Removable dead tuples shouldn't preclude freezing the page.
+ * Removable dead tuples shouldn't preclude freezing the page. If we won't
+ * attempt freezing, just unset all-visible now, though.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1776,8 +1779,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible until later, at the end of
* heap_page_prune_and_freeze(). This will allow us to attempt to freeze
* the page after pruning. As long as we unset it before updating the
- * visibility map, this will be correct.
+ * visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all-visible now.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
--
2.43.0
v14-0024-Set-pd_prune_xid-on-insert.patch
From 0134ca707f4c64620ff26c69d703b79ec421ac91 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v14 24/24] Set pd_prune_xid on insert
Now that we can set the VM during read-only queries, it makes sense to
start setting the page prunable hint on insert. This will allow
heap_page_prune_and_freeze() to be called when the page is full or
mostly full.
For years there has been a note in heap_insert() and heap_multi_insert()
pointing out that setting pd_prune_xid would help clean up aborted
inserted tuples that would otherwise not be cleaned up until vacuum.
So, that's another benefit of setting it.
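In case it helps reviewers, a rough toy model of the hint being set and how
it gates on-access pruning (this is not the backend's PageSetPrunable() or
heap_page_prune_opt(); it uses plain < instead of the wraparound-aware
comparison):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define InvalidXid 0u

/* toy stand-in for the page header field the patch starts populating on insert */
typedef struct
{
    uint32_t pd_prune_xid;  /* oldest XID that might make the page prunable */
} ToyPageHeader;

/* keep the oldest hint, as PageSetPrunable() does */
static void
toy_set_prunable(ToyPageHeader *hdr, uint32_t xid)
{
    if (hdr->pd_prune_xid == InvalidXid || xid < hdr->pd_prune_xid)
        hdr->pd_prune_xid = xid;
}

/* on-access pruning is only attempted once the hint is set and old enough */
static bool
toy_should_try_prune(const ToyPageHeader *hdr, uint32_t oldest_running_xid)
{
    return hdr->pd_prune_xid != InvalidXid &&
           hdr->pd_prune_xid < oldest_running_xid;
}

int
main(void)
{
    ToyPageHeader hdr = {InvalidXid};

    toy_set_prunable(&hdr, 1000);   /* inserting transaction's xid */
    printf("%d\n", toy_should_try_prune(&hdr, 900));  /* 0: inserter may still be running */
    printf("%d\n", toy_should_try_prune(&hdr, 2000)); /* 1: worth trying to prune */
    return 0;
}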
Setting pd_prune_xid on insert causes a page to be pruned and then
written out, which affects the reported number of hits in the
index-killtuples isolation test. This is a quirk of how hits are tracked
that sometimes leads to them being double-counted. It should probably be
fixed or changed independently.
---
src/backend/access/heap/heapam.c | 25 +++++++++++++------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++-
.../isolation/expected/index-killtuples.out | 6 ++---
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 94d673d92c0..47aa9638724 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2104,6 +2104,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2163,15 +2164,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2181,7 +2186,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2545,8 +2549,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index a8908373067..a2c4e4f47fe 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -486,6 +486,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -635,9 +641,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
--
2.43.0
v14-0023-Allow-on-access-pruning-to-set-pages-all-visible.patch
From feedf2af7c6e0f025d4c0b35d7f7cb9df71e18a9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v14 23/24] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or all-frozen.
Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
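As a toy model of the new gating (a standalone sketch, not the executor
changes themselves): on-access pruning is only handed a VM buffer when the
scan was opened for a query that cannot modify the relation; otherwise it is
passed NULL and leaves the VM alone:

#include <stdbool.h>
#include <stdio.h>

/* toy scan descriptor: allow_vm_set plays the role of SO_ALLOW_VM_SET */
typedef struct
{
    bool allow_vm_set;  /* true when the query cannot modify this relation */
    int  vmbuffer;      /* stand-in for the pinned VM page */
} ToyScan;

/* stand-in for heap_page_prune_opt(): NULL vmbuffer means don't touch the VM */
static void
toy_page_prune_opt(int *vmbuffer)
{
    if (vmbuffer != NULL)
        printf("prune; may also set the VM (buffer %d)\n", *vmbuffer);
    else
        printf("prune only; VM left alone\n");
}

int
main(void)
{
    ToyScan ro = {true, 42};    /* e.g. plain SELECT */
    ToyScan rw = {false, 0};    /* e.g. UPDATE's scan of the target table */

    toy_page_prune_opt(ro.allow_vm_set ? &ro.vmbuffer : NULL);
    toy_page_prune_opt(rw.allow_vm_set ? &rw.vmbuffer : NULL);
    return 0;
}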
---
src/backend/access/heap/heapam.c | 15 +++-
src/backend/access/heap/heapam_handler.c | 15 +++-
src/backend/access/heap/pruneheap.c | 73 +++++++++++++++----
src/backend/access/index/indexam.c | 46 ++++++++++++
src/backend/access/table/tableam.c | 39 +++++++++-
src/backend/executor/execMain.c | 4 +
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeBitmapHeapscan.c | 7 +-
src/backend/executor/nodeIndexscan.c | 18 +++--
src/backend/executor/nodeSeqscan.c | 24 ++++--
src/include/access/genam.h | 11 +++
src/include/access/heapam.h | 24 +++++-
src/include/access/relscan.h | 6 ++
src/include/access/tableam.h | 30 +++++++-
src/include/nodes/execnodes.h | 6 ++
.../t/035_standby_logical_decoding.pl | 3 +-
16 files changed, 285 insertions(+), 38 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ab514ce65ec..94d673d92c0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -555,6 +555,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -569,7 +570,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1246,6 +1249,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1284,6 +1288,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1316,6 +1326,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..f05b9e4968d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
return &hscan->xs_base;
}
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c29f47ab151..3eaee398735 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -45,6 +45,8 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool attempt_freeze;
+ /* whether or not to attempt updating the VM */
+ bool attempt_update_vm;
const struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -185,9 +187,13 @@ static void page_verify_redirects(Page page);
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -251,6 +257,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
{
OffsetNumber dummy_off_loc;
PruneFreezeResult presult;
+ int options = 0;
+
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ options = HEAP_PAGE_PRUNE_UPDATE_VIS;
+ }
/*
* For now, pass mark_unused_now as false regardless of whether or
@@ -258,8 +271,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* that during on-access pruning with the current implementation.
*/
heap_page_prune_and_freeze(relation, buffer,
- InvalidBuffer, false,
- PRUNE_ON_ACCESS, 0, NULL,
+ vmbuffer ? *vmbuffer : InvalidBuffer,
+ false, /* blk_known_av */
+ PRUNE_ON_ACCESS, options, NULL,
vistest, &presult, &dummy_off_loc, NULL, NULL);
/*
@@ -443,6 +457,8 @@ heap_page_will_set_vis(Relation relation,
Buffer heap_buf,
Buffer vmbuffer,
bool blk_known_av,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
PruneState *prstate,
uint8 *vmflags,
bool *do_set_pd_vis)
@@ -450,6 +466,32 @@ heap_page_will_set_vis(Relation relation,
Page heap_page = BufferGetPage(heap_buf);
bool do_set_vm = false;
+ *do_set_pd_vis = false;
+
+ if (!prstate->attempt_update_vm)
+ {
+ Assert(!prstate->all_visible && !prstate->all_frozen);
+ Assert(*vmflags == 0);
+ return false;
+ }
+
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(heap_buf) || XLogCheckBufferNeedsBackup(heap_buf)))
+ {
+ prstate->all_visible = prstate->all_frozen = false;
+ return false;
+ }
+
if (prstate->all_visible && !PageIsAllVisible(heap_page))
*do_set_pd_vis = true;
@@ -473,6 +515,9 @@ heap_page_will_set_vis(Relation relation,
* page-level bit is clear. However, it's possible that in vacuum the bit
* got cleared after heap_vac_scan_next_block() was called, so we must
* recheck with buffer lock before concluding that the VM is corrupt.
+ *
+ * XXX: This will never trigger for on-access pruning because it passes
+ * blk_known_av as false. Should we remove that condition here?
*/
else if (blk_known_av && !PageIsAllVisible(heap_page) &&
visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
@@ -615,6 +660,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.vistest = vistest;
prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.attempt_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0;
prstate.cutoffs = cutoffs;
/*
@@ -692,7 +738,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.all_visible = true;
prstate.all_frozen = true;
}
- else if ((options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0)
+ else if (prstate.attempt_update_vm)
{
prstate.all_visible = true;
prstate.all_frozen = false;
@@ -951,7 +997,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (prstate.lpdead_items > 0)
prstate.all_visible = prstate.all_frozen = false;
- Assert(!prstate.all_frozen || prstate.all_visible);
+
/*
* Determine whether or not to set the page level PD_ALL_VISIBLE and the
@@ -968,12 +1014,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* As such, it is possible to only update the VM when PD_ALL_VISIBLE is
* already set.
*/
- do_set_pd_vis = false;
- do_set_vm = false;
- if ((options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0)
- do_set_vm = heap_page_will_set_vis(relation,
- blockno, buffer, vmbuffer, blk_known_av,
- &prstate, &new_vmbits, &do_set_pd_vis);
+ do_set_vm = heap_page_will_set_vis(relation,
+ blockno, buffer, vmbuffer, blk_known_av,
+ reason, do_prune, do_freeze,
+ &prstate, &new_vmbits, &do_set_pd_vis);
+
+ Assert(!prstate.all_frozen || prstate.all_visible);
/* Lock vmbuffer before entering a critical section */
if (do_set_vm)
@@ -1133,7 +1179,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool debug_all_frozen;
Assert(prstate.lpdead_items == 0);
- Assert(prstate.cutoffs);
if (!heap_page_is_all_visible(relation, buffer,
prstate.vistest,
@@ -2298,8 +2343,8 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* - Reaping: During vacuum phase III, items that are already LP_DEAD are
* marked as unused.
*
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- * all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III, and during on-access pruning,
+ * the heap page may be marked all-visible and all-frozen.
*
* These changes all happen together, so we use a single WAL record for them
* all.
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 86d11f4ec79..4603ece09bd 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -289,6 +289,32 @@ index_beginscan(Relation heapRelation,
return scan;
}
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan(heapRelation,
+ indexRelation,
+ snapshot,
+ instrument,
+ nkeys, norderbys);
+
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+ return scan;
+}
+
/*
* index_beginscan_bitmap - start a scan of an index with amgetbitmap
*
@@ -620,6 +646,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
return scan;
}
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan_parallel(heaprel, indexrel,
+ instrument,
+ nkeys, norderbys,
+ pscan);
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+ return scan;
+}
+
/* ----------------
* index_getnext_tid - get the next TID from a scan
*
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index a56c5eceb14..67dbf99f5b5 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
char *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
bool synchronize_seqscans = true;
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags);
+
/* ----------------------------------------------------------------------------
* Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
}
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
- SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
pscan, flags);
}
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+ bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
/* ----------------------------------------------------------------------------
* Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ff12e2e1364..2e0474c948a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation is modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ modifies_rel);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..c2ffbd3b08e 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+
+ bool modifies_base_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
*/
- scandesc = index_beginscan(node->ss.ss_currentRelation,
- node->iss_RelationDesc,
- estate->es_snapshot,
- &node->iss_Instrument,
- node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+ node->iss_RelationDesc,
+ estate->es_snapshot,
+ &node->iss_Instrument,
+ node->iss_NumScanKeys,
+ node->iss_NumOrderByKeys,
+ modifies_base_rel);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..fd69275c181 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
- scandesc = table_beginscan(node->ss.ss_currentRelation,
- estate->es_snapshot,
- 0, NULL);
+ scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+ estate->es_snapshot,
+ 0, NULL, modifies_rel);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -366,6 +371,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
ParallelContext *pcxt)
{
EState *estate = node->ss.ps.state;
+ bool modifies_rel;
ParallelTableScanDesc pscan;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -373,8 +379,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+ modifies_rel);
}
/* ----------------------------------------------------------------
@@ -404,8 +413,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+ pscan,
+ modifies_rel);
}
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 5b2ab181b5f..bf272c2c37f 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_heap_rel);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
@@ -206,6 +211,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys,
ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_rel);
extern ItemPointer index_getnext_tid(IndexScanDesc scan,
ScanDirection direction);
struct TupleTableSlot;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 35a25cf0b04..4da629067d1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read in the current heap page's
+ * corresponding VM block to this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -116,8 +123,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read in the current heap page's corresponding VM block to
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
@@ -369,7 +386,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
Buffer vmbuffer, bool blk_known_av,
PruneReason reason,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
typedef struct IndexFetchTableData
{
Relation rel;
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchTableData;
struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index b2ce35e2a34..e31c21cf8eb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -62,6 +62,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* whether or not scan should attempt to set the VM */
+ SO_ALLOW_VM_SET = 1 << 10,
} ScanOptions;
/*
@@ -881,6 +883,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
}
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
/*
* Like table_beginscan(), but for scanning catalog. It'll automatically use a
* snapshot appropriate for scanning catalog relations.
@@ -918,10 +939,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, struct ScanKeyData *key)
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
{
uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
}
@@ -1130,6 +1154,10 @@ extern void table_parallelscan_initialize(Relation rel,
extern TableScanDesc table_beginscan_parallel(Relation relation,
ParallelTableScanDesc pscan);
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+ ParallelTableScanDesc pscan,
+ bool modifies_rel);
+
/*
* Restart a parallel scan. Call this in the leader process. Caller is
* responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 3a920cc7d17..c854be93436 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query either through
+ * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index c9c182892cf..f5c0c65b260 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -745,7 +746,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
--
2.43.0
Hi,
On 2025-09-17 20:10:07 -0400, Melanie Plageman wrote:
0001 is RFC but waiting on one other reviewer
From cacff6c95e38d370b87148bc48cf6ac5f086ed07 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v14 01/24] Eliminate COPY FREEZE use of XLOG_HEAP2_VISIBLE
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index cf843277938..faa7c561a8a 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -551,6 +551,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
int i;
bool isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
/*
* Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -571,11 +572,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(rlocator);
- Buffer vmbuffer = InvalidBuffer;
visibilitymap_pin(reln, blkno, &vmbuffer);
visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
ReleaseBuffer(vmbuffer);
+ vmbuffer = InvalidBuffer;
FreeFakeRelcacheEntry(reln);
}
@@ -662,6 +663,57 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (BufferIsValid(buffer))
UnlockReleaseBuffer(buffer);
+ buffer = InvalidBuffer;
+
+ /*
+ * Now read and update the VM block.
+ *
+ * Note that the heap relation may have been dropped or truncated, leading
+ * us to skip updating the heap block due to the LSN interlock.
I don't fully understand this - how does dropping/truncating the relation lead
to skipping due to the LSN interlock?
+ * even in that case, it's still safe to update the visibility map. Any
+ * WAL record that clears the visibility map bit does so before checking
+ * the page LSN, so any bits that need to be cleared will still be
+ * cleared.
+ *
+ * Note that the lock on the heap page was dropped above. In normal
+ * operation this would never be safe because a concurrent query could
+ * modify the heap page and clear PD_ALL_VISIBLE -- violating the
+ * invariant that PD_ALL_VISIBLE must be set if the corresponding bit in
+ * the VM is set.
+ *
+ * In recovery, we expect no other writers, so writing to the VM page
+ * without holding a lock on the heap page is considered safe enough. It
+ * is done this way when replaying xl_heap_visible records (see
+ * heap_xlog_visible()).
+ */
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+ XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
Why are we using RBM_ZERO_ON_ERROR here? I know it's copied from
heap_xlog_visible(), but I don't immediately understand (or remember) why we
do so there either?
+ Page vmpage = BufferGetPage(vmbuffer);
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
Hm. Do we really need to continue doing this ugly fake relcache stuff? I'd
really like to eventually get rid of that and given that the new "code shape"
delegates a lot more responsibility to the redo routines, they should have a
fairly easy time not needing a fake relcache? Afaict the relation already is
not used outside of debugging paths?
+ /* initialize the page if it was read as zeros */
+ if (PageIsNew(vmpage))
+ PageInit(vmpage, BLCKSZ, 0);
+
+ visibilitymap_set_vmbits(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+
+ /*
+ * It is not possible that the VM was already set for this heap page,
+ * so the vmbuffer must have been modified and marked dirty.
+ */
I assume that's because we a) checked the LSN interlock b) are replaying
something that needed to newly set the bit?
Except for the above comments, this looks pretty good to me.
Seems 0002 should just be applied...
Re 0003: I wonder if it's getting to the point that a struct should be used as
the argument.
Greetings,
Andres Freund
On Thu, Sep 18, 2025 at 12:48 PM Andres Freund <andres@anarazel.de> wrote:
On 2025-09-17 20:10:07 -0400, Melanie Plageman wrote:
+ /*
+ * Now read and update the VM block.
+ *
+ * Note that the heap relation may have been dropped or truncated, leading
+ * us to skip updating the heap block due to the LSN interlock.
I don't fully understand this - how does dropping/truncating the relation lead
to skipping due to the LSN interlock?
Yes, this wasn't right. I misunderstood.
What I think it should say is that even if the heap update was skipped
due to the LSN interlock, we still have to replay the updates to the VM:
each VM page contains bits for many heap blocks, and if the record
included a VM page FPI, subsequent updates to the VM may rely on that
FPI to avoid torn pages. We don't condition it on whether the heap redo
included an FPI, probably because it is not worth it -- but I wonder if
that is worth calling out in the comment?
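Roughly, the redo shape I'm describing would look like this (a
simplified sketch based on the hunk quoted above, not the exact patch
code; error handling and the actual bit-setting are elided):

if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
{
    /* apply the multi-insert changes to the heap page */
}
if (BufferIsValid(buffer))
    UnlockReleaseBuffer(buffer);

/*
 * Replay the VM block even if the heap block redo was skipped by the
 * LSN interlock: the VM page's FPI may be what protects later VM
 * updates for other heap blocks on the same VM page from torn writes.
 */
if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
    XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
                                  &vmbuffer) == BLK_NEEDS_REDO)
{
    Page vmpage = BufferGetPage(vmbuffer);

    if (PageIsNew(vmpage))
        PageInit(vmpage, BLCKSZ, 0);
    /* set the all-visible/all-frozen bits in the VM */
}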
Do we also need to replay it when the heap redo returns BLK_NOTFOUND?
I assume this can happen when the relation has been dropped or
truncated -- but in that case there wouldn't be subsequent records
updating the VM for other heap blocks that we need to replay, because
the other heap blocks won't be found either, right?
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+ XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
Why are we using RBM_ZERO_ON_ERROR here? I know it's copied from
heap_xlog_visible(), but I don't immediately understand (or remember) why we
do so there either?
It has been RBM_ZERO_ON_ERROR since XLogReadBufferForRedoExtended()
was introduced here in 2c03216d8311.
I think we probably do this because vm_readbuf() passes
RBM_ZERO_ON_ERROR to ReadBuffer() and has this comment:
* For reading we use ZERO_ON_ERROR mode, and initialize the page if
* necessary. It's always safe to clear bits, so it's better to clear
* corrupt pages than error out.
Do you think I should also add a comment in heap_xlog_multi_insert()?
+ Page vmpage = BufferGetPage(vmbuffer);
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
Hm. Do we really need to continue doing this ugly fake relcache stuff? I'd
really like to eventually get rid of that and given that the new "code shape"
delegates a lot more responsibility to the redo routines, they should have a
fairly easy time not needing a fake relcache? Afaict the relation already is
not used outside of debugging paths?
Yes, interestingly we don't have the relname in recovery anyway, so it
does all this fake relcache stuff only to convert the relfilenode to a
string and uses that.
The fake relcache stuff will still be used by visibilitymap_pin()
which callers like heap_xlog_delete() use to get the VM page. And I
don't think it is worth coming up with a version of that that doesn't
use the relcache. But you're right that the Relation is not needed for
visibilitymap_set_vmbits(). I've changed it to just take the relation
name as a string.
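To illustrate the direction (the function name is from the patch set,
but this exact prototype is only a hypothetical sketch -- the real
signature is whatever the attached version ends up with):

/* hypothetical shape: no Relation argument, relname used only for messages */
uint8 visibilitymap_set_vmbits(BlockNumber heapBlk, Buffer vmBuf,
                               uint8 flags, const char *relname);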
+ /* initialize the page if it was read as zeros */
+ if (PageIsNew(vmpage))
+ PageInit(vmpage, BLCKSZ, 0);
+
+ visibilitymap_set_vmbits(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+
+ /*
+ * It is not possible that the VM was already set for this heap page,
+ * so the vmbuffer must have been modified and marked dirty.
+ */
I assume that's because we a) checked the LSN interlock b) are replaying
something that needed to newly set the bit?
Yes, perhaps it is not worth having the assert since it attracts extra
attention to an invariant that is unlikely to be in danger of
regression.
Seems 0002 should just be applied...
Done
Re 0003: I wonder if it's getting to the point that a struct should be used as
the argument.
I have been thinking about this. I have yet to come up with a good
idea for a struct name or multiple struct names that seem to fit here.
I could move the other output parameters into the PruneFreezeResult
and then maybe make some kind of PruneFreezeParameters struct or
something?
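Just to sketch what I mean, something along these lines (placeholder
name and fields, simply bundling the inputs heap_page_prune_and_freeze()
already takes):

typedef struct PruneFreezeParams
{
    Relation    relation;
    Buffer      buffer;         /* heap buffer to prune/freeze */
    Buffer      vmbuffer;       /* VM buffer, if the VM may be updated */
    bool        blk_known_av;   /* VM status from the caller's last check */
    PruneReason reason;         /* logged for debugging/analysis */
    int         options;        /* HEAP_PAGE_PRUNE_* flags */
    const struct VacuumCutoffs *cutoffs;
    struct GlobalVisState *vistest;
} PruneFreezeParams;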
- Melanie
Attachments:
v15-0015-Set-VM-in-heap_page_prune_and_freeze.patch (text/x-patch)
From 75a2d24ed02733533027b9fe17f25160d2529b0c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 16 Sep 2025 15:46:40 -0400
Subject: [PATCH v15 15/23] Set VM in heap_page_prune_and_freeze
The determination of whether the page can be set all-visible/all-frozen
has already been made by the end of heap_page_prune_and_freeze().
Previously, though, vacuum waited until control returned to
lazy_scan_prune() to actually set the VM.
This commit moves setting the VM into heap_page_prune_and_freeze().
There are still two separate WAL records -- one for the changes to the
heap page and one for the changes to the VM. But this is an incremental
step toward logging the VM update in the same WAL record as pruning and
freezing.
Note that this is not used by on-access pruning.
---
src/backend/access/heap/pruneheap.c | 221 +++++++++++++++++++++++----
src/backend/access/heap/vacuumlazy.c | 146 ++----------------
src/include/access/heapam.h | 24 +--
3 files changed, 221 insertions(+), 170 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9e00fbf3cd1..e3f9967e26c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,6 +19,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/transam.h"
+#include "access/visibilitymap.h"
#include "access/xlog.h"
#include "access/visibilitymapdefs.h"
#include "access/xloginsert.h"
@@ -257,7 +258,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, PRUNE_ON_ACCESS, 0, NULL,
+ heap_page_prune_and_freeze(relation, buffer,
+ InvalidBuffer, false,
+ PRUNE_ON_ACCESS, 0, NULL,
vistest, &presult, &dummy_off_loc, NULL, NULL);
/*
@@ -423,16 +426,115 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Determine whether to set the visibility map bits based on information from
+ * the PruneState and blk_known_av, which some callers will provide after
+ * previously examining this heap page's VM bits (e.g. vacuum from the last
+ * heap_vac_scan_next_block() call).
+ *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
+ * Returns true if the caller should set one or both of the VM bits and false
+ * otherwise.
+ */
+static bool
+heap_page_will_set_vis(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buf,
+ Buffer vmbuffer,
+ bool blk_known_av,
+ PruneState *prstate,
+ uint8 *vmflags,
+ bool *do_set_pd_vis)
+{
+ Page heap_page = BufferGetPage(heap_buf);
+ bool do_set_vm = false;
+
+ if (prstate->all_visible && !PageIsAllVisible(heap_page))
+ *do_set_pd_vis = true;
+
+ if ((prstate->all_visible && !blk_known_av) ||
+ (prstate->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+ {
+ *vmflags = VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate->all_frozen)
+ *vmflags |= VISIBILITYMAP_ALL_FROZEN;
+
+ do_set_vm = true;
+ }
+
+ /*
+ * Now handle two potential corruption cases:
+ *
+ * These do not need to happen in a critical section and are not
+ * WAL-logged.
+ *
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that in vacuum the bit
+ * got cleared after heap_vac_scan_next_block() was called, so we must
+ * recheck with buffer lock before concluding that the VM is corrupt.
+ */
+ else if (blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (prstate->lpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buf);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ /* We should only set the VM if PD_ALL_VISIBLE is set or will be */
+ Assert(!do_set_vm || PageIsAllVisible(heap_page) || *do_set_pd_vis);
+
+ return do_set_vm;
+}
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
* also need to account for a reduction in the length of the line pointer
* array following array truncation by us.
*
+ * vmbuffer is the buffer that must already contain the required block
+ * of the visibility map if we are to update it. blk_known_av is the
+ * visibility status of the heap block as of the last call to
+ * find_next_unskippable_block().
+ *
* reason indicates why the pruning is performed. It is included in the WAL
* record for debugging and analysis purposes, but otherwise has no effect.
*
@@ -443,15 +545,20 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* FREEZE indicates that we will also freeze tuples, and will return
* 'all_visible', 'all_frozen' flags to the caller.
*
- * If the HEAP_PRUNE_FREEZE option is set, we will freeze tuples if it's
+ * UPDATE_VIS indicates that we will set the page's status in the VM.
+ *
+ * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
* required in order to advance relfrozenxid / relminmxid, or if it's
* considered advantageous for overall system performance to do so now. The
* 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing. When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set. They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
+ * are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is set and the visibility status of the page
+ * has changed, we will update the VM at the same time as pruning and freezing
+ * the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
+ *
*
* cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
* of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
@@ -478,6 +585,7 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer, bool blk_known_av,
PruneReason reason,
int options,
const struct VacuumCutoffs *cutoffs,
@@ -496,10 +604,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_freeze;
bool do_prune;
bool do_hint_prune;
+ bool do_set_vm;
bool do_set_pd_vis;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
TransactionId frz_conflict_horizon = InvalidTransactionId;
+ uint8 new_vmbits = 0;
+ uint8 old_vmbits = 0;
/* Copy parameters to prstate */
prstate.vistest = vistest;
@@ -828,19 +939,27 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
Assert(!prstate.all_frozen || prstate.all_visible);
/*
- * Though callers should set the VM if PD_ALL_VISIBLE is set here, it is
- * allowed for the page-level bit to be set and the VM to be clear.
+ * Determine whether or not to set the page level PD_ALL_VISIBLE and the
+ * visibility map bits based on information from the VM and from
+ * all_visible and all_frozen variables.
+ *
+ * Though callers should set the VM if PD_ALL_VISIBLE is set, it is
+ * allowed for the page-level bit to be set and the VM to be clear. We log
+ * setting PD_ALL_VISIBLE on the heap page in a
+ * XLOG_HEAP2_PRUNE_VACUUM_SCAN record and setting the VM bits in a later
+ * emitted XLOG_HEAP2_VISIBLE record.
+ *
* Setting PD_ALL_VISIBLE when we are making the changes to the page that
* render it all-visible allows us to omit the heap page from the WAL
* chain when later updating the VM -- even when checksums/wal_log_hints
* are enabled.
*/
do_set_pd_vis = false;
+ do_set_vm = false;
if ((options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0)
- {
- if (prstate.all_visible && !PageIsAllVisible(page))
- do_set_pd_vis = true;
- }
+ do_set_vm = heap_page_will_set_vis(relation,
+ blockno, buffer, vmbuffer, blk_known_av,
+ &prstate, &new_vmbits, &do_set_pd_vis);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -928,28 +1047,72 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
END_CRIT_SECTION();
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ /*
+ * VACUUM will call heap_page_would_be_all_visible() during the second
+ * pass over the heap to determine all_visible and all_frozen for the page
+ * -- this is a specialized version of that logic. Now that we've finished
+ * pruning and freezing, make sure that we're in total agreement with
+ * heap_page_would_be_all_visible() using an assertion.
+ */
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
+ {
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
+
+ Assert(prstate.lpdead_items == 0);
+ Assert(prstate.cutoffs);
+
+ if (!heap_page_is_all_visible(relation, buffer,
+ prstate.cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc))
+ Assert(false);
+
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == prstate.visibility_cutoff_xid);
+ }
+#endif
+
+ /* Now set the VM */
+ if (do_set_vm)
+ {
+ TransactionId vm_conflict_horizon;
+
+ Assert((new_vmbits & VISIBILITYMAP_VALID_BITS) != 0);
+
+ /*
+ * The conflict horizon for that record must be the newest xmin on the
+ * page. However, if the page is completely frozen, there can be no
+ * conflict and the vm_conflict_horizon should remain
+ * InvalidTransactionId. This includes the case that we just froze
+ * all the tuples; the prune-freeze record included the conflict XID
+ * already so a snapshotConflictHorizon sufficient to make everything
+ * safe for REDO was logged when the page's tuples were frozen.
+ */
+ if (prstate.all_frozen)
+ vm_conflict_horizon = InvalidTransactionId;
+ else
+ vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ old_vmbits = visibilitymap_set(relation, blockno,
+ InvalidXLogRecPtr,
+ vmbuffer, vm_conflict_horizon,
+ new_vmbits);
+ }
+
/* Copy information back for caller */
presult->ndeleted = prstate.ndeleted;
presult->nnewlpdead = prstate.ndead;
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
-
- /*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ presult->new_vmbits = new_vmbits;
+ presult->old_vmbits = old_vmbits;
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e8721761392..d71f3755dce 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,11 +463,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
TransactionId OldestXmin,
OffsetNumber *deadoffsets,
@@ -2015,7 +2010,9 @@ lazy_scan_prune(LVRelState *vacrel,
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf, PRUNE_VACUUM_SCAN, prune_options,
+ heap_page_prune_and_freeze(rel, buf,
+ vmbuffer, all_visible_according_to_vm,
+ PRUNE_VACUUM_SCAN, prune_options,
&vacrel->cutoffs,
vacrel->vistest,
&presult,
@@ -2036,33 +2033,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- if (!heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum))
- Assert(false);
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -2096,112 +2066,28 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
- Assert(!presult.all_frozen || presult.all_visible);
-
/*
- * Handle setting visibility map bits based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * For the purposes of logging, count whether or not the page was newly
+ * set all-visible and, potentially, all-frozen.
*/
- if ((presult.all_visible && !all_visible_according_to_vm) ||
- (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer)))
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- /*
- * If the page is all-frozen, we can pass InvalidTransactionId as our
- * cutoff_xid, since a snapshotConflictHorizon sufficient to make
- * everything safe for REDO was logged when the page's tuples were
- * frozen.
- */
- if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- old_vmbits = visibilitymap_set(vacrel->rel, blkno,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * Even if we are only setting the all-frozen bit, there is a small
- * chance that the VM was modified sometime between setting
- * all_visible_according_to_vm and checking the visibility during
- * pruning. Check the return value of old_vmbits to ensure the
- * visibility map counters used for logging are accurate.
- *
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ vacrel->vm_new_visible_pages++;
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- vacrel->vm_new_frozen_pages++;
+ vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
-
- /*
- * Now handle two potential corruption cases:
- *
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
}
-
return presult.ndeleted;
}
@@ -3591,7 +3477,7 @@ dead_items_cleanup(LVRelState *vacrel)
* Wrapper for heap_page_would_be_all_visible() which can be used for
* callers that expect no LP_DEAD on the page.
*/
-static bool
+bool
heap_page_is_all_visible(Relation rel, Buffer buf,
TransactionId OldestXmin,
bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e97b53f1ee8..493ddeacbc0 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,19 +235,14 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
*
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
- *
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set.
*/
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
+ uint8 new_vmbits;
+ uint8 old_vmbits;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
@@ -369,6 +364,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer, bool blk_known_av,
PruneReason reason,
int options,
const struct VacuumCutoffs *cutoffs,
@@ -397,6 +393,12 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
+
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
Buffer buffer);
--
2.43.0
v15-0018-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (text/x-patch)
From 7821b9fd5001f1e1e20ec0c4857655cc8b781cbc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v15 18/23] Rename GlobalVisTestIsRemovableXid() to
GlobalVisXidVisibleToAll()
Currently, we only use GlobalVisTestIsRemovableXid() to check if a
tuple's xmax is visible to all, meaning we can remove it. But future
commits will use GlobalVisTestIsRemovableXid() to test if a tuple's xmin
is visible to all in order to determine whether to set the page
all-visible in the VM. In that case, it makes more sense to call the
function GlobalVisXidVisibleToAll().
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
---
src/backend/access/heap/heapam_visibility.c | 6 +++---
src/backend/access/heap/pruneheap.c | 14 +++++++-------
src/backend/access/spgist/spgvacuum.c | 2 +-
src/backend/storage/ipc/procarray.c | 13 ++++++-------
src/include/utils/snapmgr.h | 4 ++--
5 files changed, 19 insertions(+), 20 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
{
Assert(TransactionIdIsValid(dead_after));
- if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
res = HEAPTUPLE_DEAD;
}
else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
return false;
/* Deleter committed, so tuple is dead if the XID is old enough. */
- return GlobalVisTestIsRemovableXid(vistest,
- HeapTupleHeaderGetRawXmax(tuple));
+ return GlobalVisXidVisibleToAll(vistest,
+ HeapTupleHeaderGetRawXmax(tuple));
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index faf1002f25f..52e956189e8 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -218,7 +218,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
vistest = GlobalVisTestFor(relation);
- if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+ if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
return;
/*
@@ -727,9 +727,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* Determining HTSV only once for each tuple is required for correctness,
* to deal with cases where running HTSV twice could result in different
* results. For example, RECENTLY_DEAD can turn to DEAD if another
- * checked item causes GlobalVisTestIsRemovableFullXid() to update the
- * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
- * transaction aborts.
+ * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+ * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+ * aborts.
*
* It's also good for performance. Most commonly tuples within a page are
* stored at decreasing offsets (while the items are stored at increasing
@@ -1200,11 +1200,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* Determine whether or not the tuple is considered dead when compared
* with the provided GlobalVisState. On-access pruning does not provide
* VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
- * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
- * if the GlobalVisState has been updated since the beginning of vacuuming
+ * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+ * the GlobalVisState has been updated since the beginning of vacuuming
* the relation.
*/
- if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
return HEAPTUPLE_DEAD;
return res;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 8f8a1ad7796..496cca69410 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
*/
if (dt->tupstate == SPGIST_REDIRECT &&
(!TransactionIdIsValid(dt->xid) ||
- GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+ GlobalVisXidVisibleToAll(vistest, dt->xid)))
{
dt->tupstate = SPGIST_PLACEHOLDER;
Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..f67f01c17c2 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
* See comment for GlobalVisState for details.
*/
bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
- FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
{
/*
* If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4223,7 +4222,7 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
* relfrozenxid).
*/
bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
{
FullTransactionId fxid;
@@ -4237,7 +4236,7 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
*/
fxid = FullXidRelativeTo(state->definitely_needed, xid);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableXid(state, xid);
+ return GlobalVisXidVisibleToAll(state, xid);
}
/*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
*/
typedef struct GlobalVisState GlobalVisState;
extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
--
2.43.0
v15-0002-Reorder-heap_page_prune_and_freeze-parameters.patch (text/x-patch)
From bca3a9c979507bc631193bd9ca5d39556bed383d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 15 Sep 2025 12:06:19 -0400
Subject: [PATCH v15 02/23] Reorder heap_page_prune_and_freeze parameters
Move read-only parameters to the beginning of the parameter list, making it
clearer which parameters are inputs and which are input/output or output
parameters. Also const-qualify VacuumCutoffs, which is not modified in
heap_page_prune_and_freeze().
---
src/backend/access/heap/pruneheap.c | 40 ++++++++++++++--------------
src/backend/access/heap/vacuumlazy.c | 6 +++--
src/include/access/heapam.h | 6 ++---
3 files changed, 27 insertions(+), 25 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d8ea0c78f77..28bd6a56749 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -43,7 +43,7 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool freeze;
- struct VacuumCutoffs *cutoffs;
+ const struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
* Fields describing what to do to the page
@@ -260,8 +260,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, vistest, 0,
- NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+ heap_page_prune_and_freeze(relation, buffer, PRUNE_ON_ACCESS, 0, NULL,
+ vistest, &presult, &dummy_off_loc, NULL, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -303,7 +303,17 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* also need to account for a reduction in the length of the line pointer
* array following array truncation by us.
*
- * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
+ * reason indicates why the pruning is performed. It is included in the WAL
+ * record for debugging and analysis purposes, but otherwise has no effect.
+ *
+ * options:
+ * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+ * pruning.
+ *
+ * FREEZE indicates that we will also freeze tuples, and will return
+ * 'all_visible', 'all_frozen' flags to the caller.
+ *
+ * If the HEAP_PRUNE_FREEZE option is set, we will freeze tuples if it's
* required in order to advance relfrozenxid / relminmxid, or if it's
* considered advantageous for overall system performance to do so now. The
* 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
@@ -313,29 +323,19 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
* that also freeze need that information.
*
- * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
- *
- * options:
- * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- * pruning.
- *
- * FREEZE indicates that we will also freeze tuples, and will return
- * 'all_visible', 'all_frozen' flags to the caller.
- *
* cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
* of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
* cutoffs->OldestXmin is also used to determine if dead tuples are
* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
*
+ * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
+ * (see heap_prune_satisfies_vacuum).
+ *
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
* heap_page_prune_and_freeze() is responsible for initializing it. Required
* by all callers.
*
- * reason indicates why the pruning is performed. It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
* off_loc is the offset location required by the caller to use in error
* callback.
*
@@ -348,11 +348,11 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
- GlobalVisState *vistest,
+ PruneReason reason,
int options,
- struct VacuumCutoffs *cutoffs,
+ const struct VacuumCutoffs *cutoffs,
+ GlobalVisState *vistest,
PruneFreezeResult *presult,
- PruneReason reason,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 981d9380a92..ddc9677694c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1974,8 +1974,10 @@ lazy_scan_prune(LVRelState *vacrel,
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
- &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+ heap_page_prune_and_freeze(rel, buf, PRUNE_VACUUM_SCAN, prune_options,
+ &vacrel->cutoffs,
+ vacrel->vistest,
+ &presult,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a1de400b9a5..665e0c79baf 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -368,11 +368,11 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
- struct GlobalVisState *vistest,
+ PruneReason reason,
int options,
- struct VacuumCutoffs *cutoffs,
+ const struct VacuumCutoffs *cutoffs,
+ struct GlobalVisState *vistest,
PruneFreezeResult *presult,
- PruneReason reason,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid);
--
2.43.0
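
For anyone rebasing over 0002: a call site changes roughly as follows. This is
only a sketch to show the new argument order -- the local variable names here
are illustrative, not copied from any particular caller.

    /* master: vistest precedes options/cutoffs; reason follows presult */
    heap_page_prune_and_freeze(rel, buf, vistest, options,
                               cutoffs, &presult, PRUNE_VACUUM_SCAN,
                               &off_loc, &new_relfrozen_xid, &new_relmin_mxid);

    /* with 0002: read-only inputs (reason, options, cutoffs, vistest) come first */
    heap_page_prune_and_freeze(rel, buf, PRUNE_VACUUM_SCAN, options,
                               cutoffs, vistest, &presult,
                               &off_loc, &new_relfrozen_xid, &new_relmin_mxid);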
v15-0004-Rename-PruneState.freeze-to-attempt_freeze.patch (text/x-patch)
From bbf405f68ab042b1a01241a9700ad3506ebea789 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 31 Jul 2025 14:07:51 -0400
Subject: [PATCH v15 04/23] Rename PruneState.freeze to attempt_freeze
This makes it clearer that the flag indicates that the caller would like
heap_page_prune_and_freeze() to consider freezing tuples -- not that we
will ultimately end up freezing them.
Also rename local variable hint_bit_fpi to did_tuple_hint_fpi. This
makes it clear it is about tuple hints and not page hints and that it
indicates something that happened and not something that could happen.
And rename local variable do_hint to do_hint_prune. This distinguishes
the prunable and page full hints used to decide whether or not to
on-access prune a page from other page-level and tuple hint bits.
---
src/backend/access/heap/pruneheap.c | 28 ++++++++++++++--------------
1 file changed, 14 insertions(+), 14 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ea8216e0632..740aa07cd83 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -42,7 +42,7 @@ typedef struct
/* whether or not dead items can be set LP_UNUSED during pruning */
bool mark_unused_now;
/* whether to attempt freezing tuples */
- bool freeze;
+ bool attempt_freeze;
const struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -361,14 +361,14 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
HeapTupleData tup;
bool do_freeze;
bool do_prune;
- bool do_hint;
- bool hint_bit_fpi;
+ bool do_hint_prune;
+ bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
/* Copy parameters to prstate */
prstate.vistest = vistest;
prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
- prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
prstate.cutoffs = cutoffs;
/*
@@ -390,7 +390,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* initialize page freezing working state */
prstate.pagefrz.freeze_required = false;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
Assert(new_relfrozen_xid && new_relmin_mxid);
prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
@@ -437,7 +437,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* function, when we return the value to the caller, so that the caller
* doesn't set the VM bit incorrectly.
*/
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
@@ -551,7 +551,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
* an FPI to be emitted.
*/
- hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+ did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
/*
* Process HOT chains.
@@ -659,7 +659,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* pd_prune_xid field or the page was marked full, we will update the hint
* bit.
*/
- do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
PageIsFull(page);
/*
@@ -667,7 +667,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* plans we prepared, or not.
*/
do_freeze = false;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
if (prstate.pagefrz.freeze_required)
{
@@ -702,14 +702,14 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (RelationNeedsWAL(relation))
{
- if (hint_bit_fpi)
+ if (did_tuple_hint_fpi)
do_freeze = true;
else if (do_prune)
{
if (XLogCheckBufferNeedsBackup(buffer))
do_freeze = true;
}
- else if (do_hint)
+ else if (do_hint_prune)
{
if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
do_freeze = true;
@@ -752,7 +752,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
- if (do_hint)
+ if (do_hint_prune)
{
/*
* Update the page's pd_prune_xid field to either zero, or the lowest
@@ -893,7 +893,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
if (presult->nfrozen > 0)
{
@@ -1475,7 +1475,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
}
/* Consider freezing any normal tuples which will not be removed */
- if (prstate->freeze)
+ if (prstate->attempt_freeze)
{
bool totally_frozen;
--
2.43.0
v15-0005-Add-helper-for-freeze-determination-to-heap_page.patch (text/x-patch)
From 8d5d18247faca56c12938333161aa2d19e70341e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 16 Sep 2025 14:22:10 -0400
Subject: [PATCH v15 05/23] Add helper for freeze determination to
heap_page_prune_and_freeze
After scanning through the line pointers on the heap page during
vacuum's first phase, we use several pieces of state collected during
the scan to determine whether or not we will use the freeze plans we
assembled.
Do this in a helper for better readability.
---
src/backend/access/heap/pruneheap.c | 199 +++++++++++++++++-----------
1 file changed, 119 insertions(+), 80 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 740aa07cd83..4ed74de6f27 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -289,6 +289,120 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
}
}
+/*
+ * Decide if we want to go ahead with freezing according to the freeze plans
+ * we prepared for the given heap buffer or not. If the caller specified we
+ * should not freeze tuples, it exits early. Otherwise, it does a few
+ * pre-freeze checks.
+ *
+ * do_prune, do_hint_prune, and did_tuple_hint_fpi must all have
+ * been decided before calling this function.
+ *
+ * prstate is an input/output parameter.
+ *
+ * Returns true if we should use our freeze plans and freeze tuples on the page
+ * and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+ bool did_tuple_hint_fpi,
+ bool do_prune,
+ bool do_hint_prune,
+ PruneState *prstate)
+{
+ bool do_freeze = false;
+
+ /*
+ * If the caller specified we should not attempt to freeze any tuples,
+ * validate that everything is in the right state and exit.
+ */
+ if (!prstate->attempt_freeze)
+ {
+ Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+ Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+ return false;
+ }
+
+ if (prstate->pagefrz.freeze_required)
+ {
+ /*
+ * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+ * before FreezeLimit/MultiXactCutoff is present. Must freeze to
+ * advance relfrozenxid/relminmxid.
+ */
+ do_freeze = true;
+ }
+ else
+ {
+ /*
+ * Opportunistically freeze the page if we are generating an FPI
+ * anyway and if doing so means that we can set the page all-frozen
+ * afterwards (might not happen until VACUUM's final heap pass).
+ *
+ * XXX: Previously, we knew if pruning emitted an FPI by checking
+ * pgWalUsage.wal_fpi before and after pruning. Once the freeze and
+ * prune records were combined, this heuristic couldn't be used
+ * anymore. The opportunistic freeze heuristic must be improved;
+ * however, for now, try to approximate the old logic.
+ */
+ if (prstate->all_frozen && prstate->nfrozen > 0)
+ {
+ Assert(prstate->all_visible);
+
+ /*
+ * Freezing would make the page all-frozen. Have already emitted
+ * an FPI or will do so anyway?
+ */
+ if (RelationNeedsWAL(relation))
+ {
+ if (did_tuple_hint_fpi)
+ do_freeze = true;
+ else if (do_prune)
+ {
+ if (XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ else if (do_hint_prune)
+ {
+ if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ }
+ }
+ }
+
+ if (do_freeze)
+ {
+ /*
+ * Validate the tuples we will be freezing before entering the
+ * critical section.
+ */
+ heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+ }
+ else if (prstate->nfrozen > 0)
+ {
+ /*
+ * The page contained some tuples that were not already frozen, and we
+ * chose not to freeze them now. The page won't be all-frozen then.
+ */
+ Assert(!prstate->pagefrz.freeze_required);
+
+ prstate->all_frozen = false;
+ prstate->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+ else
+ {
+ /*
+ * We have no freeze plans to execute. The page might already be
+ * all-frozen (perhaps only following pruning), though. Such pages
+ * can be marked all-frozen in the VM by our caller, even though none
+ * of its tuples were newly frozen here.
+ */
+ }
+
+ return do_freeze;
+}
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
@@ -666,87 +780,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* Decide if we want to go ahead with freezing according to the freeze
* plans we prepared, or not.
*/
- do_freeze = false;
- if (prstate.attempt_freeze)
- {
- if (prstate.pagefrz.freeze_required)
- {
- /*
- * heap_prepare_freeze_tuple indicated that at least one XID/MXID
- * from before FreezeLimit/MultiXactCutoff is present. Must
- * freeze to advance relfrozenxid/relminmxid.
- */
- do_freeze = true;
- }
- else
- {
- /*
- * Opportunistically freeze the page if we are generating an FPI
- * anyway and if doing so means that we can set the page
- * all-frozen afterwards (might not happen until VACUUM's final
- * heap pass).
- *
- * XXX: Previously, we knew if pruning emitted an FPI by checking
- * pgWalUsage.wal_fpi before and after pruning. Once the freeze
- * and prune records were combined, this heuristic couldn't be
- * used anymore. The opportunistic freeze heuristic must be
- * improved; however, for now, try to approximate the old logic.
- */
- if (prstate.all_frozen && prstate.nfrozen > 0)
- {
- Assert(prstate.all_visible);
+ do_freeze = heap_page_will_freeze(relation, buffer,
+ did_tuple_hint_fpi,
+ do_prune,
+ do_hint_prune,
+ &prstate);
- /*
- * Freezing would make the page all-frozen. Have already
- * emitted an FPI or will do so anyway?
- */
- if (RelationNeedsWAL(relation))
- {
- if (did_tuple_hint_fpi)
- do_freeze = true;
- else if (do_prune)
- {
- if (XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- else if (do_hint_prune)
- {
- if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- }
- }
- }
- }
-
- if (do_freeze)
- {
- /*
- * Validate the tuples we will be freezing before entering the
- * critical section.
- */
- heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
- }
- else if (prstate.nfrozen > 0)
- {
- /*
- * The page contained some tuples that were not already frozen, and we
- * chose not to freeze them now. The page won't be all-frozen then.
- */
- Assert(!prstate.pagefrz.freeze_required);
-
- prstate.all_frozen = false;
- prstate.nfrozen = 0; /* avoid miscounts in instrumentation */
- }
- else
- {
- /*
- * We have no freeze plans to execute. The page might already be
- * all-frozen (perhaps only following pruning), though. Such pages
- * can be marked all-frozen in the VM by our caller, even though none
- * of its tuples were newly frozen here.
- */
- }
Assert(!prstate.all_frozen || prstate.all_visible);
/* Any error while applying the changes is critical */
--
2.43.0
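
Condensed, the decision that heap_page_will_freeze() encapsulates looks
roughly like this. It is only a sketch of the logic in the helper above (the
names mirror its parameters and PruneState fields), not the exact code:

    if (!prstate->attempt_freeze)
        do_freeze = false;      /* caller did not ask us to consider freezing */
    else if (prstate->pagefrz.freeze_required)
        do_freeze = true;       /* must freeze to advance relfrozenxid/relminmxid */
    else if (prstate->all_frozen && prstate->nfrozen > 0 &&
             RelationNeedsWAL(relation))
    {
        /*
         * Opportunistic case: the page would become all-frozen and an FPI has
         * been emitted already or would be emitted anyway.
         */
        do_freeze = did_tuple_hint_fpi ||
            (do_prune && XLogCheckBufferNeedsBackup(buffer)) ||
            (do_hint_prune && XLogHintBitIsNeeded() &&
             XLogCheckBufferNeedsBackup(buffer));
    }
    else
        do_freeze = false;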
v15-0003-Keep-all_frozen-updated-in-heap_page_prune_and_f.patch (text/x-patch)
From 026fe909ccd34e8f7ca92a56c83e9c2aac813a10 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 23 Jul 2025 16:01:24 -0400
Subject: [PATCH v15 03/23] Keep all_frozen updated in
heap_page_prune_and_freeze
We previously relied on all_visible and all_frozen only being used
together, but it's better to keep both of them updated.
Future commits will separate usage of these fields, so it is best not to
rely on all_visible for all_frozen's validity.
---
src/backend/access/heap/pruneheap.c | 21 ++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 9 ++++-----
2 files changed, 14 insertions(+), 16 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 28bd6a56749..ea8216e0632 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -142,10 +142,6 @@ typedef struct
* whether to freeze the page or not. The all_visible and all_frozen
* values returned to the caller are adjusted to include LP_DEAD items at
* the end.
- *
- * all_frozen should only be considered valid if all_visible is also set;
- * we don't bother to clear the all_frozen flag every time we clear the
- * all_visible flag.
*/
bool all_visible;
bool all_frozen;
@@ -696,8 +692,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* used anymore. The opportunistic freeze heuristic must be
* improved; however, for now, try to approximate the old logic.
*/
- if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
+ if (prstate.all_frozen && prstate.nfrozen > 0)
{
+ Assert(prstate.all_visible);
+
/*
* Freezing would make the page all-frozen. Have already
* emitted an FPI or will do so anyway?
@@ -750,6 +748,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
}
+ Assert(!prstate.all_frozen || prstate.all_visible);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -819,7 +818,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (do_freeze)
{
- if (prstate.all_visible && prstate.all_frozen)
+ if (prstate.all_frozen)
frz_conflict_horizon = prstate.visibility_cutoff_xid;
else
{
@@ -1382,7 +1381,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
if (!HeapTupleHeaderXminCommitted(htup))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1404,7 +1403,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
Assert(prstate->cutoffs);
if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1417,7 +1416,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
case HEAPTUPLE_RECENTLY_DEAD:
prstate->recently_dead_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple will soon become DEAD. Update the hint field so
@@ -1436,7 +1435,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* assumption is a bit shaky, but it is what acquire_sample_rows()
* does, so be consistent.
*/
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* If we wanted to optimize for aborts, we might consider marking
@@ -1454,7 +1453,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* will commit and update the counters after we report.
*/
prstate->live_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple may soon become DEAD. Update the hint field so that
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ddc9677694c..50cc898087f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2003,7 +2003,6 @@ lazy_scan_prune(LVRelState *vacrel,
* agreement with heap_page_is_all_visible() using an assertion.
*/
#ifdef USE_ASSERT_CHECKING
- /* Note that all_frozen value does not matter when !all_visible */
if (presult.all_visible)
{
TransactionId debug_cutoff;
@@ -2056,6 +2055,7 @@ lazy_scan_prune(LVRelState *vacrel,
*has_lpdead_items = (presult.lpdead_items > 0);
Assert(!presult.all_visible || !(*has_lpdead_items));
+ Assert(!presult.all_frozen || presult.all_visible);
/*
* Handle setting visibility map bit based on information from the VM (as
@@ -2161,11 +2161,10 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen. Note that all_frozen is only valid if all_visible is
- * true, so we must check both all_visible and all_frozen.
+ * it as all-frozen.
*/
- else if (all_visible_according_to_vm && presult.all_visible &&
- presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ else if (all_visible_according_to_vm && presult.all_frozen &&
+ !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
uint8 old_vmbits;
--
2.43.0
v15-0001-Eliminate-COPY-FREEZE-use-of-XLOG_HEAP2_VISIBLE.patch (text/x-patch)
From 5ae81ae4429b21bab607a053c80e5b9217b48751 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v15 01/23] Eliminate COPY FREEZE use of XLOG_HEAP2_VISIBLE
Instead of emitting a separate WAL XLOG_HEAP2_VISIBLE record for setting
bits in the VM, specify the changes to make to the VM block in the
XLOG_HEAP2_MULTI_INSERT record.
This halves the number of WAL records emitted by COPY FREEZE.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam.c | 44 ++++++++++------
src/backend/access/heap/heapam_xlog.c | 59 ++++++++++++++++++++-
src/backend/access/heap/visibilitymap.c | 70 ++++++++++++++++++++++++-
src/backend/access/rmgrdesc/heapdesc.c | 5 ++
src/include/access/visibilitymap.h | 3 ++
5 files changed, 163 insertions(+), 18 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ed0c0c2dc9f..7f354caec31 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2466,7 +2466,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
starting_with_empty_page = PageGetMaxOffsetNumber(page) == 0;
if (starting_with_empty_page && (options & HEAP_INSERT_FROZEN))
+ {
all_frozen_set = true;
+ /* Lock the vmbuffer before entering the critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ }
/* NO EREPORT(ERROR) from here till changes are logged */
START_CRIT_SECTION();
@@ -2506,7 +2510,8 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
* going to add further frozen rows to it.
*
* If we're only adding already frozen rows to a previously empty
- * page, mark it as all-visible.
+ * page, mark it as all-frozen and update the visibility map. We're
+ * already holding a pin on the vmbuffer.
*/
if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
@@ -2517,7 +2522,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
else if (all_frozen_set)
+ {
PageSetAllVisible(page);
+ visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ RelationGetRelationName(relation));
+ }
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2565,6 +2577,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
xlrec->flags = 0;
if (all_visible_cleared)
xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+ /*
+ * We don't have to worry about including a conflict xid in the
+ * WAL record as HEAP_INSERT_FROZEN intentionally violates
+ * visibility rules.
+ */
if (all_frozen_set)
xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
@@ -2627,7 +2645,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
XLogBeginInsert();
XLogRegisterData(xlrec, tupledata - scratch.data);
+
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+ if (all_frozen_set)
+ XLogRegisterBuffer(1, vmbuffer, 0);
XLogRegisterBufData(0, tupledata, totaldatalen);
@@ -2637,26 +2658,17 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
recptr = XLogInsert(RM_HEAP2_ID, info);
PageSetLSN(page, recptr);
+ if (all_frozen_set)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
+ }
}
END_CRIT_SECTION();
- /*
- * If we've frozen everything on the page, update the visibilitymap.
- * We're already holding pin on the vmbuffer.
- */
if (all_frozen_set)
- {
- /*
- * It's fine to use InvalidTransactionId here - this is only used
- * when HEAP_INSERT_FROZEN is specified, which intentionally
- * violates visibility rules.
- */
- visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
- InvalidXLogRecPtr, vmbuffer,
- InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
- }
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
UnlockReleaseBuffer(buffer);
ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index cf843277938..73aaaef9d8e 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -551,6 +551,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
int i;
bool isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
/*
* Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -571,11 +572,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(rlocator);
- Buffer vmbuffer = InvalidBuffer;
visibilitymap_pin(reln, blkno, &vmbuffer);
visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
ReleaseBuffer(vmbuffer);
+ vmbuffer = InvalidBuffer;
FreeFakeRelcacheEntry(reln);
}
@@ -662,6 +663,62 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (BufferIsValid(buffer))
UnlockReleaseBuffer(buffer);
+ buffer = InvalidBuffer;
+
+ /*
+ * Now read and update the VM block.
+ *
+ * We must redo changes to the VM even if the heap page was skipped due to
+ * LSN interlock. Each block of the VM contains bits for multiple heap
+ * blocks and subsequent records may contain updates to other bits in this
+ * block. If this record contains an FPI, subsequent records may rely on
+ * it for protection against a torn page.
+ *
+ * The changes to the heap page are replayed first to maintain the
+ * invariant that PD_ALL_VISIBLE must be set if the VM is set.
+ *
+ * Note that the lock on the heap page was dropped above. In normal
+ * operation this would never be safe because a concurrent query could
+ * modify the heap page and clear PD_ALL_VISIBLE -- violating the
+ * invariant that PD_ALL_VISIBLE must be set if the corresponding bit in
+ * the VM is set.
+ *
+ * In recovery, we expect no other writers, so writing to the VM page
+ * without holding a lock on the heap page is considered safe enough. It
+ * is done this way when replaying xl_heap_visible records (see
+ * heap_xlog_visible()).
+ */
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+ XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ Page vmpage = BufferGetPage(vmbuffer);
+ char *relname;
+
+ /* initialize the page if it was read as zeros */
+ if (PageIsNew(vmpage))
+ PageInit(vmpage, BLCKSZ, 0);
+
+ /* We don't have relation name during recovery, so use relfilenode */
+ relname = psprintf("%u", rlocator.relNumber);
+ visibilitymap_set_vmbits(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ relname);
+
+ /*
+ * It is not possible that the VM was already set for this heap page,
+ * so the vmbuffer must have been modified and marked dirty.
+ */
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
+ pfree(relname);
+ }
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
+
/*
* If the page is running low on free space, update the FSM as well.
* Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 7306c16f05c..b28460392b7 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,7 +14,8 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set a bit in a previously pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page and log
+ * visibilitymap_set_vmbits - set bit(s) in a pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -321,6 +322,73 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
return status;
}
+/*
+ * Set flags in the VM block contained in the passed in vmBuf.
+ *
+ * This function is for callers which include the VM changes in the same WAL
+ * record as the modifications of the heap page which rendered it all-visible.
+ * Callers separately logging the VM changes should invoke visibilitymap_set()
+ * instead.
+ *
+ * Caller must have pinned and exclusive locked the correct block of the VM in
+ * vmBuf. This block should contain the VM bits for the given heapBlk.
+ *
+ * heapRelname is used only for debugging purposes.
+ *
+ * During normal operation (i.e. not recovery), this should be called in a
+ * critical section which also makes any necessary changes to the heap page
+ * and, if relevant, emits WAL.
+ *
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page. This includes
+ * maintaining any invariants such as ensuring the buffer containing heapBlk
+ * is pinned and exclusive locked.
+ */
+uint8
+visibilitymap_set_vmbits(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const char *heapRelname)
+{
+ BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+ uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+ uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+ Page page;
+ uint8 *map;
+ uint8 status;
+
+#ifdef TRACE_VISIBILITYMAP
+ elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
+ flags, heapRelname, heapBlk);
+#endif
+
+ /* Call in same critical section where WAL is emitted. */
+ Assert(InRecovery || CritSectionCount > 0);
+
+ /* Flags should be valid. Also never clear bits with this function */
+ Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+ /* Must never set all_frozen bit without also setting all_visible bit */
+ Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+ /* Check that we have the right VM page pinned */
+ if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+ elog(ERROR, "wrong VM buffer passed to visibilitymap_set_vmbits");
+
+ Assert(BufferIsExclusiveLocked(vmBuf));
+
+ page = BufferGetPage(vmBuf);
+ map = (uint8 *) PageGetContents(page);
+
+ status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+ if (flags != status)
+ {
+ map[mapByte] |= (flags << mapOffset);
+ MarkBufferDirty(vmBuf);
+ }
+
+ return status;
+}
+
/*
* visibilitymap_get_status - get status of bits
*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
#include "access/heapam_xlog.h"
#include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
#include "storage/standbydefs.h"
/*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
xlrec->flags);
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+ appendStringInfo(buf, ", vm_flags: 0x%02X",
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
if (XLogRecHasBlockData(record, 0) && !isinit)
{
appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..3dcf37ba03f 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,9 @@ extern uint8 visibilitymap_set(Relation rel,
Buffer vmBuf,
TransactionId cutoff_xid,
uint8 flags);
+extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const char *heapRelname);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
--
2.43.0
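
In condensed form, the write-side pattern that 0001 establishes (and which the
later vacuum patches reuse) is the following. Error handling, the tuple
insertion itself, and the rest of the WAL record assembly are elided -- see
the heap_multi_insert() hunks above for the real code:

    /* lock the correct VM page before entering the critical section */
    LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
    START_CRIT_SECTION();

    PageSetAllVisible(page);
    visibilitymap_set_vmbits(BufferGetBlockNumber(buffer), vmbuffer,
                             VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN,
                             RelationGetRelationName(relation));

    /* register the VM page as an additional block of the existing heap record */
    XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
    XLogRegisterBuffer(1, vmbuffer, 0);
    recptr = XLogInsert(RM_HEAP2_ID, info);

    /* stamp both pages with the same LSN */
    PageSetLSN(page, recptr);
    PageSetLSN(BufferGetPage(vmbuffer), recptr);

    END_CRIT_SECTION();
    LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);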
v15-0006-Update-PruneState.all_-visible-frozen-sooner-in-.patch (text/x-patch)
From 3221391ad5b194084715fcbac076948cf79dfcc9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 15 Sep 2025 16:25:44 -0400
Subject: [PATCH v15 06/23] Update PruneState.all_[visible|frozen] sooner in
pruning
We don't clear PruneState.all_visible and all_frozen during pruning when
we see LP_DEAD items because we want to still opportunistically freeze a
page if it would become frozen after vacuum's third phase.
Currently, this is fine because heap_page_prune_and_freeze() doesn't set
PD_ALL_VISIBLE or set bits in the VM. If we want to do that in the
future, we need all_visible and all_frozen to be accurate earlier in
heap_page_prune_and_freeze(). To do this, we must also move up
determination of the freeze conflict horizon. We use the visibility
cutoff xid even if the whole page won't be frozen until after vacuum's
third phase.
---
src/backend/access/heap/pruneheap.c | 95 ++++++++++++++---------------
1 file changed, 45 insertions(+), 50 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 4ed74de6f27..5e536bd0d4d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -296,7 +296,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* pre-freeze checks.
*
* do_prune, do_hint_prune, and did_tuple_hint_fpi must all have
- * been decided before calling this function.
+ * been decided before calling this function. *frz_conflict_horizon is set to
+ * the snapshot conflict horizon to use for the WAL record should we decide to freeze
+ * tuples.
*
* prstate is an input/output parameter.
*
@@ -308,7 +310,8 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi,
bool do_prune,
bool do_hint_prune,
- PruneState *prstate)
+ PruneState *prstate,
+ TransactionId *frz_conflict_horizon)
{
bool do_freeze = false;
@@ -378,6 +381,22 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* critical section.
*/
heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+
+ /*
+ * Calculate what the snapshot conflict horizon should be for a record
+ * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
+ * for conflicts when the whole page is eligible to become all-frozen
+ * in the VM once we're done with it. Otherwise we generate a
+ * conservative cutoff by stepping back from OldestXmin.
+ */
+ if (prstate->all_frozen)
+ *frz_conflict_horizon = prstate->visibility_cutoff_xid;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ *frz_conflict_horizon = prstate->cutoffs->OldestXmin;
+ TransactionIdRetreat(*frz_conflict_horizon);
+ }
}
else if (prstate->nfrozen > 0)
{
@@ -478,6 +497,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_hint_prune;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ TransactionId frz_conflict_horizon = InvalidTransactionId;
/* Copy parameters to prstate */
prstate.vistest = vistest;
@@ -546,10 +566,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* are tuples present that are not visible to everyone or if there are
* dead tuples which are not yet removable. However, dead tuples which
* will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not clear
- * all_visible when we see LP_DEAD items. We fix that at the end of the
- * function, when we return the value to the caller, so that the caller
- * doesn't set the VM bit incorrectly.
+ * opportunistically freezing. Because of that, we do not immediately
+ * clear all_visible when we see LP_DEAD items. We fix that after
+ * scanning the line pointers, before we return the value to the caller,
+ * so that the caller doesn't set the VM bit incorrectly.
*/
if (prstate.attempt_freeze)
{
@@ -784,8 +804,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
did_tuple_hint_fpi,
do_prune,
do_hint_prune,
- &prstate);
+ &prstate,
+ &frz_conflict_horizon);
+ /*
+ * While scanning the line pointers, we did not clear
+ * all_visible/all_frozen when encountering LP_DEAD items because we
+ * wanted the decision whether or not to freeze the page to be unaffected
+ * by the short-term presence of LP_DEAD items. These LP_DEAD items are
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that we finished determining whether or not to freeze the page,
+ * update all_visible and all_frozen so that they reflect the true state
+ * of the page for setting PD_ALL_VISIBLE and VM bits.
+ */
+ if (prstate.lpdead_items > 0)
+ prstate.all_visible = prstate.all_frozen = false;
Assert(!prstate.all_frozen || prstate.all_visible);
/* Any error while applying the changes is critical */
@@ -846,27 +882,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* on the standby with xids older than the youngest tuple this
* record will freeze will conflict.
*/
- TransactionId frz_conflict_horizon = InvalidTransactionId;
TransactionId conflict_xid;
- /*
- * We can use the visibility_cutoff_xid as our cutoff for
- * conflicts when the whole page is eligible to become all-frozen
- * in the VM once we're done with it. Otherwise we generate a
- * conservative cutoff by stepping back from OldestXmin.
- */
- if (do_freeze)
- {
- if (prstate.all_frozen)
- frz_conflict_horizon = prstate.visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = prstate.cutoffs->OldestXmin;
- TransactionIdRetreat(frz_conflict_horizon);
- }
- }
-
if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
conflict_xid = frz_conflict_horizon;
else
@@ -890,30 +907,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected by our caller.
- */
- if (prstate.all_visible && prstate.lpdead_items == 0)
- {
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
- }
- else
- {
- presult->all_visible = false;
- presult->all_frozen = false;
- }
-
+ presult->all_visible = prstate.all_visible;
+ presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
/*
--
2.43.0
v15-0008-Combine-vacuum-phase-I-VM-update-cases.patch (text/x-patch)
From 78d8fd0ab8ef94c9de7f3a4c8f308ce0c2cba54b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 15 Sep 2025 17:48:38 -0400
Subject: [PATCH v15 08/23] Combine vacuum phase I VM update cases
We update the VM after phase I of vacuum -- either setting both the VM
bits when all bits are currently unset or setting just the frozen bit
when the all-visible bit is already set.
Those two cases shared much of the same code -- leading to unnecessary
duplication. This commit combines them, which is simpler and easier to
understand.
---
src/backend/access/heap/vacuumlazy.c | 68 ++++++++--------------------
1 file changed, 18 insertions(+), 50 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 308abff16ca..5a6bbbd97f2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2058,15 +2058,22 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(!presult.all_frozen || presult.all_visible);
/*
- * Handle setting visibility map bit based on information from the VM (as
+ * Handle setting visibility map bits based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
+ * all_frozen variables.
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if ((presult.all_visible && !all_visible_according_to_vm) ||
+ (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer)))
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
+ /*
+ * If the page is all-frozen, we can pass InvalidTransactionId as our
+ * cutoff_xid, since a snapshotConflictHorizon sufficient to make
+ * everything safe for REDO was logged when the page's tuples were
+ * frozen.
+ */
if (presult.all_frozen)
{
Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
@@ -2079,6 +2086,12 @@ lazy_scan_prune(LVRelState *vacrel,
flags);
/*
+ * Even if we are only setting the all-frozen bit, there is a small
+ * chance that the VM was modified sometime between setting
+ * all_visible_according_to_vm and checking the visibility during
+ * pruning. Check the return value of old_vmbits to ensure the
+ * visibility map counters used for logging are accurate.
+ *
* If the page wasn't already set all-visible and/or all-frozen in the
* VM, count it as newly set for logging.
*/
@@ -2100,6 +2113,8 @@ lazy_scan_prune(LVRelState *vacrel,
}
/*
+ * Now handle two potential corruption cases:
+ *
* As of PostgreSQL 9.2, the visibility map bit should never be set if the
* page-level bit is clear. However, it's possible that the bit got
* cleared after heap_vac_scan_next_block() was called, so we must recheck
@@ -2144,53 +2159,6 @@ lazy_scan_prune(LVRelState *vacrel,
VISIBILITYMAP_VALID_BITS);
}
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen.
- */
- else if (all_visible_according_to_vm && presult.all_frozen &&
- !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
return presult.ndeleted;
}
--
2.43.0
v15-0009-Vacuum-phase-III-set-PD_ALL_VISIBLE-in-vacuum-WA.patch (text/x-patch)
From e5fba63482e7d3bd44a991773ac3da50d2402781 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 16 Sep 2025 10:39:31 -0400
Subject: [PATCH v15 09/23] Vacuum phase III set PD_ALL_VISIBLE in vacuum WAL
record
Instead of setting PD_ALL_VISIBLE on the heap page when setting bits in
the VM, set it when flipping the line pointers on the page to LP_UNUSED.
This will allow us to omit the heap page from the VM WAL chain.
To do this, we must check if the page will be all-visible once we flip
the line pointers, before we actually do so.
One functional change is that a single critical section surrounds both
the VM update and the heap update. Previously they were each in their
own critical section, so we could crash and have set PD_ALL_VISIBLE but not
set bits in the VM.
---
src/backend/access/heap/vacuumlazy.c | 140 ++++++++++++++++++++-------
1 file changed, 105 insertions(+), 35 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5a6bbbd97f2..9bfcd67a61b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -465,6 +465,11 @@ static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2793,6 +2798,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
TransactionId visibility_cutoff_xid;
bool all_frozen;
LVSavedErrInfo saved_err_info;
+ uint8 vmflags = 0;
Assert(vacrel->do_index_vacuuming);
@@ -2803,6 +2809,18 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
InvalidOffsetNumber);
+ if (heap_page_would_be_all_visible(vacrel, buffer,
+ deadoffsets, num_offsets,
+ &all_frozen, &visibility_cutoff_xid))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (all_frozen)
+ {
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ }
+ }
+
START_CRIT_SECTION();
for (int i = 0; i < num_offsets; i++)
@@ -2822,6 +2840,13 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
/* Attempt to truncate line pointer array now */
PageTruncateLinePointerArray(page);
+ /*
+ * The page will never have PD_ALL_VISIBLE already set, so if we are
+ * setting the VM, we must set PD_ALL_VISIBLE as well.
+ */
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+ PageSetAllVisible(page);
+
/*
* Mark buffer dirty before we write WAL.
*/
@@ -2833,7 +2858,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
log_heap_prune_and_freeze(vacrel->rel, buffer,
InvalidTransactionId,
false, /* no cleanup lock required */
- false,
+ (vmflags & VISIBILITYMAP_VALID_BITS) != 0,
PRUNE_VACUUM_CLEANUP,
NULL, 0, /* frozen */
NULL, 0, /* redirected */
@@ -2842,36 +2867,26 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
}
/*
- * End critical section, so we safely can do visibility tests (which
- * possibly need to perform IO and allocate memory!). If we crash now the
- * page (including the corresponding vm bit) might not be marked all
- * visible, but that's fine. A later vacuum will fix that.
+ * Note that we don't end the critical section until after emitting the VM
+ * record. This ensures PD_ALL_VISIBLE and the VM bits are either both set
+ * or both unset in the event of a crash. While it is correct for PD_ALL_VISIBLE
+ * to be set and the VM to be clear, we should do our best to keep these
+ * in sync. This does mean that we will take a lock on the VM buffer
+ * inside of a critical section, which is generally discouraged. There is
+ * precedent for this in other callers of visibilitymap_set(), though.
*/
- END_CRIT_SECTION();
/*
- * Now that we have removed the LP_DEAD items from the page, once again
- * check if the page has become all-visible. The page is already marked
- * dirty, exclusively locked, and, if needed, a full page image has been
- * emitted.
+ * Now that we have removed the LP_DEAD items from the page, set the
+ * visibility map if the page became all-visible/all-frozen. Changes to
+ * the heap page have already been logged.
*/
- Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
- &all_frozen))
+ if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (all_frozen)
- {
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- PageSetAllVisible(page);
visibilitymap_set(vacrel->rel, blkno, buffer,
InvalidXLogRecPtr,
vmbuffer, visibility_cutoff_xid,
- flags);
+ vmflags);
/* Count the newly set VM page for logging */
vacrel->vm_new_visible_pages++;
@@ -2879,6 +2894,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
vacrel->vm_new_visible_frozen_pages++;
}
+ END_CRIT_SECTION();
+
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
}
@@ -3540,30 +3557,77 @@ dead_items_cleanup(LVRelState *vacrel)
}
/*
- * Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples. Set *all_frozen to true if every tuple
- * on this page is frozen.
- *
- * This is a stripped down version of lazy_scan_prune(). If you change
- * anything here, make sure that everything stays in sync. Note that an
- * assertion calls us to verify that everybody still agrees. Be sure to avoid
- * introducing new side-effects here.
+ * Wrapper for heap_page_would_be_all_visible() which can be used for
+ * callers that expect no LP_DEAD items on the page.
*/
static bool
heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid,
bool *all_frozen)
{
+
+ return heap_page_would_be_all_visible(vacrel, buf,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid);
+}
+
+/*
+ * Determines whether or not the heap page in buf is all-visible other than
+ * the dead line pointers referred to by the provided deadoffsets array.
+ *
+ * deadoffsets are the offsets of LP_DEAD items the caller already knows about
+ * and whose associated index entries have already been removed. Vacuum will
+ * call this before setting those line pointers LP_UNUSED. So, if there are no
+ * other LP_DEAD items, then the page
+ * can be set all-visible in the VM by the caller.
+ *
+ * Returns true if the page is all-visible other than the provided
+ * deadoffsets and false otherwise.
+ *
+ * vacrel->cutoffs.OldestXmin is used to determine visibility.
+ *
+ * *all_frozen is an output parameter indicating to the caller if every tuple
+ * on the page is frozen.
+ *
+ * *visibility_cutoff_xid is an output parameter with the highest xmin amongst the
+ * visible tuples. It is only valid if the page is all-visible.
+ *
+ * Callers looking to verify that the page is already all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This is similar logic to that in heap_prune_record_unchanged_lp_normal(). If
+ * you change anything here, make sure that everything stays in sync. Note
+ * that an assertion calls us to verify that everybody still agrees. Be sure
+ * to avoid introducing new side-effects here.
+ */
+static bool
+heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid)
+{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
OffsetNumber offnum,
maxoff;
bool all_visible = true;
+ int matched_dead_count = 0;
*visibility_cutoff_xid = InvalidTransactionId;
*all_frozen = true;
+ Assert(ndeadoffsets == 0 || deadoffsets);
+
+#ifdef USE_ASSERT_CHECKING
+ /* Confirm input deadoffsets[] is strictly sorted */
+ if (ndeadoffsets > 1)
+ {
+ for (int i = 1; i < ndeadoffsets; i++)
+ Assert(deadoffsets[i - 1] < deadoffsets[i]);
+ }
+#endif
+
maxoff = PageGetMaxOffsetNumber(page);
for (offnum = FirstOffsetNumber;
offnum <= maxoff && all_visible;
@@ -3591,9 +3655,15 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
*/
if (ItemIdIsDead(itemid))
{
- all_visible = false;
- *all_frozen = false;
- break;
+ if (!deadoffsets ||
+ matched_dead_count >= ndeadoffsets ||
+ deadoffsets[matched_dead_count] != offnum)
+ {
+ *all_frozen = all_visible = false;
+ break;
+ }
+ matched_dead_count++;
+ continue;
}
Assert(ItemIdIsNormal(itemid));
--
2.43.0
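
To make the control-flow change in 0009 easier to follow, the new shape of
lazy_vacuum_heap_page() is roughly the following sketch; most details and
arguments are elided, and the comments note what the real code does:

    /*
     * Before the critical section: decide whether the page will be
     * all-visible once the already-indexed dead items are removed.
     */
    if (heap_page_would_be_all_visible(vacrel, buffer, deadoffsets, num_offsets,
                                       &all_frozen, &visibility_cutoff_xid))
    {
        vmflags |= VISIBILITYMAP_ALL_VISIBLE;
        if (all_frozen)
            vmflags |= VISIBILITYMAP_ALL_FROZEN;
    }

    START_CRIT_SECTION();
    /* ... set each offset in deadoffsets[] LP_UNUSED, truncate the lp array ... */
    if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
        PageSetAllVisible(page);
    MarkBufferDirty(buffer);
    /* ... log_heap_prune_and_freeze(), which now also records PD_ALL_VISIBLE ... */
    if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
        visibilitymap_set(vacrel->rel, blkno, buffer, InvalidXLogRecPtr,
                          vmbuffer, visibility_cutoff_xid, vmflags);
    END_CRIT_SECTION();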
v15-0007-Set-PD_ALL_VISIBLE-in-heap_page_prune_and_freeze.patch (text/x-patch)
From 7554e6c7b6e9d1570e473b5c096eee84aab4f5db Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 15 Sep 2025 16:32:35 -0400
Subject: [PATCH v15 07/23] Set PD_ALL_VISIBLE in heap_page_prune_and_freeze
After phase I of vacuum, if the heap page was rendered all-visible, we
can set it as such in the VM. We also must set the page-level
PD_ALL_VISIBLE bit. By setting PD_ALL_VISIBLE while making the other
changes to the heap page instead of while updating the VM, we can omit
the heap page from the WAL chain during the VM update. The result is
that XLOG_HEAP2_PRUNE_VACUUM_SCAN records include updates to
PD_ALL_VISIBLE.
This commit doesn't yet remove the heap page from the WAL chain because
it does not change other users of visibilitymap_set().
On-access pruning does not enable setting PD_ALL_VISIBLE.
Note that this is carefully coded so that, if the only modification to
the page during heap_page_prune_and_freeze() is setting PD_ALL_VISIBLE
and checksums/wal_log_hints are disabled, we never emit a full page
image of the heap page.
This also fixes a longstanding issue where, when checksums/wal_log_hints
are enabled, an all-visible page being set all-frozen may not mark the
buffer dirty before visibilitymap_set() stamps it with the
xl_heap_visible LSN.
It is noteworthy that the checks for page corruption and an inconsistent
state between the heap page and the VM in lazy_scan_prune() now happen
after having set PD_ALL_VISIBLE. That is not a functional change because
the corruption cases are mutually exclusive with cases where we would
set PD_ALL_VISIBLE.
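Roughly, the rule for stamping the heap page LSN described above boils
down to the following sketch (names simplified; the real logic is in
log_heap_prune_and_freeze() in the diff below):

    /*
     * Illustrative only: may the heap page LSN be stamped after inserting
     * the prune/freeze record?  "did_prune" stands for having redirected,
     * killed, or removed line pointers; "hint_bits_needed" stands for
     * XLogHintBitIsNeeded(), i.e. checksums or wal_log_hints enabled.
     */
    static inline bool
    may_set_heap_page_lsn(bool did_prune, int nfrozen,
                          bool set_pd_all_vis, bool hint_bits_needed)
    {
        /* Pruning or freezing modifies tuple data: the LSN must advance. */
        if (did_prune || nfrozen > 0)
            return true;

        /*
         * If the only change is PD_ALL_VISIBLE, the buffer was registered
         * with REGBUF_NO_IMAGE unless checksums/wal_log_hints are on, so
         * the LSN may only be stamped when an FPI was actually possible.
         */
        return set_pd_all_vis && hint_bits_needed;
    }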
---
src/backend/access/heap/heapam_xlog.c | 63 +++++++++++++++++++----
src/backend/access/heap/pruneheap.c | 72 ++++++++++++++++++++++++---
src/backend/access/heap/vacuumlazy.c | 29 +----------
src/include/access/heapam.h | 2 +
src/include/access/heapam_xlog.h | 2 +
5 files changed, 125 insertions(+), 43 deletions(-)
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 73aaaef9d8e..4ea1a186c98 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -90,6 +90,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
xlhp_freeze_plan *plans;
OffsetNumber *frz_offsets;
char *dataptr = XLogRecGetBlockData(record, 0, &datalen);
+ bool do_prune;
heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
&nplans, &plans, &frz_offsets,
@@ -97,11 +98,13 @@ heap_xlog_prune_freeze(XLogReaderState *record)
&ndead, &nowdead,
&nunused, &nowunused);
+ do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+
/*
* Update all line pointers per the record, and repair fragmentation
* if needed.
*/
- if (nredirected > 0 || ndead > 0 || nunused > 0)
+ if (do_prune)
heap_page_prune_execute(buffer,
(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
redirected, nredirected,
@@ -138,17 +141,52 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* There should be no more data */
Assert((char *) frz_offsets == dataptr + datalen);
+ /*
+ * The critical integrity requirement here is that we must never end
+ * up with a situation where the visibility map bit is set, and the
+ * page-level PD_ALL_VISIBLE bit is clear. If that were to occur,
+ * then a subsequent page modification would fail to clear the
+ * visibility map bit.
+ */
+ if (xlrec.flags & XLHP_SET_PD_ALL_VIS)
+ PageSetAllVisible(page);
+
/*
* Note: we don't worry about updating the page's prunability hints.
* At worst this will cause an extra prune cycle to occur soon.
*/
-
- PageSetLSN(page, lsn);
MarkBufferDirty(buffer);
+
+ /*
+ * We always emit a WAL record when setting PD_ALL_VISIBLE, but we are
+ * careful not to emit a full page image unless
+ * checksums/wal_log_hints are enabled. We only set the heap page LSN
+ * if full page images were an option when emitting WAL. Otherwise,
+ * subsequent modifications of the page may incorrectly skip emitting
+ * a full page image.
+ */
+ if (do_prune || nplans > 0 ||
+ (xlrec.flags & XLHP_SET_PD_ALL_VIS && XLogHintBitIsNeeded()))
+ PageSetLSN(page, lsn);
}
/*
- * If we released any space or line pointers, update the free space map.
+ * If we released any space or line pointers or set PD_ALL_VISIBLE, update
+ * the free space map.
+ *
+ * Even if we are just setting PD_ALL_VISIBLE (and thus not freeing up any
+ * space), we'll still update the FSM for this page. Since the FSM is not
+ * WAL-logged and only updated heuristically, it easily becomes stale in
+ * standbys. If the standby is later promoted and runs VACUUM, it will
+ * skip updating individual free space figures for pages that became
+ * all-visible (or all-frozen, depending on the vacuum mode,) which is
+ * troublesome when FreeSpaceMapVacuum propagates too optimistic free
+ * space values to upper FSM layers; later inserters try to use such pages
+ * only to find out that they are unusable. This can cause long stalls
+ * when there are many such pages.
+ *
+ * Forestall those problems by updating FSM's idea about a page that is
+ * becoming all-visible or all-frozen.
*
* Do this regardless of a full-page image being applied, since the FSM
* data is not in the page anyway.
@@ -157,10 +195,16 @@ heap_xlog_prune_freeze(XLogReaderState *record)
{
if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
XLHP_HAS_DEAD_ITEMS |
- XLHP_HAS_NOW_UNUSED_ITEMS))
+ XLHP_HAS_NOW_UNUSED_ITEMS |
+ XLHP_SET_PD_ALL_VIS))
{
Size freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+ /*
+ * We want to avoid holding an exclusive lock on the heap buffer
+ * while doing IO, so we'll release the lock on the heap buffer
+ * first.
+ */
UnlockReleaseBuffer(buffer);
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
@@ -173,10 +217,11 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/*
* Replay XLOG_HEAP2_VISIBLE records.
*
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
+ * It is imperative that the previously emitted record set PD_ALL_VISIBLE on
+ * the heap page. We must never end up with a situation where the visibility
+ * map bit is set, and the page-level PD_ALL_VISIBLE bit is clear. If that
+ * were to occur, then a subsequent page modification would fail to clear the
+ * visibility map bit.
*/
static void
heap_xlog_visible(XLogReaderState *record)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5e536bd0d4d..9b25131543b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -495,6 +495,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_freeze;
bool do_prune;
bool do_hint_prune;
+ bool do_set_pd_vis;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
TransactionId frz_conflict_horizon = InvalidTransactionId;
@@ -824,6 +825,22 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.all_visible = prstate.all_frozen = false;
Assert(!prstate.all_frozen || prstate.all_visible);
+
+ /*
+ * Though callers should set the VM if PD_ALL_VISIBLE is set here, it is
+ * allowed for the page-level bit to be set and the VM to be clear.
+ * Setting PD_ALL_VISIBLE when we are making the changes to the page that
+ * render it all-visible allows us to omit the heap page from the WAL
+ * chain when later updating the VM -- even when checksums/wal_log_hints
+ * are enabled.
+ */
+ do_set_pd_vis = false;
+ if ((options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0)
+ {
+ if (prstate.all_visible && !PageIsAllVisible(page))
+ do_set_pd_vis = true;
+ }
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -844,14 +861,17 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * hint. If we are going to freeze or prune the page or set
+ * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+ *
+ * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+ * for the VM to be set and PD_ALL_VISIBLE to be clear.
*/
- if (!do_freeze && !do_prune)
+ if (!do_freeze && !do_prune && !do_set_pd_vis)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_pd_vis)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -865,6 +885,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
+ if (do_set_pd_vis)
+ PageSetAllVisible(page);
+
MarkBufferDirty(buffer);
/*
@@ -891,7 +914,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
log_heap_prune_and_freeze(relation, buffer,
conflict_xid,
- true, reason,
+ true,
+ do_set_pd_vis,
+ reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
@@ -2078,6 +2103,10 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* replaying 'unused' items depends on whether they were all previously marked
* as dead.
*
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
* Note: This function scribbles on the 'frozen' array.
*
* Note: This is called in a critical section, so careful what you do here.
@@ -2086,6 +2115,7 @@ void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
TransactionId conflict_xid,
bool cleanup_lock,
+ bool set_pd_all_vis,
PruneReason reason,
HeapTupleFreeze *frozen, int nfrozen,
OffsetNumber *redirected, int nredirected,
@@ -2095,6 +2125,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xl_heap_prune xlrec;
XLogRecPtr recptr;
uint8 info;
+ uint8 regbuf_flags;
/* The following local variables hold data registered in the WAL record: */
xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2103,8 +2134,21 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xlhp_prune_items dead_items;
xlhp_prune_items unused_items;
OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+ bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
xlrec.flags = 0;
+ regbuf_flags = REGBUF_STANDARD;
+
+ /*
+ * We can avoid an FPI if the only modification we are making to the heap
+ * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+ * Note that if we explicitly skip an FPI, we must not set the heap page
+ * LSN later.
+ */
+ if (!do_prune &&
+ nfrozen == 0 &&
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
+ regbuf_flags |= REGBUF_NO_IMAGE;
/*
* Prepare data for the buffer. The arrays are not actually in the
@@ -2112,7 +2156,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* page image, the arrays can be omitted.
*/
XLogBeginInsert();
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+ XLogRegisterBuffer(0, buffer, regbuf_flags);
if (nfrozen > 0)
{
int nplans;
@@ -2169,6 +2213,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* Prepare the main xl_heap_prune record. We already set the XLHP_HAS_*
* flag above.
*/
+ if (set_pd_all_vis)
+ xlrec.flags |= XLHP_SET_PD_ALL_VIS;
if (RelationIsAccessibleInLogicalDecoding(relation))
xlrec.flags |= XLHP_IS_CATALOG_REL;
if (TransactionIdIsValid(conflict_xid))
@@ -2201,5 +2247,17 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
}
recptr = XLogInsert(RM_HEAP2_ID, info);
- PageSetLSN(BufferGetPage(buffer), recptr);
+ /*
+ * We must bump the page LSN if pruning or freezing. If we are only
+ * updating PD_ALL_VISIBLE, though, we can skip doing this unless
+ * wal_log_hints/checksums are enabled. Torn pages are possible if we
+ * update PD_ALL_VISIBLE without bumping the LSN, but this is deemed okay
+ * for page hint updates.
+ */
+ if (do_prune || nfrozen > 0 ||
+ (set_pd_all_vis && XLogHintBitIsNeeded()))
+ {
+ Assert(BufferIsDirty(buffer));
+ PageSetLSN(BufferGetPage(buffer), recptr);
+ }
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 50cc898087f..308abff16ca 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1970,7 +1970,7 @@ lazy_scan_prune(LVRelState *vacrel,
* tuples. Pruning will have determined whether or not the page is
* all-visible.
*/
- prune_options = HEAP_PAGE_PRUNE_FREEZE;
+ prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VIS;
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
@@ -2073,21 +2073,6 @@ lazy_scan_prune(LVRelState *vacrel,
flags |= VISIBILITYMAP_ALL_FROZEN;
}
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
InvalidXLogRecPtr,
vmbuffer, presult.vm_conflict_horizon,
@@ -2168,17 +2153,6 @@ lazy_scan_prune(LVRelState *vacrel,
{
uint8 old_vmbits;
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
/*
* Set the page all-frozen (and all-visible) in the VM.
*
@@ -2891,6 +2865,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
log_heap_prune_and_freeze(vacrel->rel, buffer,
InvalidTransactionId,
false, /* no cleanup lock required */
+ false,
PRUNE_VACUUM_CLEANUP,
NULL, 0, /* frozen */
NULL, 0, /* redirected */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 665e0c79baf..34fe5603512 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VIS (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
struct TupleTableSlot;
@@ -384,6 +385,7 @@ extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
TransactionId conflict_xid,
bool cleanup_lock,
+ bool set_pd_all_vis,
PruneReason reason,
HeapTupleFreeze *frozen, int nfrozen,
OffsetNumber *redirected, int nredirected,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d4c0625b632..7d3fb75dda7 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -294,6 +294,8 @@ typedef struct xl_heap_prune
#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+#define XLHP_SET_PD_ALL_VIS (1 << 0)
+
/* to handle recovery conflict during logical decoding on standby */
#define XLHP_IS_CATALOG_REL (1 << 1)
--
2.43.0
v15-0010-Log-setting-empty-pages-PD_ALL_VISIBLE-with-XLOG.patch
From 112df3de663b3cee2a4e1b6c267bc880a2d39c6c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 17 Sep 2025 18:11:49 -0400
Subject: [PATCH v15 10/23] Log setting empty pages PD_ALL_VISIBLE with
XLOG_HEAP2_PRUNE_VACUUM_SCAN
Though not a big win for this particular case, if we use the
XLOG_HEAP2_PRUNE_VACUUM_SCAN record to log setting PD_ALL_VISIBLE on
the heap page, we can omit the heap page from the WAL chain when
setting the visibility map. A follow-on commit will actually remove
the heap page from the WAL chain used to set the VM.
---
src/backend/access/heap/vacuumlazy.c | 43 +++++++++++++++++++---------
1 file changed, 29 insertions(+), 14 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9bfcd67a61b..c016f8f7c25 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1879,23 +1879,38 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
{
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ PageSetAllVisible(page);
MarkBufferDirty(buf);
- /*
- * It's possible that another backend has extended the heap,
- * initialized the page, and then failed to WAL-log the page due
- * to an ERROR. Since heap extension is not WAL-logged, recovery
- * might try to replay our record setting the page all-visible and
- * find that the page isn't initialized, which will cause a PANIC.
- * To prevent that, check whether the page has been previously
- * WAL-logged, and if not, do that now.
- */
- if (RelationNeedsWAL(vacrel->rel) &&
- PageGetLSN(page) == InvalidXLogRecPtr)
- log_newpage_buffer(buf, true);
+ if (RelationNeedsWAL(vacrel->rel))
+ {
+ /*
+ * It's possible that another backend has extended the heap,
+ * initialized the page, and then failed to WAL-log the page
+ * due to an ERROR. Since heap extension is not WAL-logged,
+ * recovery might try to replay our record setting the page
+ * all-visible and find that the page isn't initialized, which
+ * will cause a PANIC. To prevent that, check whether the page
+ * has been previously WAL-logged, and if not, do that now.
+ *
+ * Otherwise, just emit WAL for setting PD_ALL_VISIBLE on the
+ * heap page. Doing this in a separate record from setting the
+ * VM allows us to omit the heap page from the VM WAL chain.
+ */
+ if (PageGetLSN(page) == InvalidXLogRecPtr)
+ log_newpage_buffer(buf, true);
+ else
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ InvalidTransactionId, /* conflict xid */
+ false, /* cleanup lock */
+ true, /* set_pd_all_vis */
+ PRUNE_VACUUM_SCAN, /* reason */
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+ }
- PageSetAllVisible(page);
visibilitymap_set(vacrel->rel, blkno, buf,
InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId,
--
2.43.0
v15-0011-Remove-heap-buffer-from-XLOG_HEAP2_VISIBLE-WAL-c.patch
From f14db744ecb79b121d8d0d3489384a93bd6abf07 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 16 Sep 2025 11:05:30 -0400
Subject: [PATCH v15 11/23] Remove heap buffer from XLOG_HEAP2_VISIBLE WAL
chain
Now that all users of visibilitymap_set() include setting PD_ALL_VISIBLE
in the WAL record capturing other changes to the heap page, we no longer
need to include the heap buffer in the WAL chain for setting the VM.
---
src/backend/access/heap/heapam.c | 16 +-----
src/backend/access/heap/heapam_xlog.c | 76 +++----------------------
src/backend/access/heap/vacuumlazy.c | 6 +-
src/backend/access/heap/visibilitymap.c | 31 +---------
src/include/access/heapam_xlog.h | 3 +-
src/include/access/visibilitymap.h | 2 +-
6 files changed, 16 insertions(+), 118 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 7f354caec31..d4d83a6f9fe 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8806,21 +8806,14 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
*
* snapshotConflictHorizon comes from the largest xmin on the page being
* marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
*/
XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
+log_heap_visible(Relation rel, Buffer vm_buffer,
TransactionId snapshotConflictHorizon, uint8 vmflags)
{
xl_heap_visible xlrec;
XLogRecPtr recptr;
- uint8 flags;
- Assert(BufferIsValid(heap_buffer));
Assert(BufferIsValid(vm_buffer));
xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
@@ -8829,14 +8822,7 @@ log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
XLogBeginInsert();
XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
return recptr;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 4ea1a186c98..2e9fda0a9bf 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -229,15 +229,12 @@ heap_xlog_visible(XLogReaderState *record)
XLogRecPtr lsn = record->EndRecPtr;
xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
RelFileLocator rlocator;
BlockNumber blkno;
- XLogRedoAction action;
Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
+ XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
/*
* If there are any Hot Standby transactions running that have an xmin
@@ -254,70 +251,11 @@ heap_xlog_visible(XLogReaderState *record)
rlocator);
/*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
+ * Even if the heap relation was dropped or truncated and the previously
+ * emitted record skipped the heap page update due to this LSN interlock,
+ * it's still safe to update the visibility map. Any WAL record that
+ * clears the visibility map bit does so before checking the page LSN, so
+ * any bits that need to be cleared will still be cleared.
*/
if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
&vmbuffer) == BLK_NEEDS_REDO)
@@ -341,7 +279,7 @@ heap_xlog_visible(XLogReaderState *record)
reln = CreateFakeRelcacheEntry(rlocator);
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
+ visibilitymap_set(reln, blkno, lsn, vmbuffer,
xlrec->snapshotConflictHorizon, vmbits);
ReleaseBuffer(vmbuffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c016f8f7c25..735f1e7501e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1911,7 +1911,7 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
NULL, 0);
}
- visibilitymap_set(vacrel->rel, blkno, buf,
+ visibilitymap_set(vacrel->rel, blkno,
InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE |
@@ -2100,7 +2100,7 @@ lazy_scan_prune(LVRelState *vacrel,
flags |= VISIBILITYMAP_ALL_FROZEN;
}
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
+ old_vmbits = visibilitymap_set(vacrel->rel, blkno,
InvalidXLogRecPtr,
vmbuffer, presult.vm_conflict_horizon,
flags);
@@ -2898,7 +2898,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
*/
if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- visibilitymap_set(vacrel->rel, blkno, buffer,
+ visibilitymap_set(vacrel->rel, blkno,
InvalidXLogRecPtr,
vmbuffer, visibility_cutoff_xid,
vmflags);
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index b28460392b7..33541e36674 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -233,9 +233,7 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
* when a page that is already all-visible is being marked all-frozen.
*
* Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
+ * this function.
*
* You must pass a buffer containing the correct map page to this function.
* Call visibilitymap_pin first to pin the right one. This function doesn't do
@@ -244,7 +242,7 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
* Returns the state of the page's VM bits before setting flags.
*/
uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
uint8 flags)
{
@@ -261,18 +259,11 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
#endif
Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
- Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
/* Must never set all_frozen bit without also setting all_visible bit */
Assert(flags != VISIBILITYMAP_ALL_FROZEN);
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- Assert(!BufferIsValid(heapBuf) || BufferIsExclusiveLocked(heapBuf));
-
/* Check that we have the right VM page pinned */
if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
@@ -294,23 +285,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
if (XLogRecPtrIsInvalid(recptr))
{
Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
+ recptr = log_heap_visible(rel, vmBuf, cutoff_xid, flags);
}
PageSetLSN(page, recptr);
}
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 7d3fb75dda7..82b8f7f2bbc 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -440,7 +440,6 @@ typedef struct xl_heap_inplace
* This is what we need to know about setting a visibility map bit
*
* Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
*/
typedef struct xl_heap_visible
{
@@ -493,7 +492,7 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
+extern XLogRecPtr log_heap_visible(Relation rel,
Buffer vm_buffer,
TransactionId snapshotConflictHorizon,
uint8 vmflags);
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 3dcf37ba03f..fbc69604d57 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -32,7 +32,7 @@ extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
+ BlockNumber heapBlk,
XLogRecPtr recptr,
Buffer vmBuf,
TransactionId cutoff_xid,
--
2.43.0
v15-0012-Make-heap_page_is_all_visible-independent-of-LVR.patch
From 0174b06e74adeedf425ac159cd04b11c9c35fd73 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 16 Sep 2025 15:39:31 -0400
Subject: [PATCH v15 12/23] Make heap_page_is_all_visible independent of
LVRelState
Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need a few parameters from
the LVRelState, so just pass those in explicitly.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 57 ++++++++++++++++++----------
1 file changed, 37 insertions(+), 20 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 735f1e7501e..a0f3984e37f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,13 +463,18 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
- TransactionId *visibility_cutoff_xid, bool *all_frozen);
-static bool heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
+static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
- TransactionId *visibility_cutoff_xid);
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2030,8 +2035,9 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(presult.lpdead_items == 0);
- if (!heap_page_is_all_visible(vacrel, buf,
- &debug_cutoff, &debug_all_frozen))
+ if (!heap_page_is_all_visible(vacrel->rel, buf,
+ vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+ &debug_cutoff, &vacrel->offnum))
Assert(false);
Assert(presult.all_frozen == debug_all_frozen);
@@ -2824,9 +2830,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
InvalidOffsetNumber);
- if (heap_page_would_be_all_visible(vacrel, buffer,
+ if (heap_page_would_be_all_visible(vacrel->rel, buffer,
+ vacrel->cutoffs.OldestXmin,
deadoffsets, num_offsets,
- &all_frozen, &visibility_cutoff_xid))
+ &all_frozen, &visibility_cutoff_xid,
+ &vacrel->offnum))
{
vmflags |= VISIBILITYMAP_ALL_VISIBLE;
if (all_frozen)
@@ -3576,15 +3584,19 @@ dead_items_cleanup(LVRelState *vacrel)
* callers that expect no LP_DEAD on the page.
*/
static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
TransactionId *visibility_cutoff_xid,
- bool *all_frozen)
+ OffsetNumber *logging_offnum)
{
- return heap_page_would_be_all_visible(vacrel, buf,
+ return heap_page_would_be_all_visible(rel, buf,
+ OldestXmin,
NULL, 0,
all_frozen,
- visibility_cutoff_xid);
+ visibility_cutoff_xid,
+ logging_offnum);
}
/*
@@ -3599,7 +3611,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * vacrel->cutoffs.OldestXmin is used to determine visibility.
+ * OldestXmin is used to determine visibility.
*
* *all_frozen is an output parameter indicating to the caller if every tuple
* on the page is frozen.
@@ -3607,6 +3619,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* *visibility_cutoff_xid is an output parameter with the highest xmin amongst the
* visible tuples. It is only valid if the page is all-visible.
*
+ * *logging_offnum will have the OffsetNumber of the current tuple being
+ * processed for vacuum's error callback system.
+ *
* Callers looking to verify that the page is already all-visible can call
* heap_page_is_all_visible().
*
@@ -3616,11 +3631,13 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* to avoid introducing new side-effects here.
*/
static bool
-heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
- TransactionId *visibility_cutoff_xid)
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3655,7 +3672,7 @@ heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
* Set the offset number so that we can display it along with any
* error that occurred while processing this tuple.
*/
- vacrel->offnum = offnum;
+ *logging_offnum = offnum;
itemid = PageGetItemId(page, offnum);
/* Unused or redirect line pointers are of no interest */
@@ -3685,9 +3702,9 @@ heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
tuple.t_len = ItemIdGetLength(itemid);
- tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+ tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
+ switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
buf))
{
case HEAPTUPLE_LIVE:
@@ -3708,7 +3725,7 @@ heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
+ OldestXmin))
{
all_visible = false;
*all_frozen = false;
@@ -3743,7 +3760,7 @@ heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
} /* scan along page */
/* Clear the offset information once we have processed the given page. */
- vacrel->offnum = InvalidOffsetNumber;
+ *logging_offnum = InvalidOffsetNumber;
return all_visible;
}
--
2.43.0
v15-0013-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch
From b6d38f5938f2614b89e76a372cf88f2a857216e5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 17 Sep 2025 15:52:18 -0400
Subject: [PATCH v15 13/23] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase III
Instead of emitting a separate XLOG_HEAP2_VISIBLE record for each page
that is rendered all-visible by vacuum's third phase, include the
updates to the VM in the already emitted XLOG_HEAP2_PRUNE_VACUUM_CLEANUP
record.
The visibilitymap bits are stored in the flags member of the
xl_heap_prune struct.
This can decrease the number of WAL records vacuum phase III emits by
as much as half.
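For orientation, the xl_heap_prune flag bits involved here look roughly
like this once the flags field is widened to uint16 (only bits appearing
in this series are listed; see heapam_xlog.h for the full set):

    #define XLHP_SET_PD_ALL_VIS        (1 << 0) /* set PD_ALL_VISIBLE on heap page */
    #define XLHP_IS_CATALOG_REL        (1 << 1)
    /* ... existing XLHP_CLEANUP_LOCK / XLHP_HAS_* bits ... */
    #define XLHP_HAS_DEAD_ITEMS        (1 << 6)
    #define XLHP_HAS_NOW_UNUSED_ITEMS  (1 << 7)
    #define XLHP_VM_ALL_VISIBLE        (1 << 8) /* new: set VM all-visible bit */
    #define XLHP_VM_ALL_FROZEN         (1 << 9) /* new: also set VM all-frozen bit */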
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam_xlog.c | 134 ++++++++++++++++++-------
src/backend/access/heap/pruneheap.c | 37 ++++++-
src/backend/access/heap/vacuumlazy.c | 38 +++----
src/backend/access/rmgrdesc/heapdesc.c | 11 +-
src/include/access/heapam.h | 1 +
src/include/access/heapam_xlog.h | 25 ++++-
6 files changed, 177 insertions(+), 69 deletions(-)
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 2e9fda0a9bf..dcd0dba45a0 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Buffer buffer;
RelFileLocator rlocator;
BlockNumber blkno;
- XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
+ uint8 vmflags = 0;
+ Size freespace = 0;
XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -50,11 +52,22 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Assert((xlrec.flags & XLHP_CLEANUP_LOCK) != 0 ||
(xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
+ if (xlrec.flags & XLHP_VM_ALL_VISIBLE)
+ {
+ vmflags = VISIBILITYMAP_ALL_VISIBLE;
+ if (xlrec.flags & XLHP_VM_ALL_FROZEN)
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
/*
- * We are about to remove and/or freeze tuples. In Hot Standby mode,
- * ensure that there are no queries running for which the removed tuples
- * are still visible or which still consider the frozen xids as running.
- * The conflict horizon XID comes after xl_heap_prune.
+ * After xl_heap_prune is the optional snapshot conflict horizon.
+ *
+ * In Hot Standby mode, we must ensure that there are no running queries
+ * which would conflict with the changes in this record. That means we
+ * can't replay this record if it removes tuples that are still visible to
+ * transactions on the standby, freeze tuples with xids that are still
+ * considered running on the standby, or set a page as all-visible in the
+ * VM if it isn't all-visible to all transactions on the standby.
*/
if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
{
@@ -71,12 +84,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
}
/*
- * If we have a full-page image, restore it and we're done.
+ * If we have a full-page image of the heap block, restore it and we're
+ * done with the heap block.
*/
- action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
- (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
- &buffer);
- if (action == BLK_NEEDS_REDO)
+ if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+ (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+ &buffer) == BLK_NEEDS_REDO)
{
Page page = BufferGetPage(buffer);
OffsetNumber *redirected;
@@ -100,6 +113,11 @@ heap_xlog_prune_freeze(XLogReaderState *record)
do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+ /* Ensure the record does something */
+ Assert(do_prune || nplans > 0 ||
+ vmflags & VISIBILITYMAP_VALID_BITS ||
+ xlrec.flags & XLHP_SET_PD_ALL_VIS);
+
/*
* Update all line pointers per the record, and repair fragmentation
* if needed.
@@ -147,15 +165,23 @@ heap_xlog_prune_freeze(XLogReaderState *record)
* page-level PD_ALL_VISIBLE bit is clear. If that were to occur,
* then a subsequent page modification would fail to clear the
* visibility map bit.
+ *
+ * Note: we don't worry about updating the page's prunability hints.
+ * At worst this will cause an extra prune cycle to occur soon.
*/
if (xlrec.flags & XLHP_SET_PD_ALL_VIS)
PageSetAllVisible(page);
/*
- * Note: we don't worry about updating the page's prunability hints.
- * At worst this will cause an extra prune cycle to occur soon.
+ * We must never end up with the VM bit set and the page-level
+ * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+ * modification would fail to clear the VM bit.
*/
- MarkBufferDirty(buffer);
+ Assert(!(vmflags & VISIBILITYMAP_VALID_BITS) || PageIsAllVisible(page));
+
+ /* If this record only sets the VM, no need to dirty the heap page */
+ if (do_prune || nplans > 0 || xlrec.flags & XLHP_SET_PD_ALL_VIS)
+ MarkBufferDirty(buffer);
/*
* We always emit a WAL record when setting PD_ALL_VISIBLE, but we are
@@ -171,47 +197,81 @@ heap_xlog_prune_freeze(XLogReaderState *record)
}
/*
- * If we released any space or line pointers or set PD_ALL_VISIBLE, update
- * the free space map.
+ * If we released any space or line pointers, set PD_ALL_VISIBLE, or set the
+ * VM, update the free space map.
*
- * Even if we are just setting PD_ALL_VISIBLE (and thus not freeing up any
- * space), we'll still update the FSM for this page. Since the FSM is not
- * WAL-logged and only updated heuristically, it easily becomes stale in
- * standbys. If the standby is later promoted and runs VACUUM, it will
- * skip updating individual free space figures for pages that became
- * all-visible (or all-frozen, depending on the vacuum mode,) which is
- * troublesome when FreeSpaceMapVacuum propagates too optimistic free
- * space values to upper FSM layers; later inserters try to use such pages
- * only to find out that they are unusable. This can cause long stalls
- * when there are many such pages.
+ * Even if we are just setting PD_ALL_VISIBLE or updating the VM (and thus
+ * not freeing up any space), we'll still update the FSM for this page.
+ * Since the FSM is not WAL-logged and only updated heuristically, it
+ * easily becomes stale in standbys. If the standby is later promoted and
+ * runs VACUUM, it will skip updating individual free space figures for
+ * pages that became all-visible (or all-frozen, depending on the vacuum
+ * mode,) which is troublesome when FreeSpaceMapVacuum propagates too
+ * optimistic free space values to upper FSM layers; later inserters try
+ * to use such pages only to find out that they are unusable. This can
+ * cause long stalls when there are many such pages.
*
* Forestall those problems by updating FSM's idea about a page that is
* becoming all-visible or all-frozen.
*
* Do this regardless of a full-page image being applied, since the FSM
* data is not in the page anyway.
+ *
+ * We want to avoid holding an exclusive lock on the heap buffer while
+ * doing IO (either of the FSM or the VM), so we'll release the lock on
+ * the heap buffer before doing either.
*/
if (BufferIsValid(buffer))
{
if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
XLHP_HAS_DEAD_ITEMS |
XLHP_HAS_NOW_UNUSED_ITEMS |
- XLHP_SET_PD_ALL_VIS))
- {
- Size freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+ XLHP_SET_PD_ALL_VIS |
+ (vmflags & VISIBILITYMAP_VALID_BITS)))
+ freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
- /*
- * We want to avoid holding an exclusive lock on the heap buffer
- * while doing IO, so we'll release the lock on the heap buffer
- * first.
- */
- UnlockReleaseBuffer(buffer);
+ UnlockReleaseBuffer(buffer);
+ }
+
+ /*
+ * Now read and update the VM block.
+ *
+ * We must redo changes to the VM even if the heap page was skipped due to
+ * LSN interlock. See comment in heap_xlog_multi_insert() for more details
+ * on replaying changes to the VM.
+ */
+ if (vmflags & VISIBILITYMAP_VALID_BITS &&
+ XLogReadBufferForRedoExtended(record, 1,
+ RBM_ZERO_ON_ERROR,
+ false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ Page vmpage = BufferGetPage(vmbuffer);
+ char *relname;
+ uint8 old_vmbits = 0;
+
+ /* initialize the page if it was read as zeros */
+ if (PageIsNew(vmpage))
+ PageInit(vmpage, BLCKSZ, 0);
- XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
+ /* We don't have relation name during recovery, so use relfilenode */
+ relname = psprintf("%u", rlocator.relNumber);
+ old_vmbits = visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, relname);
+
+ /* Only set VM page LSN if we modified the page */
+ if (old_vmbits != vmflags)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
}
- else
- UnlockReleaseBuffer(buffer);
+ pfree(relname);
}
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
+
+ if (freespace > 0)
+ XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9b25131543b..9e00fbf3cd1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -20,6 +20,7 @@
#include "access/multixact.h"
#include "access/transam.h"
#include "access/xlog.h"
+#include "access/visibilitymapdefs.h"
#include "access/xloginsert.h"
#include "commands/vacuum.h"
#include "executor/instrument.h"
@@ -913,6 +914,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
+ InvalidBuffer, 0,
conflict_xid,
true,
do_set_pd_vis,
@@ -2088,14 +2090,18 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
*
* This is used for several different page maintenance operations:
*
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
* redirected, some marked dead, and some removed altogether.
*
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'
*
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ * marked as unused.
*
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phase III, the heap page may be marked
+ * all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
* all.
*
* If replaying the record requires a cleanup lock, pass cleanup_lock = true.
@@ -2103,6 +2109,10 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* replaying 'unused' items depends on whether they were all previously marked
* as dead.
*
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
* set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
* the page LSN when checksums/wal_log_hints are enabled even if we did not
* prune or freeze tuples on the page.
@@ -2113,6 +2123,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer, uint8 vmflags,
TransactionId conflict_xid,
bool cleanup_lock,
bool set_pd_all_vis,
@@ -2139,6 +2150,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xlrec.flags = 0;
regbuf_flags = REGBUF_STANDARD;
+ Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+
/*
* We can avoid an FPI if the only modification we are making to the heap
* page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
@@ -2157,6 +2170,10 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
*/
XLogBeginInsert();
XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ XLogRegisterBuffer(1, vmbuffer, 0);
+
if (nfrozen > 0)
{
int nplans;
@@ -2213,6 +2230,12 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* Prepare the main xl_heap_prune record. We already set the XLHP_HAS_*
* flag above.
*/
+ if (vmflags & VISIBILITYMAP_ALL_VISIBLE)
+ {
+ xlrec.flags |= XLHP_VM_ALL_VISIBLE;
+ if (vmflags & VISIBILITYMAP_ALL_FROZEN)
+ xlrec.flags |= XLHP_VM_ALL_FROZEN;
+ }
if (set_pd_all_vis)
xlrec.flags |= XLHP_SET_PD_ALL_VIS;
if (RelationIsAccessibleInLogicalDecoding(relation))
@@ -2247,6 +2270,12 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
}
recptr = XLogInsert(RM_HEAP2_ID, info);
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
+ }
+
/*
* We must bump the page LSN if pruning or freezing. If we are only
* updating PD_ALL_VISIBLE, though, we can skip doing this unless
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a0f3984e37f..539e5267574 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1906,6 +1906,8 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
else
log_heap_prune_and_freeze(vacrel->rel, buf,
+ InvalidBuffer,
+ 0,
InvalidTransactionId, /* conflict xid */
false, /* cleanup lock */
true, /* set_pd_all_vis */
@@ -2817,6 +2819,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
OffsetNumber unused[MaxHeapTuplesPerPage];
int nunused = 0;
TransactionId visibility_cutoff_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
bool all_frozen;
LVSavedErrInfo saved_err_info;
uint8 vmflags = 0;
@@ -2842,6 +2845,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
vmflags |= VISIBILITYMAP_ALL_FROZEN;
Assert(!TransactionIdIsValid(visibility_cutoff_xid));
}
+
+ /* Take the lock on the vmbuffer before entering a critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
}
START_CRIT_SECTION();
@@ -2868,7 +2874,13 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* setting the VM, we must set PD_ALL_VISIBLE as well.
*/
if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+ {
PageSetAllVisible(page);
+ visibilitymap_set_vmbits(blkno,
+ vmbuffer, vmflags,
+ RelationGetRelationName(vacrel->rel));
+ conflict_xid = visibility_cutoff_xid;
+ }
/*
* Mark buffer dirty before we write WAL.
@@ -2879,7 +2891,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
- InvalidTransactionId,
+ vmbuffer, vmflags,
+ conflict_xid,
false, /* no cleanup lock required */
(vmflags & VISIBILITYMAP_VALID_BITS) != 0,
PRUNE_VACUUM_CLEANUP,
@@ -2889,36 +2902,17 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
unused, nunused);
}
- /*
- * Note that we don't end the critical section until after emitting the VM
- * record. This ensures both PD_ALL_VISIBLE and the VM bits are set or
- * unset in the event of a crash. While it is correct for PD_ALL_VISIBLE
- * to be set and the VM to be clear, we should do our best to keep these
- * in sync. This does mean that we will take a lock on the VM buffer
- * inside of a critical section, which is generally discouraged. There is
- * precedent for this in other callers of visibilitymap_set(), though.
- */
+ END_CRIT_SECTION();
- /*
- * Now that we have removed the LP_DEAD items from the page, set the
- * visibility map if the page became all-visible/all-frozen. Changes to
- * the heap page have already been logged.
- */
if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- visibilitymap_set(vacrel->rel, blkno,
- InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
- vmflags);
-
/* Count the newly set VM page for logging */
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
vacrel->vm_new_visible_pages++;
if (all_frozen)
vacrel->vm_new_visible_frozen_pages++;
}
- END_CRIT_SECTION();
-
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
}
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..1cb44ca32d3 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -103,7 +103,7 @@ plan_elem_desc(StringInfo buf, void *plan, void *data)
* code, the latter of which is used in frontend (pg_waldump) code.
*/
void
-heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
OffsetNumber **frz_offsets,
int *nredirected, OffsetNumber **redirected,
@@ -287,6 +287,15 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, ", isCatalogRel: %c",
xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
+ if (xlrec->flags & XLHP_VM_ALL_VISIBLE)
+ {
+ uint8 vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+ if (xlrec->flags & XLHP_VM_ALL_FROZEN)
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ appendStringInfo(buf, ", vm_flags: 0x%02X", vmflags);
+ }
+
if (XLogRecHasBlockData(record, 0))
{
Size datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 34fe5603512..e97b53f1ee8 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -383,6 +383,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer, uint8 vmflags,
TransactionId conflict_xid,
bool cleanup_lock,
bool set_pd_all_vis,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 82b8f7f2bbc..833114e0a6e 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -249,7 +249,7 @@ typedef struct xl_heap_update
* Main data section:
*
* xl_heap_prune
- * uint8 flags
+ * uint16 flags
* TransactionId snapshot_conflict_horizon
*
* Block 0 data section:
@@ -284,7 +284,7 @@ typedef struct xl_heap_update
*/
typedef struct xl_heap_prune
{
- uint8 flags;
+ uint16 flags;
/*
* If XLHP_HAS_CONFLICT_HORIZON is set, the conflict horizon XID follows,
@@ -292,11 +292,17 @@ typedef struct xl_heap_prune
*/
} xl_heap_prune;
-#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
#define XLHP_SET_PD_ALL_VIS (1 << 0)
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
#define XLHP_IS_CATALOG_REL (1 << 1)
/*
@@ -332,6 +338,15 @@ typedef struct xl_heap_prune
#define XLHP_HAS_DEAD_ITEMS (1 << 6)
#define XLHP_HAS_NOW_UNUSED_ITEMS (1 << 7)
+/*
+ * The xl_heap_prune record's flags may also contain which VM bits to set.
+ * xl_heap_prune should always use the XLHP_VM_ALL_VISIBLE and
+ * XLHP_VM_ALL_FROZEN flags and translate them to their visibilitymapdefs.h
+ * equivalents, VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN.
+ */
+#define XLHP_VM_ALL_VISIBLE (1 << 8)
+#define XLHP_VM_ALL_FROZEN (1 << 9)
+
/*
* xlhp_freeze_plan describes how to freeze a group of one or more heap tuples
* (appears in xl_heap_prune's xlhp_freeze_plans sub-record)
@@ -498,7 +513,7 @@ extern XLogRecPtr log_heap_visible(Relation rel,
uint8 vmflags);
/* in heapdesc.c, so it can be shared between frontend/backend code */
-extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
OffsetNumber **frz_offsets,
int *nredirected, OffsetNumber **redirected,
--
2.43.0
v15-0014-Set-empty-pages-all-visible-in-XLOG_HEAP2_PRUNE_.patch
From b0242b98434d61bcaff239a5731f3d1e65f310f6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 17 Sep 2025 16:04:18 -0400
Subject: [PATCH v15 14/23] Set empty pages all-visible in
XLOG_HEAP2_PRUNE_VACUUM_SCAN record
As part of a project to eliminate XLOG_HEAP2_VISIBLE records, eliminate
their usage in phase I vacuum of empty pages.
---
src/backend/access/heap/vacuumlazy.c | 56 +++++++++++++++++-----------
1 file changed, 35 insertions(+), 21 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 539e5267574..e8721761392 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1882,11 +1882,22 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ bool set_pd_all_vis = true;
+
+ /* Lock vmbuffer before entering critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
START_CRIT_SECTION();
PageSetAllVisible(page);
MarkBufferDirty(buf);
+ visibilitymap_set_vmbits(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ RelationGetRelationName(vacrel->rel));
+
if (RelationNeedsWAL(vacrel->rel))
{
/*
@@ -1897,34 +1908,37 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
* all-visible and find that the page isn't initialized, which
* will cause a PANIC. To prevent that, check whether the page
* has been previously WAL-logged, and if not, do that now.
- *
- * Otherwise, just emit WAL for setting PD_ALL_VISIBLE on the
- * heap page. Doing this in a separate record from setting the
- * VM allows us to omit the heap page from the VM WAL chain.
*/
if (PageGetLSN(page) == InvalidXLogRecPtr)
+ {
log_newpage_buffer(buf, true);
- else
- log_heap_prune_and_freeze(vacrel->rel, buf,
- InvalidBuffer,
- 0,
- InvalidTransactionId, /* conflict xid */
- false, /* cleanup lock */
- true, /* set_pd_all_vis */
- PRUNE_VACUUM_SCAN, /* reason */
- NULL, 0,
- NULL, 0,
- NULL, 0,
- NULL, 0);
+ set_pd_all_vis = false;
+ }
+
+ /*
+ * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+ * setting the VM. If we emitted a new page record for the
+ * page above, setting PD_ALL_VISIBLE will already have been
+ * included in that record.
+ */
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ InvalidTransactionId, /* conflict xid */
+ false, /* cleanup lock */
+ set_pd_all_vis,
+ PRUNE_VACUUM_SCAN, /* reason */
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
}
- visibilitymap_set(vacrel->rel, blkno,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
END_CRIT_SECTION();
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
/* Count the newly all-frozen pages for logging */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
--
2.43.0
v15-0016-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patchtext/x-patch; charset=US-ASCII; name=v15-0016-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patchDownload
From 558df69caf2c977989781da2757a4c930728e596 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 17 Sep 2025 17:29:59 -0400
Subject: [PATCH v15 16/23] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
prune/freeze
Instead of emitting a separate WAL record for every block rendered
all-visible/frozen by vacuum's phase I, include the changes to the VM in
the XLOG_HEAP2_PRUNE_VACUUM_SCAN record already emitted.
This is only enabled for vacuum's prune/freeze work, not for on-access
pruning.
---
src/backend/access/heap/pruneheap.c | 184 +++++++++++++++++-----------
src/include/access/heapam.h | 3 +-
2 files changed, 113 insertions(+), 74 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e3f9967e26c..473822a8e26 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -662,50 +662,58 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.deadoffsets = presult->deadoffsets;
/*
- * Caller may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Keep track of whether or not the page will be all-visible and
+ * all-frozen for use in opportunistic freezing and to update the VM if
+ * the caller requests it.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * Currently, only VACUUM attempts freezing. But other callers could. The
+ * visibility bookkeeping is required for opportunistic freezing (in
+ * addition to setting the VM bits) because we only consider
+ * opportunistically freezing tuples if the whole page would become
+ * all-frozen or if the whole page will be frozen except for dead tuples
+ * that will be removed by vacuum. But if consider_update_vm is false,
+ * we'll not set the VM even if the page is discovered to be all-visible.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not immediately
- * clear all_visible when we see LP_DEAD items. We fix that after
- * scanning the line pointers, before we return the value to the caller,
- * so that the caller doesn't set the VM bit incorrectly.
+ * If only HEAP_PAGE_PRUNE_UPDATE_VIS is passed and not
+ * HEAP_PAGE_PRUNE_FREEZE, prstate.all_frozen must be initialized to false
+ * because we will not call heap_prepare_freeze_tuple() on each tuple.
+ *
+ * Dead tuples which will be removed by the end of vacuuming should not
+ * preclude us from opportunistically freezing, so we do not clear
+ * all_visible when we see LP_DEAD items. We fix that after determining
+ * whether or not to freeze but before deciding whether or not to update
+ * the VM so that we don't set the VM bit incorrectly.
+ *
+ * If not freezing and not updating the VM, we avoid the extra
+ * bookkeeping. Initializing all_visible and all_frozen to false allows
+ * skipping the work to update them in heap_prune_record_unchanged_lp_normal().
*/
if (prstate.attempt_freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
}
+ else if ((options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0)
+ {
+ prstate.all_visible = true;
+ prstate.all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate.all_visible = false;
prstate.all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is unmaintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon.
+ * This is most likely to happen when updating the VM and/or freezing all
+ * live tuples on the page. It is updated before returning to the caller
+ * because vacuum does assert-build only validation on the page using this
+ * field.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -943,16 +951,15 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* visibility map bits based on information from the VM and from
* all_visible and all_frozen variables.
*
- * Though callers should set the VM if PD_ALL_VISIBLE is set, it is
- * allowed for the page-level bit to be set and the VM to be clear. We log
- * setting PD_ALL_VISIBLE on the heap page in a
- * XLOG_HEAP2_PRUNE_VACUUM_SCAN record and setting the VM bits in a later
- * emitted XLOG_HEAP2_VISIBLE record.
+ * It is allowed for the page-level bit to be set and the VM to be clear;
+ * however, we have a strong preference for keeping them in sync.
*
- * Setting PD_ALL_VISIBLE when we are making the changes to the page that
- * render it all-visible allows us to omit the heap page from the WAL
- * chain when later updating the VM -- even when checksums/wal_log_hints
- * are enabled.
+ * Prior to Postgres 19, it was possible for the page-level bit to be set
+ * and the VM bit to be clear. This could happen if we crashed after
+ * setting PD_ALL_VISIBLE but before setting bits in the VM.
+ *
+ * As such, we may find pages where PD_ALL_VISIBLE is already set and only
+ * the VM needs to be updated.
*/
do_set_pd_vis = false;
do_set_vm = false;
@@ -961,6 +968,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
blockno, buffer, vmbuffer, blk_known_av,
&prstate, &new_vmbits, &do_set_pd_vis);
+ /* Lock vmbuffer before entering a critical section */
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -991,7 +1002,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze || do_set_pd_vis)
+ if (do_prune || do_freeze || do_set_pd_vis || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -1008,12 +1019,32 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_set_pd_vis)
PageSetAllVisible(page);
- MarkBufferDirty(buffer);
+ if (do_prune || do_freeze || do_set_pd_vis)
+ MarkBufferDirty(buffer);
+
+ if (do_set_vm)
+ {
+ Assert(PageIsAllVisible(page));
+
+ old_vmbits = visibilitymap_set_vmbits(blockno,
+ vmbuffer, new_vmbits,
+ RelationGetRelationName(relation));
+ if (old_vmbits == new_vmbits)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+ /* Unset so we don't emit WAL since no change occurred */
+ do_set_vm = false;
+ }
+ }
/*
- * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
+ * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
+ * only updating the VM and it turns out it was already set, we will
+ * have unset do_set_vm earlier. As such, check it again before
+ * emitting the record.
*/
- if (RelationNeedsWAL(relation))
+ if (RelationNeedsWAL(relation) &&
+ (do_prune || do_freeze || do_set_pd_vis || do_set_vm))
{
/*
* The snapshotConflictHorizon for the whole record should be the
@@ -1025,15 +1056,45 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* on the standby with xids older than the youngest tuple this
* record will freeze will conflict.
*/
- TransactionId conflict_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
- if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
+ /*
+ * If we are updating the VM, the conflict horizon is almost
+ * always the visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization,
+ * we can use the visibility_cutoff_xid as the conflict horizon if
+ * the page will be all-frozen. This is true even if there are
+ * LP_DEAD line pointers because we ignored those when maintaining
+ * the visibility_cutoff_xid. This will have been calculated
+ * earlier as the frz_conflict_horizon when we determined we would
+ * freeze.
+ */
+ if (do_set_vm)
+ conflict_xid = prstate.visibility_cutoff_xid;
+ else if (do_freeze)
conflict_xid = frz_conflict_horizon;
- else
+
+ /*
+ * If we are removing tuples whose xmax is newer than the conflict_xid
+ * calculated so far, we must use that xmax as our horizon instead.
+ */
+ if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
conflict_xid = prstate.latest_xid_removed;
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning
+ * or freezing any tuples and are setting an already all-visible
+ * page all-frozen in the VM. In this case, all of the tuples on
+ * the page must already be visible to all MVCC snapshots on the
+ * standby.
+ */
+ if (!do_prune && !do_freeze && do_set_vm &&
+ blk_known_av && (new_vmbits & VISIBILITYMAP_ALL_FROZEN))
+ conflict_xid = InvalidTransactionId;
+
log_heap_prune_and_freeze(relation, buffer,
- InvalidBuffer, 0,
+ vmbuffer, new_vmbits,
conflict_xid,
true,
do_set_pd_vis,
@@ -1047,6 +1108,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
END_CRIT_SECTION();
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
/*
@@ -1078,32 +1142,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
}
#endif
- /* Now set the VM */
- if (do_set_vm)
- {
- TransactionId vm_conflict_horizon;
-
- Assert((new_vmbits & VISIBILITYMAP_VALID_BITS) != 0);
-
- /*
- * The conflict horizon for that record must be the newest xmin on the
- * page. However, if the page is completely frozen, there can be no
- * conflict and the vm_conflict_horizon should remain
- * InvalidTransactionId. This includes the case that we just froze
- * all the tuples; the prune-freeze record included the conflict XID
- * already so a snapshotConflictHorizon sufficient to make everything
- * safe for REDO was logged when the page's tuples were frozen.
- */
- if (prstate.all_frozen)
- vm_conflict_horizon = InvalidTransactionId;
- else
- vm_conflict_horizon = prstate.visibility_cutoff_xid;
- old_vmbits = visibilitymap_set(relation, blockno,
- InvalidXLogRecPtr,
- vmbuffer, vm_conflict_horizon,
- new_vmbits);
- }
-
/* Copy information back for caller */
presult->ndeleted = prstate.ndeleted;
presult->nnewlpdead = prstate.ndead;
@@ -2261,7 +2299,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* - Reaping: During vacuum phase III, items that are already LP_DEAD are
* marked as unused.
*
- * - VM updates: After vacuum phase III, the heap page may be marked
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
* all-visible and all-frozen.
*
* These changes all happen together, so we use a single WAL record for them
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 493ddeacbc0..394f62a21e5 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -239,7 +239,8 @@ typedef struct PruneFreezeResult
* visibility map before updating it during phase I of vacuuming.
* new_vmbits are the state of those bits after phase I of vacuuming.
*
- * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set.
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set and
+ * we have actually updated the VM.
*/
uint8 new_vmbits;
uint8 old_vmbits;
--
2.43.0
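
For reviewers, the conflict horizon selection in the hunk above condenses to the
following order of precedence. This is just a restatement using the patch's
variable names, not additional code:

    TransactionId conflict_xid = InvalidTransactionId;

    if (do_set_vm)
        conflict_xid = prstate.visibility_cutoff_xid;  /* newest xmin still visible */
    else if (do_freeze)
        conflict_xid = frz_conflict_horizon;           /* computed when choosing to freeze */

    /* Removed tuples may demand a newer horizon than either of the above. */
    if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
        conflict_xid = prstate.latest_xid_removed;

    /* VM-only all-frozen on an already all-visible page needs no conflict. */
    if (!do_prune && !do_freeze && do_set_vm &&
        blk_known_av && (new_vmbits & VISIBILITYMAP_ALL_FROZEN))
        conflict_xid = InvalidTransactionId;
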
v15-0017-Remove-XLOG_HEAP2_VISIBLE-entirely.patchtext/x-patch; charset=US-ASCII; name=v15-0017-Remove-XLOG_HEAP2_VISIBLE-entirely.patchDownload
From 3d5897f742431854cfc9e5cc300a92ae256b3496 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 17 Sep 2025 17:42:54 -0400
Subject: [PATCH v15 17/23] Remove XLOG_HEAP2_VISIBLE entirely
There are now no users of this, so eliminate it entirely.
This includes the xl_heap_visible struct as well as all of the functions
used to emit and replay XLOG_HEAP2_VISIBLE records.
---
src/backend/access/common/bufmask.c | 4 +-
src/backend/access/heap/heapam.c | 40 ++--------
src/backend/access/heap/heapam_xlog.c | 94 ++----------------------
src/backend/access/heap/pruneheap.c | 6 +-
src/backend/access/heap/vacuumlazy.c | 16 ++--
src/backend/access/heap/visibilitymap.c | 85 +--------------------
src/backend/access/rmgrdesc/heapdesc.c | 10 ---
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +--
src/include/access/heapam_xlog.h | 19 -----
src/include/access/visibilitymap.h | 15 ++--
src/include/access/visibilitymapdefs.h | 9 ---
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 41 insertions(+), 271 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible. See heap_xlog_prune_freeze()
+ * for more details.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d4d83a6f9fe..14a2996b9ee 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2524,11 +2524,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
else if (all_frozen_set)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- RelationGetRelationName(relation));
+ visibilitymap_set(BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ RelationGetRelationName(relation));
}
/*
@@ -8798,36 +8798,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
-
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
- XLogRegisterBuffer(0, vm_buffer, 0);
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
-
/*
* Perform XLogInsert for a heap-update operation. Caller must already
* have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index dcd0dba45a0..502517fa62e 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -256,7 +256,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* We don't have relation name during recovery, so use relfilenode */
relname = psprintf("%u", rlocator.relNumber);
- old_vmbits = visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, relname);
+ old_vmbits = visibilitymap_set(blkno, vmbuffer, vmflags, relname);
/* Only set VM page LSN if we modified the page */
if (old_vmbits != vmflags)
@@ -274,81 +274,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * It is imperative that the previously emitted record set PD_ALL_VISIBLE on
- * the heap page. We must never end up with a situation where the visibility
- * map bit is set, and the page-level PD_ALL_VISIBLE bit is clear. If that
- * were to occur, then a subsequent page modification would fail to clear the
- * visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- RelFileLocator rlocator;
- BlockNumber blkno;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Even if the heap relation was dropped or truncated and the previously
- * emitted record skipped the heap page update due to this LSN interlock,
- * it's still safe to update the visibility map. Any WAL record that
- * clears the visibility map bit does so before checking the page LSN, so
- * any bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
-
- visibilitymap_set(reln, blkno, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -728,8 +653,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
*
* In recovery, we expect no other writers, so writing to the VM page
* without holding a lock on the heap page is considered safe enough. It
- * is done this way when replaying xl_heap_visible records (see
- * heap_xlog_visible()).
+ * is done this way when replaying xl_heap_prune records (see
+ * heap_xlog_prune_freeze()).
*/
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -744,11 +669,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
/* We don't have relation name during recovery, so use relfilenode */
relname = psprintf("%u", rlocator.relNumber);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- relname);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ relname);
/*
* It is not possible that the VM was already set for this heap page,
@@ -1334,9 +1259,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 473822a8e26..faf1002f25f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1026,9 +1026,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
{
Assert(PageIsAllVisible(page));
- old_vmbits = visibilitymap_set_vmbits(blockno,
- vmbuffer, new_vmbits,
- RelationGetRelationName(relation));
+ old_vmbits = visibilitymap_set(blockno,
+ vmbuffer, new_vmbits,
+ RelationGetRelationName(relation));
if (old_vmbits == new_vmbits)
{
LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d71f3755dce..e59eb40133d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1887,11 +1887,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
PageSetAllVisible(page);
MarkBufferDirty(buf);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- RelationGetRelationName(vacrel->rel));
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ RelationGetRelationName(vacrel->rel));
if (RelationNeedsWAL(vacrel->rel))
{
@@ -2776,9 +2776,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer, vmflags,
- RelationGetRelationName(vacrel->rel));
+ visibilitymap_set(blkno,
+ vmbuffer, vmflags,
+ RelationGetRelationName(vacrel->rel));
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 33541e36674..8754b737e94 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set bit(s) in a previously pinned page and log
- * visibilitymap_set_vmbits - set bit(s) in a pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,82 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
- flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (XLogRecPtrIsInvalid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, vmBuf, cutoff_xid, flags);
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
-}
/*
* Set flags in the VM block contained in the passed in vmBuf.
@@ -320,9 +243,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk,
* is pinned and exclusive locked.
*/
uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const char *heapRelname)
+visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const char *heapRelname)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 1cb44ca32d3..93505cb8c56 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -460,9 +453,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 833114e0a6e..61ceaf2a98b 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -451,19 +450,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -507,11 +493,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index fbc69604d57..859e5795457 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "utils/relcache.h"
@@ -31,15 +30,11 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const char *heapRelname);
+
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const char *heapRelname);
+
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3c80d49b67e..d400c8429b0 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4269,7 +4269,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
v15-0020-Inline-TransactionIdFollows-Precedes.patchtext/x-patch; charset=US-ASCII; name=v15-0020-Inline-TransactionIdFollows-Precedes.patchDownload
From 2ae2b01dc50b7dde504519c0540ece7acf801211 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 09:57:13 -0400
Subject: [PATCH v15 20/23] Inline TransactionIdFollows/Precedes()
Calling these from on-access pruning code had noticeable overhead in a
profile. There does not seem to be a reason not to inline them.
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
---
src/backend/access/transam/transam.c | 64 -------------------------
src/include/access/transam.h | 70 ++++++++++++++++++++++++++--
2 files changed, 66 insertions(+), 68 deletions(-)
diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index 9a39451a29a..553d6756cb3 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -273,70 +273,6 @@ TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
}
-/*
- * TransactionIdPrecedes --- is id1 logically < id2?
- */
-bool
-TransactionIdPrecedes(TransactionId id1, TransactionId id2)
-{
- /*
- * If either ID is a permanent XID then we can just do unsigned
- * comparison. If both are normal, do a modulo-2^32 comparison.
- */
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 < id2);
-
- diff = (int32) (id1 - id2);
- return (diff < 0);
-}
-
-/*
- * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
- */
-bool
-TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 <= id2);
-
- diff = (int32) (id1 - id2);
- return (diff <= 0);
-}
-
-/*
- * TransactionIdFollows --- is id1 logically > id2?
- */
-bool
-TransactionIdFollows(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 > id2);
-
- diff = (int32) (id1 - id2);
- return (diff > 0);
-}
-
-/*
- * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
- */
-bool
-TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 >= id2);
-
- diff = (int32) (id1 - id2);
- return (diff >= 0);
-}
-
/*
* TransactionIdLatest --- get latest XID among a main xact and its children
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 7d82cd2eb56..c9e20418275 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -255,6 +255,72 @@ typedef struct TransamVariablesData
} TransamVariablesData;
+
+/*
+ * TransactionIdPrecedes --- is id1 logically < id2?
+ */
+static inline bool
+TransactionIdPrecedes(TransactionId id1, TransactionId id2)
+{
+ /*
+ * If either ID is a permanent XID then we can just do unsigned
+ * comparison. If both are normal, do a modulo-2^32 comparison.
+ */
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 < id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff < 0);
+}
+
+/*
+ * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
+ */
+static inline bool
+TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 <= id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff <= 0);
+}
+
+/*
+ * TransactionIdFollows --- is id1 logically > id2?
+ */
+static inline bool
+TransactionIdFollows(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 > id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff > 0);
+}
+
+/*
+ * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
+ */
+static inline bool
+TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 >= id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff >= 0);
+}
+
+
/* ----------------
* extern declarations
* ----------------
@@ -274,10 +340,6 @@ extern bool TransactionIdDidAbort(TransactionId transactionId);
extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
-extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
-extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2);
extern TransactionId TransactionIdLatest(TransactionId mainxid,
int nxids, const TransactionId *xids);
extern XLogRecPtr TransactionIdGetCommitLSN(TransactionId xid);
--
2.43.0
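
As a sanity check on the modulo-2^32 arithmetic in the now-inlined comparators,
here is a small standalone example (not part of the patch) showing that an XID
allocated just after wraparound compares as newer than one allocated just before
it:

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        uint32_t    id1 = 3;            /* first normal XID after wraparound */
        uint32_t    id2 = 4294967290u;  /* allocated shortly before wraparound */

        /* Same arithmetic as TransactionIdFollows() for two normal XIDs. */
        int32_t     diff = (int32_t) (id1 - id2);

        /* Prints 1: 3 logically follows 4294967290 in modulo-2^32 order. */
        printf("%d\n", diff > 0);
        return 0;
    }
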
v15-0021-Unset-all-visible-sooner-if-not-freezing.patchtext/x-patch; charset=US-ASCII; name=v15-0021-Unset-all-visible-sooner-if-not-freezing.patchDownload
From 74ce20dc392912c2f066c8c32819ae206acfde7c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:35:13 -0400
Subject: [PATCH v15 21/23] Unset all-visible sooner if not freezing
In prune/freeze code, we delay unsetting all-visible/all-frozen in the
presence of dead items to allow opportunistically freezing tuples if the
whole page would be frozen except for those dead items -- which are
removed later in vacuum's third phase.
Future commits will allow on-access pruning to set the VM, which means
all-visible will be initialized to true instead of false and we will do
extra bookkeeping in heap_prune_unchanged_lp_normal() to keep track of
whether or not the page is all-visible.
Because on-access pruning will not freeze tuples, it makes sense to
unset all-visible as soon as we encounter an LP_DEAD item and
avoid continued bookkeeping since we know the page is not all-visible
and we won't be able to remove those dead items.
---
src/backend/access/heap/pruneheap.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 631394889d7..e64addfdf5d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1523,8 +1523,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible until later during pruning.
- * Removable dead tuples shouldn't preclude freezing the page.
+ * Removable dead tuples shouldn't preclude freezing the page. If we won't
+ * attempt freezing, just unset all-visible now, though.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1777,8 +1780,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible until later, at the end of
* heap_page_prune_and_freeze(). This will allow us to attempt to freeze
* the page after pruning. As long as we unset it before updating the
- * visibility map, this will be correct.
+ * visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all-visible now.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
--
2.43.0
v15-0019-Use-GlobalVisState-to-determine-page-level-visib.patchtext/x-patch; charset=US-ASCII; name=v15-0019-Use-GlobalVisState-to-determine-page-level-visib.patchDownload
From dbde84e7d5706b06ea252f75bc5aa7bc39ff2dea Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v15 19/23] Use GlobalVisState to determine page level
visibility
During pruning and during vacuum's third phase, we try to determine if
the whole page can be set all-visible in the visibility map. Instead of
using OldestXmin to determine if all the tuples on a page are visible to
everyone, use the GlobalVisState. This allows us to start setting the VM
during on-access pruning in a future commit.
It is possible for the GlobalVisState to change during the course of a
vacuum. In all but extraordinary cases, it moves forward, meaning more
pages could potentially be set in the VM.
Because comparing a transaction ID to the GlobalVisState requires more
operations than comparing it to another single transaction ID, we now
wait until after examining all the tuples on the page and if we have
maintained the visibility_cutoff_xid, we compare that to the
GlobalVisState just once per page. This works because if the page is
all-visible and has live, committed tuples on it, the
visibility_cutoff_xid will contain the newest xmin on the page. If
everyone can see it, the page is truly all-visible.
Doing this may mean we examine more tuples' xmins than before, since
previously we would have set all_visible to false sooner, upon encountering a
live tuple newer than OldestXmin. However, these extra comparisons were found
not to be significant in a profile.
---
src/backend/access/heap/heapam_visibility.c | 28 +++++++++++++
src/backend/access/heap/pruneheap.c | 46 +++++++++------------
src/backend/access/heap/vacuumlazy.c | 20 ++++-----
src/include/access/heapam.h | 4 +-
4 files changed, 60 insertions(+), 38 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+ Buffer buffer)
+{
+ TransactionId dead_after = InvalidTransactionId;
+ HTSV_Result res;
+
+ res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+ if (res == HEAPTUPLE_RECENTLY_DEAD)
+ {
+ Assert(TransactionIdIsValid(dead_after));
+
+ if (GlobalVisXidVisibleToAll(vistest, dead_after))
+ res = HEAPTUPLE_DEAD;
+ }
+ else
+ Assert(!TransactionIdIsValid(dead_after));
+
+ return res;
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 52e956189e8..631394889d7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -134,10 +134,9 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon, when setting the VM or when
+ * freezing all the live tuples on the page.
*
* NOTE: all_visible and all_frozen don't include LP_DEAD items. That's
* convenient for heap_page_prune_and_freeze(), to use them to decide
@@ -706,14 +705,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon.
- * This is most likely to happen when updating the VM and/or freezing all
- * live tuples on the page. It is updated before returning to the caller
- * because vacuum does assert-build only validation on the page using this
- * field.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState.
+ *
+ * If we encounter an uncommitted tuple, this field is no longer maintained.
+ * When the page is being set all-visible or all live tuples on the page are
+ * being frozen, it is used to calculate the snapshot conflict horizon.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -909,6 +906,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them is not visible to everyone, the page cannot be
+ * all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update the hint
@@ -1130,7 +1137,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
Assert(prstate.cutoffs);
if (!heap_page_is_all_visible(relation, buffer,
- prstate.cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc))
Assert(false);
@@ -1656,19 +1663,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' when freezing is requested. We
- * could use GlobalVisTestIsRemovableXid instead, if a
- * non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e59eb40133d..d9b83fb6115 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,7 +464,7 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -2734,7 +2734,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
InvalidOffsetNumber);
if (heap_page_would_be_all_visible(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3479,14 +3479,13 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
- return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
+ return heap_page_would_be_all_visible(rel, buf, vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -3505,7 +3504,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
*
* *all_frozen is an output parameter indicating to the caller if every tuple
* on the page is frozen.
@@ -3526,7 +3525,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
static bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -3598,8 +3597,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
tuple.t_len = ItemIdGetLength(itemid);
tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
- buf))
+ switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest,
+ buf))
{
case HEAPTUPLE_LIVE:
{
@@ -3618,8 +3617,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin,
- OldestXmin))
+ if (!GlobalVisXidVisibleToAll(vistest, xmin))
{
all_visible = false;
*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 394f62a21e5..34ee323a423 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -395,7 +395,7 @@ extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum);
@@ -407,6 +407,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+ GlobalVisState *vistest, Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
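
A rough usage sketch of the new HeapTupleSatisfiesVacuumGlobalVis() for a caller
that has a GlobalVisState but no single OldestXmin (local variable names are
illustrative only):

    GlobalVisState *vistest = GlobalVisTestFor(rel);
    HTSV_Result     res;

    res = HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest, buf);
    if (res == HEAPTUPLE_DEAD)
    {
        /* dead to everyone according to the GlobalVisState: prunable */
    }
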
v15-0022-Allow-on-access-pruning-to-set-pages-all-visible.patchtext/x-patch; charset=US-ASCII; name=v15-0022-Allow-on-access-pruning-to-set-pages-all-visible.patchDownload
From 8df8cf1d9c5baa8d07e623e80dfaeb5ff4b25228 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v15 22/23] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum marked pages as all-visible or all-frozen.
Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
---
src/backend/access/heap/heapam.c | 15 +++-
src/backend/access/heap/heapam_handler.c | 15 +++-
src/backend/access/heap/pruneheap.c | 89 ++++++++++++++-----
src/backend/access/index/indexam.c | 46 ++++++++++
src/backend/access/table/tableam.c | 39 +++++++-
src/backend/executor/execMain.c | 4 +
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeBitmapHeapscan.c | 7 +-
src/backend/executor/nodeIndexscan.c | 18 ++--
src/backend/executor/nodeSeqscan.c | 24 +++--
src/include/access/genam.h | 11 +++
src/include/access/heapam.h | 24 ++++-
src/include/access/relscan.h | 6 ++
src/include/access/tableam.h | 30 ++++++-
src/include/nodes/execnodes.h | 6 ++
.../t/035_standby_logical_decoding.pl | 3 +-
16 files changed, 292 insertions(+), 47 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 14a2996b9ee..6181e355aaf 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -555,6 +555,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -569,7 +570,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1246,6 +1249,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1284,6 +1288,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1316,6 +1326,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..f05b9e4968d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
return &hscan->xs_base;
}
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e64addfdf5d..0d8fea346c5 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -45,6 +45,8 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool attempt_freeze;
+ /* whether or not to attempt updating the VM */
+ bool attempt_update_vm;
const struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -185,9 +187,13 @@ static void page_verify_redirects(Page page);
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -251,6 +257,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
{
OffsetNumber dummy_off_loc;
PruneFreezeResult presult;
+ int options = 0;
+
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ options = HEAP_PAGE_PRUNE_UPDATE_VIS;
+ }
/*
* For now, pass mark_unused_now as false regardless of whether or
@@ -258,8 +271,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* that during on-access pruning with the current implementation.
*/
heap_page_prune_and_freeze(relation, buffer,
- InvalidBuffer, false,
- PRUNE_ON_ACCESS, 0, NULL,
+ vmbuffer ? *vmbuffer : InvalidBuffer,
+ false, /* blk_known_av */
+ PRUNE_ON_ACCESS, options, NULL,
vistest, &presult, &dummy_off_loc, NULL, NULL);
/*
@@ -443,6 +457,8 @@ heap_page_will_set_vis(Relation relation,
Buffer heap_buf,
Buffer vmbuffer,
bool blk_known_av,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
PruneState *prstate,
uint8 *vmflags,
bool *do_set_pd_vis)
@@ -450,6 +466,32 @@ heap_page_will_set_vis(Relation relation,
Page heap_page = BufferGetPage(heap_buf);
bool do_set_vm = false;
+ *do_set_pd_vis = false;
+
+ if (!prstate->attempt_update_vm)
+ {
+ Assert(!prstate->all_visible && !prstate->all_frozen);
+ Assert(*vmflags == 0);
+ return false;
+ }
+
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(heap_buf) || XLogCheckBufferNeedsBackup(heap_buf)))
+ {
+ prstate->all_visible = prstate->all_frozen = false;
+ return false;
+ }
+
if (prstate->all_visible && !PageIsAllVisible(heap_page))
*do_set_pd_vis = true;
@@ -473,6 +515,9 @@ heap_page_will_set_vis(Relation relation,
* page-level bit is clear. However, it's possible that in vacuum the bit
* got cleared after heap_vac_scan_next_block() was called, so we must
* recheck with buffer lock before concluding that the VM is corrupt.
+ *
+ * XXX: This will never trigger for on-access pruning because it passes
+ * blk_known_av as false. Should we remove that condition here?
*/
else if (blk_known_av && !PageIsAllVisible(heap_page) &&
visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
@@ -615,6 +660,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.vistest = vistest;
prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.attempt_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0;
prstate.cutoffs = cutoffs;
/*
@@ -692,7 +738,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.all_visible = true;
prstate.all_frozen = true;
}
- else if ((options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0)
+ else if (prstate.attempt_update_vm)
{
prstate.all_visible = true;
prstate.all_frozen = false;
@@ -906,6 +952,14 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /*
+ * Even if we don't prune anything, if we found a new value for the
+ * pd_prune_xid field or the page was marked full, we will update the hint
+ * bit.
+ */
+ do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ PageIsFull(page);
+
/*
* After processing all the live tuples on the page, if the newest xmin
* amongst them is not visible to everyone, the page cannot be
@@ -916,14 +970,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
prstate.all_visible = prstate.all_frozen = false;
- /*
- * Even if we don't prune anything, if we found a new value for the
- * pd_prune_xid field or the page was marked full, we will update the hint
- * bit.
- */
- do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
- PageIsFull(page);
-
/*
* Decide if we want to go ahead with freezing according to the freeze
* plans we prepared, or not.
@@ -951,8 +997,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (prstate.lpdead_items > 0)
prstate.all_visible = prstate.all_frozen = false;
- Assert(!prstate.all_frozen || prstate.all_visible);
-
/*
* Determine whether or not to set the page level PD_ALL_VISIBLE and the
* visibility map bits based on information from the VM and from
@@ -968,12 +1012,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* As such, it is possible to only update the VM when PD_ALL_VISIBLE is
* already set.
*/
- do_set_pd_vis = false;
- do_set_vm = false;
- if ((options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0)
- do_set_vm = heap_page_will_set_vis(relation,
- blockno, buffer, vmbuffer, blk_known_av,
- &prstate, &new_vmbits, &do_set_pd_vis);
+ do_set_vm = heap_page_will_set_vis(relation,
+ blockno, buffer, vmbuffer, blk_known_av,
+ reason, do_prune, do_freeze,
+ &prstate, &new_vmbits, &do_set_pd_vis);
+
+ Assert(!prstate.all_frozen || prstate.all_visible);
/* Lock vmbuffer before entering a critical section */
if (do_set_vm)
@@ -1134,7 +1178,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool debug_all_frozen;
Assert(prstate.lpdead_items == 0);
- Assert(prstate.cutoffs);
if (!heap_page_is_all_visible(relation, buffer,
prstate.vistest,
@@ -2299,8 +2342,8 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* - Reaping: During vacuum phase III, items that are already LP_DEAD are
* marked as unused.
*
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- * all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III and on-access, the heap page
+ * may be marked all-visible and all-frozen.
*
* These changes all happen together, so we use a single WAL record for them
* all.
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..8d582a8eafd 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -289,6 +289,32 @@ index_beginscan(Relation heapRelation,
return scan;
}
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan(heapRelation,
+ indexRelation,
+ snapshot,
+ instrument,
+ nkeys, norderbys);
+
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+ return scan;
+}
+
/*
* index_beginscan_bitmap - start a scan of an index with amgetbitmap
*
@@ -620,6 +646,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
return scan;
}
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan_parallel(heaprel, indexrel,
+ instrument,
+ nkeys, norderbys,
+ pscan);
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+ return scan;
+}
+
/* ----------------
* index_getnext_tid - get the next TID from a scan
*
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index a56c5eceb14..67dbf99f5b5 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
char *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
bool synchronize_seqscans = true;
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags);
+
/* ----------------------------------------------------------------------------
* Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
}
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
- SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
pscan, flags);
}
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+ bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
/* ----------------------------------------------------------------------------
* Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 831c55ce787..15be318fd41 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation is modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ modifies_rel);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..c2ffbd3b08e 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+
+ bool modifies_base_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
*/
- scandesc = index_beginscan(node->ss.ss_currentRelation,
- node->iss_RelationDesc,
- estate->es_snapshot,
- &node->iss_Instrument,
- node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+ node->iss_RelationDesc,
+ estate->es_snapshot,
+ &node->iss_Instrument,
+ node->iss_NumScanKeys,
+ node->iss_NumOrderByKeys,
+ modifies_base_rel);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..fd69275c181 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
- scandesc = table_beginscan(node->ss.ss_currentRelation,
- estate->es_snapshot,
- 0, NULL);
+ scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+ estate->es_snapshot,
+ 0, NULL, modifies_rel);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -366,6 +371,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
ParallelContext *pcxt)
{
EState *estate = node->ss.ps.state;
+ bool modifies_rel;
ParallelTableScanDesc pscan;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -373,8 +379,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+ modifies_rel);
}
/* ----------------------------------------------------------------
@@ -404,8 +413,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+ pscan,
+ modifies_rel);
}
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 0831c33b038..87827127d96 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -174,6 +174,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_heap_rel);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
@@ -200,6 +205,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys,
ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_rel);
extern ItemPointer index_getnext_tid(IndexScanDesc scan,
ScanDirection direction);
struct TupleTableSlot;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 34ee323a423..9dcf8d29496 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read in the current heap page's
+ * corresponding VM block to this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -116,8 +123,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read in the current heap page's corresponding VM block to
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
@@ -363,7 +380,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
Buffer vmbuffer, bool blk_known_av,
PruneReason reason,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
typedef struct IndexFetchTableData
{
Relation rel;
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchTableData;
struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 77eb41eb6dc..6f5d4f9bb65 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -62,6 +62,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* whether or not scan should attempt to set the VM */
+ SO_ALLOW_VM_SET = 1 << 10,
} ScanOptions;
/*
@@ -881,6 +883,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
}
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
/*
* Like table_beginscan(), but for scanning catalog. It'll automatically use a
* snapshot appropriate for scanning catalog relations.
@@ -918,10 +939,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, struct ScanKeyData *key)
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
{
uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
}
@@ -1130,6 +1154,10 @@ extern void table_parallelscan_initialize(Relation rel,
extern TableScanDesc table_beginscan_parallel(Relation relation,
ParallelTableScanDesc pscan);
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+ ParallelTableScanDesc pscan,
+ bool modifies_rel);
+
/*
* Restart a parallel scan. Call this in the leader process. Caller is
* responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 3a920cc7d17..c854be93436 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query either through
+ * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index c9c182892cf..f5c0c65b260 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -745,7 +746,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
--
2.43.0
v15-0023-Set-pd_prune_xid-on-insert.patchtext/x-patch; charset=US-ASCII; name=v15-0023-Set-pd_prune_xid-on-insert.patchDownload
From 3d30243c60a34b3dfd63eff381e86626e0c466e7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v15 23/23] Set pd_prune_xid on insert
Now that we can set the VM during read-only queries, it makes sense to
start setting the page prunable hint on insert. This will allow
heap_page_prune_and_freeze() to be called when the page is full or
mostly full.
For years there has been a note in heap_insert() and heap_multi_insert()
pointing out that setting pd_prune_xid would help clean up aborted
inserted tuples that would otherwise not be cleaned up until vacuum.
So, that's another benefit of setting it.
Setting pd_prune_xid on insert causes a page to be pruned and then
written out which then affects the reported number of hits in the
index-killtuples isolation test. This is a quirk of how hits are tracked
which sometimes leads them to be double counted. This should probably be
fixed or changed independently.
---
src/backend/access/heap/heapam.c | 25 +++++++++++++------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++-
.../isolation/expected/index-killtuples.out | 6 ++---
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 6181e355aaf..1704269715e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2104,6 +2104,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2163,15 +2164,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2181,7 +2186,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2545,8 +2549,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 502517fa62e..8c2a4a2e847 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -473,6 +473,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -622,9 +628,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
--
2.43.0
I find this patch set quite hard to follow. 0001 altogether removes
the use of XLOG_HEAP2_VISIBLE in cases where we use
XLOG_HEAP2_MULTI_INSERT, but then 0007 (the next non-refactoring
patch) begins half-removing the dependency on XLOG_HEAP2_VISIBLE,
assisted by 0009 and 0010, and then later you come back and remove the
other half of the dependency. I know it was I who proposed (off-list)
first making the XLOG_HEAP2_VISIBLE record only deal with the VM page
and not the heap buffer, but I'm not sure that idea quite worked out
in terms of making this easier to follow. At the least, it seems weird
that COPY FREEZE is an exception that gets handled in a different way
than all the other cases, fully removing the dependency in one step.
It would also be nice if each time you repost this, or maybe in a
README that you post along beside the actual patches, you'd include
some kind of roadmap to help the reader understand the internal
structure of the patch set, like 1 does this, 2-9 get us to here,
10-whatever get us to this next place.
I don't really understand how the interlocking works. 0011 changes
visibilitymap_set so that it doesn't take the heap block as an
argument, but we'd better hold a lock on the heap page while setting
the VM bit, otherwise I think somebody could come along and modify the
heap page after we decided it was all-visible and before we actually
set the VM bit, which would be terrible. I would expect the comments
and the commit message to say something about that, but I don't see
that they do.
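Roughly, the ordering I have in mind is something like the sketch below -- written against master's current functions rather than the patched interface, and with page_is_all_visible, heapbuf, vmbuffer, blkno, rel, and visibility_cutoff_xid as placeholders, so take the exact signatures with a grain of salt:
    LockBuffer(heapbuf, BUFFER_LOCK_EXCLUSIVE);
    if (page_is_all_visible)        /* decided while the lock is held */
    {
        PageSetAllVisible(BufferGetPage(heapbuf));
        MarkBufferDirty(heapbuf);
        /* heap lock is still held while the VM bit is set and WAL is emitted */
        visibilitymap_set(rel, blkno, heapbuf, InvalidXLogRecPtr,
                          vmbuffer, visibility_cutoff_xid,
                          VISIBILITYMAP_ALL_VISIBLE);
    }
    LockBuffer(heapbuf, BUFFER_LOCK_UNLOCK);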
I find myself fearful of the way that 0007 propagates the existing
hacks around setting the VM bit into a new place:
+    /*
+     * We always emit a WAL record when setting PD_ALL_VISIBLE, but we are
+     * careful not to emit a full page image unless
+     * checksums/wal_log_hints are enabled. We only set the heap page LSN
+     * if full page images were an option when emitting WAL. Otherwise,
+     * subsequent modifications of the page may incorrectly skip emitting
+     * a full page image.
+     */
+    if (do_prune || nplans > 0 ||
+        (xlrec.flags & XLHP_SET_PD_ALL_VIS && XLogHintBitIsNeeded()))
+        PageSetLSN(page, lsn);
I suppose it's not the worst thing to duplicate this logic, because
you're later going to remove the original copy. But, it took me >10
minutes to find the text in src/backend/access/transam/README, in the
second half of the "Writing Hints" section, that explains the overall
principle here, and since the patch set doesn't seem to touch that
text, maybe you weren't even aware it was there. And, it's a little
weird to have a single WAL record that is either a hint or not a hint
depending on a complex set of conditions. (IMHO mixing & and &&
without parentheses is quite brave, and an explicit != 0 might not be
a bad idea either.)
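For instance, I'd find the quoted test easier to read as:
    if (do_prune || nplans > 0 ||
        ((xlrec.flags & XLHP_SET_PD_ALL_VIS) != 0 && XLogHintBitIsNeeded()))
        PageSetLSN(page, lsn);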
Anyway, I kind of wonder if it's time to back out the hack that I
installed here many years ago. At the time, I thought that it would be
bad if a VACUUM swept over the visibility map setting VM bits and as a
result emitted an FPI for every page in the entire heap ... but
everyone who is running with checksums has accepted that cost already,
and with those being the default, that's probably going to be most
people. It would be even more compelling if we were going to freeze,
prune, and set all-visible on access, because then presumably the case
where we touch a page and ONLY set the VM bit would be rare, so the
cost of doing that wouldn't matter much, but I guess the patch doesn't
go that far -- we can freeze or set all-visible on access but not
prune, without which the scenario I was worrying about at the time is
still fairly plausible, I think, if checksums are turned off.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Sep 24, 2025 at 4:13 PM Robert Haas <robertmhaas@gmail.com> wrote:
I find this patch set quite hard to follow. 0001 altogether removes
the use of XLOG_HEAP2_VISIBLE in cases where we use
XLOG_HEAP2_MULTI_INSERT, but then 0007 (the next non-refactoring
patch) begins half-removing the dependency on XLOG_HEAP2_VISIBLE,
assisted by 0009 and 0010, and then later you come back and remove the
other half of the dependency. I know it was I who proposed (off-list)
first making the XLOG_HEAP2_VISIBLE record only deal with the VM page
and not the heap buffer, but I'm not sure that idea quite worked out
in terms of making this easier to follow. At the least, it seems weird
that COPY FREEZE is an exception that gets handled in a different way
than all the other cases, fully removing the dependency in one step.
It would also be nice if each time you repost this, or maybe in a
README that you post along beside the actual patches, you'd include
some kind of roadmap to help the reader understand the internal
structure of the patch set, like 1 does this, 2-9 get us to here,
10-whatever get us to this next place.
In attached v16, I’ve reverted to removing XLOG_HEAP2_VISIBLE
entirely, rather than first removing each caller's heap page from the
VM WAL chain. I reordered changes and squashed several refactoring
patches to improve patch-by-patch readability. This should make the
set read differently from earlier versions that removed
XLOG_HEAP2_VISIBLE and had more step-by-step mechanical refactoring.
I think if we plan to go all the way with removing XLOG_HEAP2_VISIBLE,
having intermediate patches that just set PD_ALL_VISIBLE when making
other heap page changes is more confusing than helpful. Also, I think having
separate flags for setting PD_ALL_VISIBLE in the WAL record
over-complicated the code.
0001: remove XLOG_HEAP2_VISIBLE from COPY FREEZE
0002 - 0005: various refactoring in advance of removing
XLOG_HEAP2_VISIBLE in pruning
0006: Pruning and freezing by vacuum sets the VM and emits a single
WAL record with those changes
0007: Reaping (phase III) by vacuum sets the VM and sets line pointers
unused in a single WAL record
0008 - 0009: XLOG_HEAP2_VISIBLE is eliminated
0010 - 0012: preparation for setting VM on-access
0013: set VM on-access
0014: set pd_prune_xid on insert
I find myself fearful of the way that 0007 propagates the existing
hacks around setting the VM bit into a new place:
+    /*
+     * We always emit a WAL record when setting PD_ALL_VISIBLE, but we are
+     * careful not to emit a full page image unless
+     * checksums/wal_log_hints are enabled. We only set the heap page LSN
+     * if full page images were an option when emitting WAL. Otherwise,
+     * subsequent modifications of the page may incorrectly skip emitting
+     * a full page image.
+     */
+    if (do_prune || nplans > 0 ||
+        (xlrec.flags & XLHP_SET_PD_ALL_VIS && XLogHintBitIsNeeded()))
+        PageSetLSN(page, lsn);
I suppose it's not the worst thing to duplicate this logic, because
you're later going to remove the original copy. But, it took me >10
minutes to find the text in src/backend/access/transam/README, in the
second half of the "Writing Hints" section, that explains the overall
principle here, and since the patch set doesn't seem to touch that
text, maybe you weren't even aware it was there.
I don't think that src/backend/access/transam/README must change with
my patch. It is still true that if the only change we are making to
the heap page is setting PD_ALL_VISIBLE and checksums/wal_log_hints
are disabled, we explicitly avoid an FPI and thus can't stamp the page
LSN.
And, it's a little
weird to have a single WAL record that is either a hint or not a hint
depending on a complex set of conditions.
PD_ALL_VISIBLE is different from tuple hints and other page hints
because setting the VM is always WAL-logged, and replaying that record
always sets PD_ALL_VISIBLE, so PD_ALL_VISIBLE is effectively always
WAL-logged. The other hints aren't WAL-logged unless checksums are
enabled and we need an FPI. So PD_ALL_VISIBLE differs from other page
hints in multiple ways. We can't make it more like those hints because
we need to preserve the invariant that the VM bit is never set while
the page-level bit is clear. The only thing we could do is stop
omitting the FPI even when checksums are not enabled.
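To make the contrast concrete, here is a rough sketch using master's
current functions (not code from the patch; rel, buffer, vmbuffer, blkno,
tuple, xid, and visibility_cutoff_xid are placeholders):
    /* ordinary tuple hint: goes through MarkBufferDirtyHint() internally, so
     * it is only WAL-logged (as an FPI) when checksums/wal_log_hints are on */
    HeapTupleSetHintBits(tuple->t_data, buffer, HEAP_XMIN_COMMITTED, xid);
    /* PD_ALL_VISIBLE: the VM update is always WAL-logged, and replaying that
     * record sets PD_ALL_VISIBLE again, so the flag is durable even if the
     * heap page itself never got an FPI */
    PageSetAllVisible(BufferGetPage(buffer));
    MarkBufferDirty(buffer);
    visibilitymap_set(rel, blkno, buffer, InvalidXLogRecPtr,
                      vmbuffer, visibility_cutoff_xid,
                      VISIBILITYMAP_ALL_VISIBLE);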
Anyway, I kind of wonder if it's time to back out the hack that I
installed here many years ago. At the time, I thought that it would be
bad if a VACUUM swept over the visibility map setting VM bits and as a
result emitted an FPI for every page in the entire heap ... but
everyone who is running with checksums has accepted that cost already,
and with those being the default, that's probably going to be most
people.
I agree that PD_ALL_VISIBLE persistence is complicated, but we have
other special cases that complicate the code for a performance
benefit. I guess the question is whether we are saying people shouldn't
run without checksums in production. If that's true, then it's fine to
remove this optimization. Otherwise, I'm not so sure.
I think cloud providers generally have checksums enabled, but I don't
know what is common on-prem.
It would be even more compelling if we were going to freeze,
prune, and set all-visible on access, because then presumably the case
where we touch a page and ONLY set the VM bit would be rare, so the
cost of doing that wouldn't matter much, but I guess the patch doesn't
go that far -- we can freeze or set all-visible on access but not
prune, without which the scenario I was worrying about at the time is
still fairly plausible, I think, if checksums are turned off.
With the whole set applied, we can prune and set the VM on access but
not freeze. I have a patch to do that, but it introduced noticeable
CPU overhead to prepare the freeze plans. I'd have to spend much more
time studying it to avoid regressing workloads where we don't end up
freezing but prepare the freeze plans during SELECT queries.
- Melanie
Attachments:
v16-0014-Set-pd_prune_xid-on-insert.patchtext/x-patch; charset=UTF-8; name=v16-0014-Set-pd_prune_xid-on-insert.patchDownload
From bd82158f3836798a6ea9194e70e33b93980fbbde Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v16 14/14] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.
This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.
Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear whether this is a bug in the way hits are tracked, a faulty test
expectation, or whether simply updating the test's expected output is
sufficient remediation.
---
src/backend/access/heap/heapam.c | 25 +++++++++++++------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++-
.../isolation/expected/index-killtuples.out | 6 ++---
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 6181e355aaf..1704269715e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2104,6 +2104,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2163,15 +2164,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2181,7 +2186,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2545,8 +2549,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 69d1f0b8633..51f7961075f 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -475,6 +475,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -624,9 +630,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
--
2.43.0
v16-0005-Make-heap_page_is_all_visible-independent-of-LVR.patchtext/x-patch; charset=US-ASCII; name=v16-0005-Make-heap_page_is_all_visible-independent-of-LVR.patchDownload
From 280948d3f1f18b8a6c473d6b56023b0c795f0efa Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 3 Oct 2025 15:57:02 -0400
Subject: [PATCH v16 05/14] Make heap_page_is_all_visible independent of
LVRelState
Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need a few parameters from
the LVRelState, so just pass those in explicitly.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 44 +++++++++++++++-------------
src/include/access/heapam.h | 6 ++++
2 files changed, 30 insertions(+), 20 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8eef436dd10..aed1f8e1139 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,8 +463,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
- TransactionId *visibility_cutoff_xid, bool *all_frozen);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2014,8 +2012,9 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(presult.lpdead_items == 0);
- if (!heap_page_is_all_visible(vacrel, buf,
- &debug_cutoff, &debug_all_frozen))
+ if (!heap_page_is_all_visible(vacrel->rel, buf,
+ vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+ &debug_cutoff, &vacrel->offnum))
Assert(false);
Assert(presult.all_frozen == debug_all_frozen);
@@ -2917,8 +2916,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* emitted.
*/
Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
- &all_frozen))
+ if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
+ &all_frozen,
+ &visibility_cutoff_xid,
+ &vacrel->offnum))
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -3608,15 +3609,20 @@ dead_items_cleanup(LVRelState *vacrel)
* xmin amongst the visible tuples. Set *all_frozen to true if every tuple
* on this page is frozen.
*
- * This is a stripped down version of lazy_scan_prune(). If you change
- * anything here, make sure that everything stays in sync. Note that an
- * assertion calls us to verify that everybody still agrees. Be sure to avoid
- * introducing new side-effects here.
+ * *logging_offnum will have the OffsetNumber of the current tuple being
+ * processed for vacuum's error callback system.
+ *
+ * This is similar logic to that in heap_prune_record_unchanged_lp_normal(). If
+ * you change anything here, make sure that everything stays in sync. Note
+ * that an assertion calls us to verify that everybody still agrees. Be sure
+ * to avoid introducing new side-effects here.
*/
-static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
TransactionId *visibility_cutoff_xid,
- bool *all_frozen)
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3639,7 +3645,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* Set the offset number so that we can display it along with any
* error that occurred while processing this tuple.
*/
- vacrel->offnum = offnum;
+ *logging_offnum = offnum;
itemid = PageGetItemId(page, offnum);
/* Unused or redirect line pointers are of no interest */
@@ -3663,10 +3669,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
tuple.t_len = ItemIdGetLength(itemid);
- tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+ tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
- buf))
+ switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
{
case HEAPTUPLE_LIVE:
{
@@ -3685,8 +3690,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
+ if (!TransactionIdPrecedes(xmin, OldestXmin))
{
all_visible = false;
*all_frozen = false;
@@ -3721,7 +3725,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
} /* scan along page */
/* Clear the offset information once we have processed the given page. */
- vacrel->offnum = InvalidOffsetNumber;
+ *logging_offnum = InvalidOffsetNumber;
return all_visible;
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index bc71fef6643..ea67fb83fbe 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -432,6 +432,12 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
+
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
Buffer buffer);
--
2.43.0
v16-0004-Update-PruneState.all_-visible-frozen-earlier-in.patchtext/x-patch; charset=UTF-8; name=v16-0004-Update-PruneState.all_-visible-frozen-earlier-in.patchDownload
From a5772e0eec65df1cf064055b1ba77a51861f7fe8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 15 Sep 2025 16:25:44 -0400
Subject: [PATCH v16 04/14] Update PruneState.all_[visible|frozen] earlier in
pruning
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In the prune/freeze path, we currently delay clearing all_visible and
all_frozen when dead items are present. This allows opportunistic
freezing if the page would otherwise be fully frozen, since those dead
items are later removed in vacuum’s third phase.
However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags promptly avoids extra bookkeeping in
heap_prune_unchanged_lp_normal(). At present this has no runtime effect
because all callers that consider setting the VM also attempt freezing,
but future callers (e.g. on-access pruning) may want to set the VM
without preparing freeze plans.
We also used to defer clearing all_visible and all_frozen until after
computing the visibility cutoff XID. By determining the cutoff earlier,
we can update these flags immediately after deciding whether to
opportunistically freeze. This is necessary if we want to set the VM in
the same WAL record that prunes and freezes tuples on the page.
While we are at it, unset all_frozen whenever we unset all_visible.
Previously we could only use all_frozen in combination with all_visible
as all_frozen was not unset when not-all-visible tuples were encountered.
It is best to keep them both up-to-date to avoid mistakes when using
all_frozen.
---
src/backend/access/heap/pruneheap.c | 145 ++++++++++++++-------------
src/backend/access/heap/vacuumlazy.c | 9 +-
2 files changed, 78 insertions(+), 76 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f819ab57d55..c23a6a21a7f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -137,15 +137,12 @@ typedef struct
* bits. It is only valid if we froze some tuples, and all_frozen is
* true.
*
- * NOTE: all_visible and all_frozen don't include LP_DEAD items. That's
- * convenient for heap_page_prune_and_freeze(), to use them to decide
- * whether to freeze the page or not. The all_visible and all_frozen
- * values returned to the caller are adjusted to include LP_DEAD items at
- * the end.
- *
- * all_frozen should only be considered valid if all_visible is also set;
- * we don't bother to clear the all_frozen flag every time we clear the
- * all_visible flag.
+ * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
+ * That's convenient for heap_page_prune_and_freeze(), to use them to
+ * decide whether to freeze the page or not. The all_visible and
+ * all_frozen values returned to the caller are adjusted to include
+ * LP_DEAD items after we determine whether or not to opportunistically
+ * freeze.
*/
bool all_visible;
bool all_frozen;
@@ -308,7 +305,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* pre-freeze checks.
*
* do_prune, do_hint_full_or_prunable, and did_tuple_hint_fpi must all have
- * been decided before calling this function.
+ * been decided before calling this function. *frz_conflict_horizon is set to
+ * the snapshot conflict horizon to use for the WAL record should we decide to freeze
+ * tuples.
*
* prstate is an input/output parameter.
*
@@ -320,7 +319,8 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi,
bool do_prune,
bool do_hint_prune,
- PruneState *prstate)
+ PruneState *prstate,
+ TransactionId *frz_conflict_horizon)
{
bool do_freeze = false;
@@ -357,8 +357,10 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* anymore. The opportunistic freeze heuristic must be improved;
* however, for now, try to approximate the old logic.
*/
- if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+ if (prstate->all_frozen && prstate->nfrozen > 0)
{
+ Assert(prstate->all_visible);
+
/*
* Freezing would make the page all-frozen. Have already emitted
* an FPI or will do so anyway?
@@ -388,6 +390,22 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* critical section.
*/
heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+
+ /*
+ * Calculate what the snapshot conflict horizon should be for a record
+ * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
+ * for conflicts when the whole page is eligible to become all-frozen
+ * in the VM once we're done with it. Otherwise we generate a
+ * conservative cutoff by stepping back from OldestXmin.
+ */
+ if (prstate->all_frozen)
+ *frz_conflict_horizon = prstate->visibility_cutoff_xid;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ *frz_conflict_horizon = prstate->cutoffs->OldestXmin;
+ TransactionIdRetreat(*frz_conflict_horizon);
+ }
}
else if (prstate->nfrozen > 0)
{
@@ -432,10 +450,11 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* considered advantageous for overall system performance to do so now. The
* 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
* arguments are required when freezing. When HEAP_PRUNE_FREEZE option is
- * passed, we also set presult->all_visible and presult->all_frozen on exit,
- * to indicate if the VM bits can be set. They are always set to false when
- * the HEAP_PRUNE_FREEZE option is not passed, because at the moment only
- * callers that also freeze need that information.
+ * passed, we also set presult->all_visible and presult->all_frozen after
+ * determining whether or not to opportunistically freeze, to indicate if the
+ * VM bits can be set. They are always set to false when the
+ * HEAP_PRUNE_FREEZE option is not passed, because at the moment only callers
+ * that also freeze need that information.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
@@ -471,6 +490,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool do_hint_prune;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ TransactionId frz_conflict_horizon = InvalidTransactionId;
/* Copy parameters to prstate */
prstate.vistest = params->vistest;
@@ -540,10 +560,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* are tuples present that are not visible to everyone or if there are
* dead tuples which are not yet removable. However, dead tuples which
* will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not clear
- * all_visible when we see LP_DEAD items. We fix that at the end of the
- * function, when we return the value to the caller, so that the caller
- * doesn't set the VM bit incorrectly.
+ * opportunistically freezing. Because of that, we do not immediately
+ * clear all_visible when we see LP_DEAD items. We fix that after
+ * scanning the line pointers, before we return the value to the caller,
+ * so that the caller doesn't set the VM bit incorrectly.
*/
if (prstate.attempt_freeze)
{
@@ -778,8 +798,26 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
did_tuple_hint_fpi,
do_prune,
do_hint_prune,
- &prstate);
+ &prstate,
+ &frz_conflict_horizon);
+ /*
+ * While scanning the line pointers, we did not clear
+ * all_visible/all_frozen when encountering LP_DEAD items because we
+ * wanted the decision whether or not to freeze the page to be unaffected
+ * by the short-term presence of LP_DEAD items. These LP_DEAD items are
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that we finished determining whether or not to freeze the page,
+ * update all_visible and all_frozen so that they reflect the true state
+ * of the page for setting PD_ALL_VISIBLE and VM bits.
+ */
+ if (prstate.lpdead_items > 0)
+ prstate.all_visible = prstate.all_frozen = false;
+
+ Assert(!prstate.all_frozen || prstate.all_visible);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -838,27 +876,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* on the standby with xids older than the youngest tuple this
* record will freeze will conflict.
*/
- TransactionId frz_conflict_horizon = InvalidTransactionId;
TransactionId conflict_xid;
- /*
- * We can use the visibility_cutoff_xid as our cutoff for
- * conflicts when the whole page is eligible to become all-frozen
- * in the VM once we're done with it. Otherwise we generate a
- * conservative cutoff by stepping back from OldestXmin.
- */
- if (do_freeze)
- {
- if (prstate.all_visible && prstate.all_frozen)
- frz_conflict_horizon = prstate.visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = prstate.cutoffs->OldestXmin;
- TransactionIdRetreat(frz_conflict_horizon);
- }
- }
-
if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
conflict_xid = frz_conflict_horizon;
else
@@ -882,30 +901,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected by our caller.
- */
- if (prstate.all_visible && prstate.lpdead_items == 0)
- {
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
- }
- else
- {
- presult->all_visible = false;
- presult->all_frozen = false;
- }
-
+ presult->all_visible = prstate.all_visible;
+ presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
/*
@@ -1285,8 +1282,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible until later during pruning.
- * Removable dead tuples shouldn't preclude freezing the page.
+ * Removable dead tuples shouldn't preclude freezing the page. If we won't
+ * attempt freezing, just unset all-visible now, though.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1412,7 +1412,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
if (!HeapTupleHeaderXminCommitted(htup))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1434,7 +1434,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
Assert(prstate->cutoffs);
if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1447,7 +1447,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
case HEAPTUPLE_RECENTLY_DEAD:
prstate->recently_dead_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple will soon become DEAD. Update the hint field so
@@ -1466,7 +1466,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* assumption is a bit shaky, but it is what acquire_sample_rows()
* does, so be consistent.
*/
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* If we wanted to optimize for aborts, we might consider marking
@@ -1484,7 +1484,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* will commit and update the counters after we report.
*/
prstate->live_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple may soon become DEAD. Update the hint field so that
@@ -1552,8 +1552,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible until later, at the end of
* heap_page_prune_and_freeze(). This will allow us to attempt to freeze
* the page after pruning. As long as we unset it before updating the
- * visibility map, this will be correct.
+ * visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all-visible now.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6125f157709..8eef436dd10 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2007,7 +2007,6 @@ lazy_scan_prune(LVRelState *vacrel,
* agreement with heap_page_is_all_visible() using an assertion.
*/
#ifdef USE_ASSERT_CHECKING
- /* Note that all_frozen value does not matter when !all_visible */
if (presult.all_visible)
{
TransactionId debug_cutoff;
@@ -2060,6 +2059,7 @@ lazy_scan_prune(LVRelState *vacrel,
*has_lpdead_items = (presult.lpdead_items > 0);
Assert(!presult.all_visible || !(*has_lpdead_items));
+ Assert(!presult.all_frozen || presult.all_visible);
/*
* Handle setting visibility map bit based on information from the VM (as
@@ -2165,11 +2165,10 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen. Note that all_frozen is only valid if all_visible is
- * true, so we must check both all_visible and all_frozen.
+ * it as all-frozen.
*/
- else if (all_visible_according_to_vm && presult.all_visible &&
- presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ else if (all_visible_according_to_vm && presult.all_frozen &&
+ !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
uint8 old_vmbits;
--
2.43.0
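Stepping outside the patch text for a moment: the conflict horizon plumbing above ultimately just picks the newer of the freeze horizon and the latest removed XID using wraparound-aware comparison. A minimal sketch of that comparison (demo_follows is a toy stand-in; the real TransactionIdFollows additionally special-cases permanent XIDs):
#include <stdint.h>
#include <stdio.h>

typedef uint32_t DemoTransactionId;

/* Wraparound-aware "a is newer than b", in the spirit of TransactionIdFollows(). */
static int
demo_follows(DemoTransactionId a, DemoTransactionId b)
{
	int32_t		diff = (int32_t) (a - b);

	return diff > 0;
}

int
main(void)
{
	DemoTransactionId frz_horizon = 1000;
	DemoTransactionId latest_removed = 4000;
	DemoTransactionId conflict;

	/* The record's snapshot conflict horizon is whichever of the two is newer. */
	conflict = demo_follows(frz_horizon, latest_removed) ? frz_horizon : latest_removed;
	printf("conflict horizon: %u\n", conflict);
	return 0;
}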
v16-0002-Assorted-trivial-heap_page_prune_and_freeze-clea.patchtext/x-patch; charset=US-ASCII; name=v16-0002-Assorted-trivial-heap_page_prune_and_freeze-clea.patchDownload
From 33a35d23ae88d634cb01024295099e5d5466b1a3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 15 Sep 2025 12:06:19 -0400
Subject: [PATCH v16 02/14] Assorted trivial heap_page_prune_and_freeze cleanup
Group heap_page_prune_and_freeze() input parameters in a struct and
clean up their documentation.
Rename a member of PruneState and disambiguate some local
heap_page_prune_and_freeze() variables.
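For anyone skimming, a small self-contained sketch of the pattern this patch applies follows (DemoParams and demo_prune are invented names, not the actual PruneFreezeParams API): a long positional argument list is collapsed into a parameter struct so later patches can add inputs without touching every call site.
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical parameter struct, in the spirit of PruneFreezeParams. */
typedef struct
{
	const char *relname;		/* which relation we operate on */
	int			blkno;			/* which block */
	int			options;		/* flag bits controlling behavior */
	bool		verbose;
} DemoParams;

#define DEMO_OPT_FREEZE 0x01

/* One struct argument instead of a growing list of positional arguments. */
static void
demo_prune(const DemoParams *params)
{
	if (params->verbose)
		printf("pruning %s block %d%s\n", params->relname, params->blkno,
			   (params->options & DEMO_OPT_FREEZE) ? " (freeze)" : "");
}

int
main(void)
{
	DemoParams	params = {0};

	params.relname = "foo";
	params.blkno = 42;
	params.options = DEMO_OPT_FREEZE;
	params.verbose = true;
	demo_prune(&params);
	return 0;
}
Call sites assign the fields by name, which is also how the patch initializes params in heap_page_prune_opt() and lazy_scan_prune().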
---
src/backend/access/heap/pruneheap.c | 114 +++++++++++++--------------
src/backend/access/heap/vacuumlazy.c | 16 ++--
src/include/access/heapam.h | 62 ++++++++++++---
src/tools/pgindent/typedefs.list | 1 +
4 files changed, 115 insertions(+), 78 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d8ea0c78f77..9ba89b1fc28 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -42,8 +42,8 @@ typedef struct
/* whether or not dead items can be set LP_UNUSED during pruning */
bool mark_unused_now;
/* whether to attempt freezing tuples */
- bool freeze;
- struct VacuumCutoffs *cutoffs;
+ bool attempt_freeze;
+ const struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
* Fields describing what to do to the page
@@ -253,15 +253,23 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
{
OffsetNumber dummy_off_loc;
+ PruneFreezeParams params;
PruneFreezeResult presult;
+ params.relation = relation;
+ params.buffer = buffer;
+ params.reason = PRUNE_ON_ACCESS;
+ params.vistest = vistest;
+ params.cutoffs = NULL;
+
/*
* For now, pass mark_unused_now as false regardless of whether or
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, vistest, 0,
- NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+ params.options = 0;
+
+ heap_page_prune_and_freeze(¶ms, &presult, &dummy_off_loc, NULL, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -303,60 +311,43 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* also need to account for a reduction in the length of the line pointer
* array following array truncation by us.
*
- * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
- * required in order to advance relfrozenxid / relminmxid, or if it's
- * considered advantageous for overall system performance to do so now. The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing. When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set. They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
- *
- * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
+ * params contains the input parameters used to control freezing and pruning
+ * behavior. See the definition of PruneFreezeParams for more on what each
+ * parameter does.
*
- * options:
- * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- * pruning.
- *
- * FREEZE indicates that we will also freeze tuples, and will return
- * 'all_visible', 'all_frozen' flags to the caller.
- *
- * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
- * of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
- * cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * If the HEAP_PRUNE_FREEZE option is set in params, we will freeze tuples if
+ * it's required in order to advance relfrozenxid / relminmxid, or if it's
+ * considered advantageous for overall system performance to do so now. The
+ * 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
+ * arguments are required when freezing. When HEAP_PRUNE_FREEZE option is
+ * passed, we also set presult->all_visible and presult->all_frozen on exit,
+ * to indicate if the VM bits can be set. They are always set to false when
+ * the HEAP_PRUNE_FREEZE option is not passed, because at the moment only
+ * callers that also freeze need that information.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
* heap_page_prune_and_freeze() is responsible for initializing it. Required
* by all callers.
*
- * reason indicates why the pruning is performed. It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
* off_loc is the offset location required by the caller to use in error
* callback.
*
* new_relfrozen_xid and new_relmin_mxid must be provided by the caller if the
- * HEAP_PRUNE_FREEZE option is set. On entry, they contain the oldest XID and
- * multi-XID seen on the relation so far. They will be updated with oldest
- * values present on the page after pruning. After processing the whole
- * relation, VACUUM can use these values as the new relfrozenxid/relminmxid
- * for the relation.
+ * HEAP_PRUNE_FREEZE option is set in params. On entry, they contain the
+ * oldest XID and multi-XID seen on the relation so far. They will be updated
+ * with oldest values present on the page after pruning. After processing the
+ * whole relation, VACUUM can use these values as the new
+ * relfrozenxid/relminmxid for the relation.
*/
void
-heap_page_prune_and_freeze(Relation relation, Buffer buffer,
- GlobalVisState *vistest,
- int options,
- struct VacuumCutoffs *cutoffs,
+heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
- PruneReason reason,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid)
{
+ Buffer buffer = params->buffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
OffsetNumber offnum,
@@ -365,15 +356,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
HeapTupleData tup;
bool do_freeze;
bool do_prune;
- bool do_hint;
- bool hint_bit_fpi;
+ bool do_hint_prune;
+ bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
/* Copy parameters to prstate */
- prstate.vistest = vistest;
- prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
- prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
- prstate.cutoffs = cutoffs;
+ prstate.vistest = params->vistest;
+ prstate.mark_unused_now =
+ (params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
+ prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.cutoffs = params->cutoffs;
/*
* Our strategy is to scan the page and make lists of items to change,
@@ -394,7 +386,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* initialize page freezing working state */
prstate.pagefrz.freeze_required = false;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
Assert(new_relfrozen_xid && new_relmin_mxid);
prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
@@ -441,7 +433,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* function, when we return the value to the caller, so that the caller
* doesn't set the VM bit incorrectly.
*/
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
@@ -467,7 +459,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.visibility_cutoff_xid = InvalidTransactionId;
maxoff = PageGetMaxOffsetNumber(page);
- tup.t_tableOid = RelationGetRelid(relation);
+ tup.t_tableOid = RelationGetRelid(params->relation);
/*
* Determine HTSV for all tuples, and queue them up for processing as HOT
@@ -555,7 +547,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
* an FPI to be emitted.
*/
- hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+ did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
/*
* Process HOT chains.
@@ -663,7 +655,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* pd_prune_xid field or the page was marked full, we will update the hint
* bit.
*/
- do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
PageIsFull(page);
/*
@@ -671,7 +663,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* plans we prepared, or not.
*/
do_freeze = false;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
if (prstate.pagefrz.freeze_required)
{
@@ -702,16 +694,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* Freezing would make the page all-frozen. Have already
* emitted an FPI or will do so anyway?
*/
- if (RelationNeedsWAL(relation))
+ if (RelationNeedsWAL(params->relation))
{
- if (hint_bit_fpi)
+ if (did_tuple_hint_fpi)
do_freeze = true;
else if (do_prune)
{
if (XLogCheckBufferNeedsBackup(buffer))
do_freeze = true;
}
- else if (do_hint)
+ else if (do_hint_prune)
{
if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
do_freeze = true;
@@ -753,7 +745,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
- if (do_hint)
+ if (do_hint_prune)
{
/*
* Update the page's pd_prune_xid field to either zero, or the lowest
@@ -796,7 +788,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
*/
- if (RelationNeedsWAL(relation))
+ if (RelationNeedsWAL(params->relation))
{
/*
* The snapshotConflictHorizon for the whole record should be the
@@ -834,9 +826,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
else
conflict_xid = prstate.latest_xid_removed;
- log_heap_prune_and_freeze(relation, buffer,
+ log_heap_prune_and_freeze(params->relation, buffer,
conflict_xid,
- true, reason,
+ true, params->reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
@@ -894,7 +886,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
if (presult->nfrozen > 0)
{
@@ -1476,7 +1468,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
}
/* Consider freezing any normal tuples which will not be removed */
- if (prstate->freeze)
+ if (prstate->attempt_freeze)
{
bool totally_frozen;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ab6938d1da1..6125f157709 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1951,10 +1951,16 @@ lazy_scan_prune(LVRelState *vacrel,
{
Relation rel = vacrel->rel;
PruneFreezeResult presult;
- int prune_options = 0;
+ PruneFreezeParams params;
Assert(BufferGetBlockNumber(buf) == blkno);
+ params.relation = rel;
+ params.buffer = buf;
+ params.reason = PRUNE_VACUUM_SCAN;
+ params.cutoffs = &vacrel->cutoffs;
+ params.vistest = vacrel->vistest;
+
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
*
@@ -1970,12 +1976,12 @@ lazy_scan_prune(LVRelState *vacrel,
* tuples. Pruning will have determined whether or not the page is
* all-visible.
*/
- prune_options = HEAP_PAGE_PRUNE_FREEZE;
+ params.options = HEAP_PAGE_PRUNE_FREEZE;
if (vacrel->nindexes == 0)
- prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
+ params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
- &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+ heap_page_prune_and_freeze(¶ms,
+ &presult,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e60d34dad25..bc71fef6643 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -221,6 +221,55 @@ typedef struct HeapPageFreeze
} HeapPageFreeze;
+
+/* 'reason' codes for heap_page_prune_and_freeze() */
+typedef enum
+{
+ PRUNE_ON_ACCESS, /* on-access pruning */
+ PRUNE_VACUUM_SCAN, /* VACUUM 1st heap pass */
+ PRUNE_VACUUM_CLEANUP, /* VACUUM 2nd heap pass */
+} PruneReason;
+
+/*
+ * Input parameters to heap_page_prune_and_freeze()
+ */
+typedef struct PruneFreezeParams
+{
+ Relation relation; /* relation containing buffer to be pruned */
+ Buffer buffer; /* buffer to be pruned */
+
+ /*
+ * The reason pruning was performed. It is used to set the WAL record
+ * opcode which is used for debugging and analysis purposes.
+ */
+ PruneReason reason;
+
+ /*
+ * Contains flag bits:
+ *
+ * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+ * pruning.
+ *
+ * FREEZE indicates that we will also freeze tuples, and will return
+ * 'all_visible', 'all_frozen' flags to the caller.
+ */
+ int options;
+
+ /*
+ * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
+ * (see heap_prune_satisfies_vacuum).
+ */
+ GlobalVisState *vistest;
+
+ /*
+ * cutoffs contains the freeze cutoffs, established by VACUUM at the
+ * beginning of vacuuming the relation. Required if HEAP_PRUNE_FREEZE
+ * option is set. cutoffs->OldestXmin is also used to determine if dead
+ * tuples are HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ */
+ struct VacuumCutoffs *cutoffs;
+} PruneFreezeParams;
+
/*
* Per-page state returned by heap_page_prune_and_freeze()
*/
@@ -264,13 +313,6 @@ typedef struct PruneFreezeResult
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
} PruneFreezeResult;
-/* 'reason' codes for heap_page_prune_and_freeze() */
-typedef enum
-{
- PRUNE_ON_ACCESS, /* on-access pruning */
- PRUNE_VACUUM_SCAN, /* VACUUM 1st heap pass */
- PRUNE_VACUUM_CLEANUP, /* VACUUM 2nd heap pass */
-} PruneReason;
/* ----------------
* function prototypes for heap access method
@@ -367,12 +409,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
-extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
- GlobalVisState *vistest,
- int options,
- struct VacuumCutoffs *cutoffs,
+extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
- PruneReason reason,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 37f26f6c6b7..8a626d633d5 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2340,6 +2340,7 @@ ProjectionPath
PromptInterruptContext
ProtocolVersion
PrsStorage
+PruneFreezeParams
PruneFreezeResult
PruneReason
PruneState
--
2.43.0
v16-0003-Add-helper-for-freeze-determination-to-heap_page.patchtext/x-patch; charset=US-ASCII; name=v16-0003-Add-helper-for-freeze-determination-to-heap_page.patchDownload
From f269cdce51b10d0b5ccc0e047ff08b247e6adf89 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 16 Sep 2025 14:22:10 -0400
Subject: [PATCH v16 03/14] Add helper for freeze determination to
heap_page_prune_and_freeze
After scanning through the line pointers on the heap page during
vacuum's first phase, we use the status information we collected to
determine whether or not to use the freeze plans we
assembled.
Do this in a helper for better readability.
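As a rough illustration of the decision being factored out (heavily simplified, with invented names; the real helper also consults FPI accounting and XLogCheckBufferNeedsBackup()), it boils down to: freeze when required to advance relfrozenxid/relminmxid, otherwise freeze opportunistically only when doing so leaves the page all-frozen and a full-page image is being paid for anyway.
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical, simplified stand-in for heap_page_will_freeze(): the inputs
 * here are plain booleans rather than the PruneState the real code uses.
 */
static bool
demo_will_freeze(bool freeze_required, bool would_be_all_frozen,
				 int nfrozen, bool fpi_already_or_anyway)
{
	if (freeze_required)
		return true;			/* must freeze to advance relfrozenxid/relminmxid */
	if (would_be_all_frozen && nfrozen > 0 && fpi_already_or_anyway)
		return true;			/* opportunistic: page ends up all-frozen for "free" */
	return false;
}

int
main(void)
{
	printf("%d\n", demo_will_freeze(false, true, 3, true));	/* 1: opportunistic */
	printf("%d\n", demo_will_freeze(false, true, 3, false));	/* 0: not worth an extra FPI */
	printf("%d\n", demo_will_freeze(true, false, 1, false));	/* 1: required */
	return 0;
}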
---
src/backend/access/heap/pruneheap.c | 196 +++++++++++++++++-----------
1 file changed, 117 insertions(+), 79 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9ba89b1fc28..f819ab57d55 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -301,6 +301,118 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
}
}
+/*
+ * Decide if we want to go ahead with freezing according to the freeze plans
+ * we prepared for the given heap buffer or not. If the caller specified we
+ * should not freeze tuples, it exits early. Otherwise, it does a few
+ * pre-freeze checks.
+ *
+ * do_prune, do_hint_prune, and did_tuple_hint_fpi must all have
+ * been decided before calling this function.
+ *
+ * prstate is an input/output parameter.
+ *
+ * Returns true if we should use our freeze plans and freeze tuples on the page
+ * and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+ bool did_tuple_hint_fpi,
+ bool do_prune,
+ bool do_hint_prune,
+ PruneState *prstate)
+{
+ bool do_freeze = false;
+
+ /*
+ * If the caller specified we should not attempt to freeze any tuples,
+ * validate that everything is in the right state and exit.
+ */
+ if (!prstate->attempt_freeze)
+ {
+ Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+ Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+ return false;
+ }
+
+ if (prstate->pagefrz.freeze_required)
+ {
+ /*
+ * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+ * before FreezeLimit/MultiXactCutoff is present. Must freeze to
+ * advance relfrozenxid/relminmxid.
+ */
+ do_freeze = true;
+ }
+ else
+ {
+ /*
+ * Opportunistically freeze the page if we are generating an FPI
+ * anyway and if doing so means that we can set the page all-frozen
+ * afterwards (might not happen until VACUUM's final heap pass).
+ *
+ * XXX: Previously, we knew if pruning emitted an FPI by checking
+ * pgWalUsage.wal_fpi before and after pruning. Once the freeze and
+ * prune records were combined, this heuristic couldn't be used
+ * anymore. The opportunistic freeze heuristic must be improved;
+ * however, for now, try to approximate the old logic.
+ */
+ if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+ {
+ /*
+ * Freezing would make the page all-frozen. Have already emitted
+ * an FPI or will do so anyway?
+ */
+ if (RelationNeedsWAL(relation))
+ {
+ if (did_tuple_hint_fpi)
+ do_freeze = true;
+ else if (do_prune)
+ {
+ if (XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ else if (do_hint_prune)
+ {
+ if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ }
+ }
+ }
+
+ if (do_freeze)
+ {
+ /*
+ * Validate the tuples we will be freezing before entering the
+ * critical section.
+ */
+ heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+ }
+ else if (prstate->nfrozen > 0)
+ {
+ /*
+ * The page contained some tuples that were not already frozen, and we
+ * chose not to freeze them now. The page won't be all-frozen then.
+ */
+ Assert(!prstate->pagefrz.freeze_required);
+
+ prstate->all_frozen = false;
+ prstate->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+ else
+ {
+ /*
+ * We have no freeze plans to execute. The page might already be
+ * all-frozen (perhaps only following pruning), though. Such pages
+ * can be marked all-frozen in the VM by our caller, even though none
+ * of its tuples were newly frozen here.
+ */
+ }
+
+ return do_freeze;
+}
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
@@ -662,85 +774,11 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* Decide if we want to go ahead with freezing according to the freeze
* plans we prepared, or not.
*/
- do_freeze = false;
- if (prstate.attempt_freeze)
- {
- if (prstate.pagefrz.freeze_required)
- {
- /*
- * heap_prepare_freeze_tuple indicated that at least one XID/MXID
- * from before FreezeLimit/MultiXactCutoff is present. Must
- * freeze to advance relfrozenxid/relminmxid.
- */
- do_freeze = true;
- }
- else
- {
- /*
- * Opportunistically freeze the page if we are generating an FPI
- * anyway and if doing so means that we can set the page
- * all-frozen afterwards (might not happen until VACUUM's final
- * heap pass).
- *
- * XXX: Previously, we knew if pruning emitted an FPI by checking
- * pgWalUsage.wal_fpi before and after pruning. Once the freeze
- * and prune records were combined, this heuristic couldn't be
- * used anymore. The opportunistic freeze heuristic must be
- * improved; however, for now, try to approximate the old logic.
- */
- if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
- {
- /*
- * Freezing would make the page all-frozen. Have already
- * emitted an FPI or will do so anyway?
- */
- if (RelationNeedsWAL(params->relation))
- {
- if (did_tuple_hint_fpi)
- do_freeze = true;
- else if (do_prune)
- {
- if (XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- else if (do_hint_prune)
- {
- if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- }
- }
- }
- }
-
- if (do_freeze)
- {
- /*
- * Validate the tuples we will be freezing before entering the
- * critical section.
- */
- heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
- }
- else if (prstate.nfrozen > 0)
- {
- /*
- * The page contained some tuples that were not already frozen, and we
- * chose not to freeze them now. The page won't be all-frozen then.
- */
- Assert(!prstate.pagefrz.freeze_required);
-
- prstate.all_frozen = false;
- prstate.nfrozen = 0; /* avoid miscounts in instrumentation */
- }
- else
- {
- /*
- * We have no freeze plans to execute. The page might already be
- * all-frozen (perhaps only following pruning), though. Such pages
- * can be marked all-frozen in the VM by our caller, even though none
- * of its tuples were newly frozen here.
- */
- }
+ do_freeze = heap_page_will_freeze(params->relation, buffer,
+ did_tuple_hint_fpi,
+ do_prune,
+ do_hint_prune,
+ &prstate);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
--
2.43.0
v16-0001-Eliminate-COPY-FREEZE-use-of-XLOG_HEAP2_VISIBLE.patchtext/x-patch; charset=UTF-8; name=v16-0001-Eliminate-COPY-FREEZE-use-of-XLOG_HEAP2_VISIBLE.patchDownload
From 4312376fff987b32d4599ccd78893c8c2f7770e0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v16 01/14] Eliminate COPY FREEZE use of XLOG_HEAP2_VISIBLE
Instead of emitting a separate WAL XLOG_HEAP2_VISIBLE record for setting
bits in the VM, specify the changes to make to the VM block in the
XLOG_HEAP2_MULTI_INSERT record.
This halves the number of WAL records emitted by COPY FREEZE.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
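One piece of background the redo changes below lean on: each heap block is represented by two bits in the visibility map, so a single VM page covers tens of thousands of heap blocks and many different heap-page records may touch bits in the same VM block. The standalone sketch below loosely mirrors the HEAPBLK_TO_MAPBLOCK/MAPBYTE/OFFSET arithmetic, with constants hard-coded for illustration (8 kB pages, 24-byte page header); the authoritative definitions live in visibilitymap.c.
#include <stdio.h>

#define DEMO_BLCKSZ				8192
#define DEMO_PAGE_HEADER		24
#define DEMO_BITS_PER_HEAPBLOCK	2
#define DEMO_MAPSIZE			(DEMO_BLCKSZ - DEMO_PAGE_HEADER)
#define DEMO_HEAPBLOCKS_PER_BYTE (8 / DEMO_BITS_PER_HEAPBLOCK)
#define DEMO_HEAPBLOCKS_PER_PAGE (DEMO_MAPSIZE * DEMO_HEAPBLOCKS_PER_BYTE)

int
main(void)
{
	unsigned	heapBlk = 100000;
	unsigned	mapBlock = heapBlk / DEMO_HEAPBLOCKS_PER_PAGE;
	unsigned	mapByte = (heapBlk % DEMO_HEAPBLOCKS_PER_PAGE) / DEMO_HEAPBLOCKS_PER_BYTE;
	unsigned	mapOffset = DEMO_BITS_PER_HEAPBLOCK * (heapBlk % DEMO_HEAPBLOCKS_PER_BYTE);

	/* One VM page covers tens of thousands of heap blocks... */
	printf("heap blocks per VM page: %d\n", DEMO_HEAPBLOCKS_PER_PAGE);
	/* ...so many heap-page records can touch bits in the same VM block. */
	printf("heap block %u -> VM block %u, byte %u, bit offset %u\n",
		   heapBlk, mapBlock, mapByte, mapOffset);
	return 0;
}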
---
src/backend/access/heap/heapam.c | 44 ++++++++++------
src/backend/access/heap/heapam_xlog.c | 52 ++++++++++++++++++-
src/backend/access/heap/visibilitymap.c | 68 ++++++++++++++++++++++++-
src/backend/access/rmgrdesc/heapdesc.c | 5 ++
src/include/access/visibilitymap.h | 3 ++
5 files changed, 154 insertions(+), 18 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ed0c0c2dc9f..7f354caec31 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2466,7 +2466,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
starting_with_empty_page = PageGetMaxOffsetNumber(page) == 0;
if (starting_with_empty_page && (options & HEAP_INSERT_FROZEN))
+ {
all_frozen_set = true;
+ /* Lock the vmbuffer before entering the critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ }
/* NO EREPORT(ERROR) from here till changes are logged */
START_CRIT_SECTION();
@@ -2506,7 +2510,8 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
* going to add further frozen rows to it.
*
* If we're only adding already frozen rows to a previously empty
- * page, mark it as all-visible.
+ * page, mark it as all-frozen and update the visibility map. We're
+ * already holding a pin on the vmbuffer.
*/
if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
@@ -2517,7 +2522,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
else if (all_frozen_set)
+ {
PageSetAllVisible(page);
+ visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ RelationGetRelationName(relation));
+ }
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2565,6 +2577,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
xlrec->flags = 0;
if (all_visible_cleared)
xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+ /*
+ * We don't have to worry about including a conflict xid in the
+ * WAL record as HEAP_INSERT_FROZEN intentionally violates
+ * visibility rules.
+ */
if (all_frozen_set)
xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
@@ -2627,7 +2645,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
XLogBeginInsert();
XLogRegisterData(xlrec, tupledata - scratch.data);
+
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+ if (all_frozen_set)
+ XLogRegisterBuffer(1, vmbuffer, 0);
XLogRegisterBufData(0, tupledata, totaldatalen);
@@ -2637,26 +2658,17 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
recptr = XLogInsert(RM_HEAP2_ID, info);
PageSetLSN(page, recptr);
+ if (all_frozen_set)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
+ }
}
END_CRIT_SECTION();
- /*
- * If we've frozen everything on the page, update the visibilitymap.
- * We're already holding pin on the vmbuffer.
- */
if (all_frozen_set)
- {
- /*
- * It's fine to use InvalidTransactionId here - this is only used
- * when HEAP_INSERT_FROZEN is specified, which intentionally
- * violates visibility rules.
- */
- visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
- InvalidXLogRecPtr, vmbuffer,
- InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
- }
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
UnlockReleaseBuffer(buffer);
ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index cf843277938..c2c7e6ab086 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -551,6 +551,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
int i;
bool isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
/*
* Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -571,11 +572,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(rlocator);
- Buffer vmbuffer = InvalidBuffer;
visibilitymap_pin(reln, blkno, &vmbuffer);
visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
ReleaseBuffer(vmbuffer);
+ vmbuffer = InvalidBuffer;
FreeFakeRelcacheEntry(reln);
}
@@ -662,6 +663,55 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (BufferIsValid(buffer))
UnlockReleaseBuffer(buffer);
+ buffer = InvalidBuffer;
+
+ /*
+ * Read and update the visibility map (VM) block.
+ *
+ * We must always redo VM changes, even if the corresponding heap page
+ * update was skipped due to the LSN interlock. Each VM block covers
+ * multiple heap pages, so later WAL records may update other bits in the
+ * same block. If this record includes a full-page image (FPI), subsequent
+ * WAL records may depend on it to guard against torn pages.
+ *
+ * Heap page changes are replayed first to preserve the invariant:
+ * PD_ALL_VISIBLE must be set on the heap page if the VM bit is set.
+ *
+ * Note that we released the heap page lock above. Under normal operation,
+ * this would be unsafe — a concurrent modification could clear
+ * PD_ALL_VISIBLE while the VM bit remained set, violating the invariant.
+ *
+ * During recovery, however, no concurrent writers exist. Therefore,
+ * updating the VM without holding the heap page lock is safe enough. This
+ * same approach is taken when replaying xl_heap_visible records (see
+ * heap_xlog_visible()).
+ */
+ if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
+ XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ Page vmpage = BufferGetPage(vmbuffer);
+ char *relname;
+
+ /* initialize the page if it was read as zeros */
+ if (PageIsNew(vmpage))
+ PageInit(vmpage, BLCKSZ, 0);
+
+ /* We don't have relation name during recovery, so use relfilenode */
+ relname = psprintf("%u", rlocator.relNumber);
+ visibilitymap_set_vmbits(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ relname);
+
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
+ pfree(relname);
+ }
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
+
/*
* If the page is running low on free space, update the FSM as well.
* Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 7306c16f05c..738105eb97e 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,7 +14,8 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set a bit in a previously pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page and log
+ * visibilitymap_set_vmbits - set bit(s) in a pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -321,6 +322,71 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
return status;
}
+/*
+ * Set visibility map (VM) flags in the block referenced by vmBuf.
+ *
+ * This function is intended for callers that log VM changes together
+ * with the heap page modifications that rendered the page all-visible.
+ * Callers that log VM changes separately should use visibilitymap_set().
+ *
+ * Caller responsibilities:
+ * - vmBuf must be pinned and exclusively locked, and it must cover the
+ * VM bits corresponding to heapBlk.
+ * - In normal operation (not recovery), this must be called inside a
+ * critical section that also applies the necessary heap page changes
+ * and, if applicable, emits WAL.
+ * - The caller is responsible for WAL logging the VM buffer changes and
+ * for any required modifications to the associated heap page. This
+ * includes preserving invariants such as holding a pin and exclusive
+ * lock on the buffer containing heapBlk.
+ *
+ * heapRelname is used only for debugging.
+ */
+uint8
+visibilitymap_set_vmbits(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const char *heapRelname)
+{
+ BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+ uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+ uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+ Page page;
+ uint8 *map;
+ uint8 status;
+
+#ifdef TRACE_VISIBILITYMAP
+ elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
+ flags, heapRelname, heapBlk);
+#endif
+
+ /* Call in same critical section where WAL is emitted. */
+ Assert(InRecovery || CritSectionCount > 0);
+
+ /* Flags should be valid. Also never clear bits with this function */
+ Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+ /* Must never set all_frozen bit without also setting all_visible bit */
+ Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+ /* Check that we have the right VM page pinned */
+ if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+ elog(ERROR, "wrong VM buffer passed to visibilitymap_set_vmbits");
+
+ Assert(BufferIsExclusiveLocked(vmBuf));
+
+ page = BufferGetPage(vmBuf);
+ map = (uint8 *) PageGetContents(page);
+
+ status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+ if (flags != status)
+ {
+ map[mapByte] |= (flags << mapOffset);
+ MarkBufferDirty(vmBuf);
+ }
+
+ return status;
+}
+
/*
* visibilitymap_get_status - get status of bits
*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
#include "access/heapam_xlog.h"
#include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
#include "storage/standbydefs.h"
/*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
xlrec->flags);
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+ appendStringInfo(buf, ", vm_flags: 0x%02X",
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
if (XLogRecHasBlockData(record, 0) && !isinit)
{
appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..3dcf37ba03f 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,9 @@ extern uint8 visibilitymap_set(Relation rel,
Buffer vmBuf,
TransactionId cutoff_xid,
uint8 flags);
+extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const char *heapRelname);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
--
2.43.0
v16-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-prune-f.patchtext/x-patch; charset=UTF-8; name=v16-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-prune-f.patchDownload
From 0141c10d30bd7ea620d16d24201ba22e5337a4dc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:52:08 -0400
Subject: [PATCH v16 06/14] Eliminate XLOG_HEAP2_VISIBLE from vacuum
prune/freeze
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.
Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.
This change applies only to vacuum’s prune/freeze work, not to pruning
performed during normal page access.
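The correctness hinge, also spelled out in the redo comments, is that the VM bit for a block may only be set while PD_ALL_VISIBLE is set on the heap page. Below is a toy standalone model of that rule (all names invented, and nothing like the real buffer-level code), just to make the set/clear ordering explicit.
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model of one heap page flag and its VM bit; names are invented. */
typedef struct
{
	bool	pd_all_visible;		/* PD_ALL_VISIBLE on the heap page */
	bool	vm_all_visible;		/* all-visible bit in the visibility map */
} DemoPage;

/* Invariant: the VM bit is never set while the page-level bit is clear. */
static void
demo_check(const DemoPage *p)
{
	assert(!p->vm_all_visible || p->pd_all_visible);
}

static void
demo_set_all_visible(DemoPage *p)
{
	p->pd_all_visible = true;	/* heap page first... */
	demo_check(p);
	p->vm_all_visible = true;	/* ...then the VM, within the same critical section */
	demo_check(p);
}

static void
demo_modify_page(DemoPage *p)
{
	p->vm_all_visible = false;	/* VM bit goes away together with... */
	p->pd_all_visible = false;	/* ...the page-level bit when the page changes */
	demo_check(p);
}

int
main(void)
{
	DemoPage	p = {false, false};

	demo_set_all_visible(&p);
	demo_modify_page(&p);
	printf("ok\n");
	return 0;
}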
---
src/backend/access/heap/heapam_xlog.c | 158 +++++++--
src/backend/access/heap/pruneheap.c | 474 ++++++++++++++++++++-----
src/backend/access/heap/vacuumlazy.c | 202 +----------
src/backend/access/rmgrdesc/heapdesc.c | 11 +-
src/include/access/heapam.h | 36 +-
src/include/access/heapam_xlog.h | 17 +-
6 files changed, 584 insertions(+), 314 deletions(-)
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index c2c7e6ab086..911416bbc56 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Buffer buffer;
RelFileLocator rlocator;
BlockNumber blkno;
- XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
+ uint8 vmflags = 0;
+ Size freespace = 0;
XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -50,11 +52,22 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Assert((xlrec.flags & XLHP_CLEANUP_LOCK) != 0 ||
(xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
+ if (xlrec.flags & XLHP_VM_ALL_VISIBLE)
+ {
+ vmflags = VISIBILITYMAP_ALL_VISIBLE;
+ if (xlrec.flags & XLHP_VM_ALL_FROZEN)
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
/*
- * We are about to remove and/or freeze tuples. In Hot Standby mode,
- * ensure that there are no queries running for which the removed tuples
- * are still visible or which still consider the frozen xids as running.
- * The conflict horizon XID comes after xl_heap_prune.
+ * After xl_heap_prune is the optional snapshot conflict horizon.
+ *
+ * In Hot Standby mode, we must ensure that there are no running queries
+ * which would conflict with the changes in this record. That means we
+ * can't replay this record if it removes tuples that are still visible to
+ * transactions on the standby, freeze tuples with xids that are still
+ * considered running on the standby, or set a page as all-visible in the
+ * VM if it isn't all-visible to all transactions on the standby.
*/
if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
{
@@ -71,12 +84,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
}
/*
- * If we have a full-page image, restore it and we're done.
+ * If we have a full-page image of the heap block, restore it and we're
+ * done with the heap block.
*/
- action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
- (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
- &buffer);
- if (action == BLK_NEEDS_REDO)
+ if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+ (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+ &buffer) == BLK_NEEDS_REDO)
{
Page page = BufferGetPage(buffer);
OffsetNumber *redirected;
@@ -90,6 +103,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
xlhp_freeze_plan *plans;
OffsetNumber *frz_offsets;
char *dataptr = XLogRecGetBlockData(record, 0, &datalen);
+ bool do_prune;
+ bool mark_buffer_dirty = false;
+ bool set_lsn = false;
heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
&nplans, &plans, &frz_offsets,
@@ -97,11 +113,16 @@ heap_xlog_prune_freeze(XLogReaderState *record)
&ndead, &nowdead,
&nunused, &nowunused);
+ do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+
+ /* Ensure the record does something */
+ Assert(do_prune || nplans > 0 || vmflags & VISIBILITYMAP_VALID_BITS);
+
/*
* Update all line pointers per the record, and repair fragmentation
* if needed.
*/
- if (nredirected > 0 || ndead > 0 || nunused > 0)
+ if (do_prune)
heap_page_prune_execute(buffer,
(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
redirected, nredirected,
@@ -138,36 +159,121 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* There should be no more data */
Assert((char *) frz_offsets == dataptr + datalen);
+ if (do_prune || nplans > 0)
+ mark_buffer_dirty = set_lsn = true;
+
+ /*
+ * The critical integrity requirement here is that we must never end
+ * up with with the visibility map bit set and the page-level
+ * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+ * modification would fail to clear the visibility map bit.
+ *
+ * If this record only sets the VM, no need to dirty the heap page.
+ */
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+ {
+ PageSetAllVisible(page);
+ mark_buffer_dirty = true;
+
+ /*
+ * Always emit a WAL record when setting PD_ALL_VISIBLE but only
+ * emit an FPI if checksums/wal_log_hints are enabled. Advance the
+ * page LSN only if the record could include an FPI, since
+ * recovery skips records <= the stamped LSN. Otherwise it might
+ * skip an earlier FPI needed to repair a torn page.
+ */
+ if (XLogHintBitIsNeeded())
+ set_lsn = true;
+ }
+
+ if (mark_buffer_dirty)
+ MarkBufferDirty(buffer);
+
+ if (set_lsn)
+ PageSetLSN(page, lsn);
+
/*
* Note: we don't worry about updating the page's prunability hints.
* At worst this will cause an extra prune cycle to occur soon.
*/
-
- PageSetLSN(page, lsn);
- MarkBufferDirty(buffer);
}
/*
- * If we released any space or line pointers, update the free space map.
+ * If we released any space or line pointers or set PD_ALL_VISIBLE or the
+ * VM, update the freespace map.
+ *
+ * Even when no actual space is freed (e.g., when only marking the page
+ * all-visible or frozen), we still update the FSM. Because the FSM is
+ * unlogged and maintained heuristically, it often becomes stale on
+ * standbys. If such a standby is later promoted and runs VACUUM, it will
+ * skip recalculating free space for pages that were marked all-visible
+ * (or all-frozen, depending on the mode). FreeSpaceMapVacuum can then
+ * propagate overly optimistic free space values upward, causing future
+ * insertions to select pages that turn out to be unusable. In bulk, this
+ * can lead to long stalls.
+ *
+ * To prevent this, always refresh the FSM’s view when a page becomes
+ * all-visible or all-frozen.
+ *
+ * Do this regardless of whether a full-page image is logged, since FSM
+ * data is not part of the page itself.
*
- * Do this regardless of a full-page image being applied, since the FSM
- * data is not in the page anyway.
*/
if (BufferIsValid(buffer))
{
- if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
- XLHP_HAS_DEAD_ITEMS |
- XLHP_HAS_NOW_UNUSED_ITEMS))
- {
- Size freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+ if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
+ XLHP_HAS_DEAD_ITEMS |
+ XLHP_HAS_NOW_UNUSED_ITEMS)) ||
+ (vmflags & VISIBILITYMAP_VALID_BITS))
+ freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
- UnlockReleaseBuffer(buffer);
+ /*
+ * We want to avoid holding an exclusive lock on the heap buffer while
+ * doing IO (either of the FSM or the VM), so we'll release the lock
+ * on the heap buffer before doing either.
+ */
+ UnlockReleaseBuffer(buffer);
+ }
- XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
+ /*
+ * Now read and update the VM block.
+ *
+ * We must redo changes to the VM even if the heap page was skipped due to
+ * LSN interlock. See comment in heap_xlog_multi_insert() for more details
+ * on replaying changes to the VM.
+ */
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) &&
+ XLogReadBufferForRedoExtended(record, 1,
+ RBM_ZERO_ON_ERROR,
+ false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ Page vmpage = BufferGetPage(vmbuffer);
+ char *relname;
+ uint8 old_vmbits = 0;
+
+ /* initialize the page if it was read as zeros */
+ if (PageIsNew(vmpage))
+ PageInit(vmpage, BLCKSZ, 0);
+
+ /* We don't have relation name during recovery, so use relfilenode */
+ relname = psprintf("%u", rlocator.relNumber);
+ old_vmbits = visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, relname);
+
+ /* Only set VM page LSN if we modified the page */
+ if (old_vmbits != vmflags)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
}
- else
- UnlockReleaseBuffer(buffer);
+ pfree(relname);
}
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
+
+ if (freespace > 0)
+ XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c23a6a21a7f..f384d74416a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,6 +19,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/transam.h"
+#include "access/visibilitymap.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
#include "commands/vacuum.h"
@@ -43,6 +44,8 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool attempt_freeze;
+ /* whether or not to attempt updating the VM */
+ bool attempt_update_vm;
const struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -132,17 +135,17 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon when setting the VM or when
+ * freezing all the tuples on the page. It is only valid when all the live
+ * tuples on the page are all-visible.
*
* NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
* That's convenient for heap_page_prune_and_freeze(), to use them to
- * decide whether to freeze the page or not. The all_visible and
- * all_frozen values returned to the caller are adjusted to include
- * LP_DEAD items after we determine whether or not to opportunistically
- * freeze.
+ * decide whether to opportunistically freeze the page or not. The
+ * all_visible and all_frozen values ultimately used to set the VM are
+ * adjusted to include LP_DEAD items after we determine whether or not to
+ * opportunistically freeze.
*/
bool all_visible;
bool all_frozen;
@@ -173,6 +176,19 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
static void page_verify_redirects(Page page);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid, bool blk_already_av,
+ bool set_blk_all_frozen);
+
+static bool heap_page_will_set_vis(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buf,
+ Buffer vmbuffer,
+ bool blk_known_av,
+ const PruneState *prstate,
+ uint8 *vmflags,
+ bool *do_set_pd_vis);
/*
* Optionally prune and repair fragmentation in the specified page.
@@ -258,6 +274,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
params.reason = PRUNE_ON_ACCESS;
params.vistest = vistest;
params.cutoffs = NULL;
+ params.vmbuffer = InvalidBuffer;
+ params.blk_known_av = false;
/*
* For now, pass mark_unused_now as false regardless of whether or
@@ -431,10 +449,108 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Decide whether to set the visibility map bits for heap_blk, using
+ * information from PruneState and blk_known_av. Some callers may already
+ * have examined this page's VM bits (e.g., VACUUM in the previous
+ * heap_vac_scan_next_block() call) and can pass that along.
+ *
+ * Returns true if one or both VM bits should be set, along with the desired
+ * flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
+ * should be set on the heap page.
+ */
+static bool
+heap_page_will_set_vis(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buf,
+ Buffer vmbuffer,
+ bool blk_known_av,
+ const PruneState *prstate,
+ uint8 *vmflags,
+ bool *do_set_pd_vis)
+{
+ Page heap_page = BufferGetPage(heap_buf);
+ bool do_set_vm = false;
+
+ *do_set_pd_vis = false;
+
+ if (!prstate->attempt_update_vm)
+ {
+ Assert(!prstate->all_visible && !prstate->all_frozen);
+ Assert(*vmflags == 0);
+ return false;
+ }
+
+ if (prstate->all_visible && !PageIsAllVisible(heap_page))
+ *do_set_pd_vis = true;
+
+ if ((prstate->all_visible && !blk_known_av) ||
+ (prstate->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+ {
+ *vmflags = VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate->all_frozen)
+ *vmflags |= VISIBILITYMAP_ALL_FROZEN;
+
+ do_set_vm = true;
+ }
+
+ /*
+ * Now handle two potential corruption cases:
+ *
+ * These do not need to happen in a critical section and are not
+ * WAL-logged.
+ *
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the page-level bit
+ * was cleared after VACUUM's call to heap_vac_scan_next_block(), so we must
+ * recheck with the buffer lock held before concluding that the VM is corrupt.
+ */
+ else if (blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (prstate->lpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buf);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ return do_set_vm;
+}
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -449,12 +565,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* it's required in order to advance relfrozenxid / relminmxid, or if it's
* considered advantageous for overall system performance to do so now. The
* 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
- * arguments are required when freezing. When HEAP_PRUNE_FREEZE option is
- * passed, we also set presult->all_visible and presult->all_frozen after
- * determining whether or not to opporunistically freeze, to indicate if the
- * VM bits can be set. They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not passed, because at the moment only callers
- * that also freeze need that information.
+ * arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
@@ -479,6 +596,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
MultiXactId *new_relmin_mxid)
{
Buffer buffer = params->buffer;
+ Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
OffsetNumber offnum,
@@ -488,15 +606,22 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool do_freeze;
bool do_prune;
bool do_hint_prune;
+ bool do_set_vm;
+ bool do_set_pd_vis;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
TransactionId frz_conflict_horizon = InvalidTransactionId;
+ TransactionId conflict_xid = InvalidTransactionId;
+ uint8 new_vmbits = 0;
+ uint8 old_vmbits = 0;
/* Copy parameters to prstate */
prstate.vistest = params->vistest;
prstate.mark_unused_now =
(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.attempt_update_vm =
+ (params->options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0;
prstate.cutoffs = params->cutoffs;
/*
@@ -543,50 +668,54 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.deadoffsets = presult->deadoffsets;
/*
- * Caller may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Track whether the page could be marked all-visible and/or all-frozen.
+ * This information is used for opportunistic freezing and for updating
+ * the visibility map (VM) if requested by the caller.
+ *
+ * Currently, only VACUUM performs freezing, but other callers may in the
+ * future. Visibility bookkeeping is required not just for setting the VM
+ * bits, but also for opportunistic freezing: we only consider freezing if
+ * the page would become all-frozen, or if it would be all-frozen except
+ * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+ * we will not set the VM bit even if the page is found to be all-visible.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is passed without HEAP_PAGE_PRUNE_FREEZE,
+ * prstate.all_frozen must be initialized to false, since we will not call
+ * heap_prepare_freeze_tuple() for each tuple.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not immediately
- * clear all_visible when we see LP_DEAD items. We fix that after
- * scanning the line pointers, before we return the value to the caller,
- * so that the caller doesn't set the VM bit incorrectly.
+ * Dead tuples that will be removed by the end of vacuum should not
+ * prevent opportunistic freezing. Therefore, we do not clear all_visible
+ * when we encounter LP_DEAD items. Instead, we correct all_visible after
+ * deciding whether to freeze, but before updating the VM, to avoid
+ * setting the VM bit incorrectly.
+ *
+ * If neither freezing nor VM updates are requested, we skip the extra
+ * bookkeeping. In this case, initializing all_visible to false allows
+ * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
*/
if (prstate.attempt_freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
}
+ else if (prstate.attempt_update_vm)
+ {
+ prstate.all_visible = true;
+ prstate.all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate.all_visible = false;
prstate.all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is unmaintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon
+ * when updating the VM and/or freezing all the tuples on the page.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -818,6 +947,35 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.all_visible = prstate.all_frozen = false;
Assert(!prstate.all_frozen || prstate.all_visible);
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ /*
+ * Decide whether to set the page-level PD_ALL_VISIBLE bit and the VM bits
+ * based on information from the VM and the all_visible/all_frozen flags.
+ *
+ * While it is valid for PD_ALL_VISIBLE to be set when the corresponding
+ * VM bit is clear, we strongly prefer to keep them in sync.
+ *
+ * Accordingly, we also allow updating only the VM when PD_ALL_VISIBLE has
+ * already been set. Setting only the VM is most common when setting an
+ * already all-visible page all-frozen.
+ */
+ do_set_vm = heap_page_will_set_vis(params->relation,
+ blockno, buffer, vmbuffer, params->blk_known_av,
+ &prstate, &new_vmbits, &do_set_pd_vis);
+
+ /* We should only set the VM if PD_ALL_VISIBLE is set or will be */
+ Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
+
+ conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+ prstate.latest_xid_removed, frz_conflict_horizon,
+ prstate.visibility_cutoff_xid, params->blk_known_av,
+ (do_set_vm && (new_vmbits & VISIBILITYMAP_ALL_FROZEN)));
+
+ /* Lock vmbuffer before entering a critical section */
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -838,14 +996,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
/*
* If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * hint. If we are going to freeze or prune the page or set
+ * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+ *
+ * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+ * for the VM to be set and PD_ALL_VISIBLE to be clear.
*/
- if (!do_freeze && !do_prune)
+ if (!do_freeze && !do_prune && !do_set_pd_vis)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -859,64 +1020,91 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
- MarkBufferDirty(buffer);
+ if (do_set_pd_vis)
+ PageSetAllVisible(page);
- /*
- * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
- */
- if (RelationNeedsWAL(params->relation))
+ if (do_prune || do_freeze || do_set_pd_vis)
+ MarkBufferDirty(buffer);
+
+ if (do_set_vm)
{
- /*
- * The snapshotConflictHorizon for the whole record should be the
- * most conservative of all the horizons calculated for any of the
- * possible modifications. If this record will prune tuples, any
- * transactions on the standby older than the youngest xmax of the
- * most recently removed tuple this record will prune will
- * conflict. If this record will freeze tuples, any transactions
- * on the standby with xids older than the youngest tuple this
- * record will freeze will conflict.
- */
- TransactionId conflict_xid;
+ Assert(PageIsAllVisible(page));
- if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
- conflict_xid = frz_conflict_horizon;
- else
- conflict_xid = prstate.latest_xid_removed;
+ old_vmbits = visibilitymap_set_vmbits(blockno,
+ vmbuffer, new_vmbits,
+ RelationGetRelationName(params->relation));
+ if (old_vmbits == new_vmbits)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+ /* Unset so we don't emit WAL since no change occurred */
+ do_set_vm = false;
+ }
+ }
+ /*
+ * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
+ * only updating the VM and it turns out it was already set, we will
+ * have unset do_set_vm earlier. As such, check it again before
+ * emitting the record.
+ */
+ if (RelationNeedsWAL(params->relation) &&
+ (do_prune || do_freeze || do_set_vm))
log_heap_prune_and_freeze(params->relation, buffer,
+ do_set_vm ? vmbuffer : InvalidBuffer,
+ do_set_vm ? new_vmbits : 0,
conflict_xid,
- true, params->reason,
+ true, /* cleanup lock */
+ do_set_pd_vis,
+ params->reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
prstate.nowunused, prstate.nunused);
- }
}
END_CRIT_SECTION();
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+ /*
+ * During its second pass over the heap, VACUUM calls
+ * heap_page_would_be_all_visible() to determine whether a page is
+ * all-visible and all-frozen. The logic here is similar. After completing
+ * pruning and freezing, use an assertion to verify that our results
+ * remain consistent with heap_page_would_be_all_visible().
+ */
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
+ {
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
+
+ Assert(prstate.lpdead_items == 0);
+ Assert(prstate.cutoffs);
+
+ if (!heap_page_is_all_visible(params->relation, buffer,
+ prstate.cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc))
+ Assert(false);
+
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == prstate.visibility_cutoff_xid);
+ }
+#endif
+
/* Copy information back for caller */
presult->ndeleted = prstate.ndeleted;
presult->nnewlpdead = prstate.ndead;
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
-
- /*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ presult->new_vmbits = new_vmbits;
+ presult->old_vmbits = old_vmbits;
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
@@ -2058,6 +2246,64 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
return nplans;
}
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid, bool blk_already_av,
+ bool set_blk_all_frozen)
+{
+
+ /*
+ * The snapshotConflictHorizon for the whole record should be the most
+ * conservative of all the horizons calculated for any of the possible
+ * modifications. If this record will prune tuples, any transactions on
+ * the standby older than the youngest xmax of the most recently removed
+ * tuple this record will prune will conflict. If this record will freeze
+ * tuples, any transactions on the standby with xids older than the
+ * youngest tuple this record will freeze will conflict.
+ */
+ TransactionId conflict_xid = InvalidTransactionId;
+
+ /*
+ * If we are updating the VM, the conflict horizon is almost always the
+ * visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization, we can
+ * use the visibility_cutoff_xid as the conflict horizon if the page will
+ * be all-frozen. This is true even if there are LP_DEAD line pointers
+ * because we ignored those when maintaining the visibility_cutoff_xid.
+ * This will have been calculated earlier as the frz_conflict_horizon when
+ * we determined we would freeze.
+ */
+ if (do_set_vm)
+ conflict_xid = visibility_cutoff_xid;
+ else if (do_freeze)
+ conflict_xid = frz_conflict_horizon;
+
+ /*
+ * If we are removing tuples with a younger xmax than our so far
+ * calculated conflict_xid, we must use this as our horizon.
+ */
+ if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+ conflict_xid = latest_xid_removed;
+
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning or
+ * freezing any tuples and are setting an already all-visible page
+ * all-frozen in the VM. In this case, all of the tuples on the page must
+ * already be visible to all MVCC snapshots on the standby.
+ */
+ if (!do_prune && !do_freeze &&
+ do_set_vm && blk_already_av && set_blk_all_frozen)
+ conflict_xid = InvalidTransactionId;
+
+ return conflict_xid;
+}
+
/*
* Write an XLOG_HEAP2_PRUNE* WAL record
*
@@ -2078,14 +2324,24 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* replaying 'unused' items depends on whether they were all previously marked
* as dead.
*
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
* Note: This function scribbles on the 'frozen' array.
*
* Note: This is called in a critical section, so careful what you do here.
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer, uint8 vmflags,
TransactionId conflict_xid,
bool cleanup_lock,
+ bool set_pd_all_vis,
PruneReason reason,
HeapTupleFreeze *frozen, int nfrozen,
OffsetNumber *redirected, int nredirected,
@@ -2095,6 +2351,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xl_heap_prune xlrec;
XLogRecPtr recptr;
uint8 info;
+ uint8 regbuf_flags;
/* The following local variables hold data registered in the WAL record: */
xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2103,8 +2360,23 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xlhp_prune_items dead_items;
xlhp_prune_items unused_items;
OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+ bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
xlrec.flags = 0;
+ regbuf_flags = REGBUF_STANDARD;
+
+ Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+
+ /*
+ * We can avoid an FPI if the only modification we are making to the heap
+ * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+ * Note that if we explicitly skip an FPI, we must not set the heap page
+ * LSN later.
+ */
+ if (!do_prune &&
+ nfrozen == 0 &&
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
+ regbuf_flags |= REGBUF_NO_IMAGE;
/*
* Prepare data for the buffer. The arrays are not actually in the
@@ -2112,7 +2384,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* page image, the arrays can be omitted.
*/
XLogBeginInsert();
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+ XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ XLogRegisterBuffer(1, vmbuffer, 0);
+
if (nfrozen > 0)
{
int nplans;
@@ -2169,6 +2445,12 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* Prepare the main xl_heap_prune record. We already set the XLHP_HAS_*
* flag above.
*/
+ if (vmflags & VISIBILITYMAP_ALL_VISIBLE)
+ {
+ xlrec.flags |= XLHP_VM_ALL_VISIBLE;
+ if (vmflags & VISIBILITYMAP_ALL_FROZEN)
+ xlrec.flags |= XLHP_VM_ALL_FROZEN;
+ }
if (RelationIsAccessibleInLogicalDecoding(relation))
xlrec.flags |= XLHP_IS_CATALOG_REL;
if (TransactionIdIsValid(conflict_xid))
@@ -2201,5 +2483,23 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
}
recptr = XLogInsert(RM_HEAP2_ID, info);
- PageSetLSN(BufferGetPage(buffer), recptr);
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
+ }
+
+ /*
+ * We must bump the page LSN if pruning or freezing. If we are only
+ * updating PD_ALL_VISIBLE, though, we can skip doing this unless
+ * wal_log_hints/checksums are enabled. Torn pages are possible if we
+ * update PD_ALL_VISIBLE without bumping the LSN, but this is deemed okay
+ * for page hint updates.
+ */
+ if (do_prune || nfrozen > 0 ||
+ (set_pd_all_vis && XLogHintBitIsNeeded()))
+ {
+ Assert(BufferIsDirty(buffer));
+ PageSetLSN(BufferGetPage(buffer), recptr);
+ }
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index aed1f8e1139..39526bf608f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1958,6 +1958,8 @@ lazy_scan_prune(LVRelState *vacrel,
params.reason = PRUNE_VACUUM_SCAN;
params.cutoffs = &vacrel->cutoffs;
params.vistest = vacrel->vistest;
+ params.vmbuffer = vmbuffer;
+ params.blk_known_av = all_visible_according_to_vm;
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
@@ -1974,7 +1976,7 @@ lazy_scan_prune(LVRelState *vacrel,
* tuples. Pruning will have determined whether or not the page is
* all-visible.
*/
- params.options = HEAP_PAGE_PRUNE_FREEZE;
+ params.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VIS;
if (vacrel->nindexes == 0)
params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
@@ -1997,33 +1999,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- if (!heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum))
- Assert(false);
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -2057,168 +2032,26 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
- Assert(!presult.all_frozen || presult.all_visible);
-
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
+ * For the purposes of logging, count whether or not the page was newly
+ * set all-visible and, potentially, all-frozen.
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ vacrel->vm_new_visible_pages++;
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- vacrel->vm_new_frozen_pages++;
+ vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
-
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen.
- */
- else if (all_visible_according_to_vm && presult.all_frozen &&
- !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- uint8 old_vmbits;
-
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
}
return presult.ndeleted;
@@ -2892,8 +2725,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
- InvalidTransactionId,
+ InvalidBuffer, /* vmbuffer */
+ 0, /* vmflags */
+ InvalidTransactionId, /* conflict_xid */
false, /* no cleanup lock required */
+ false, /* set_pd_all_vis */
PRUNE_VACUUM_CLEANUP,
NULL, 0, /* frozen */
NULL, 0, /* redirected */
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..1cb44ca32d3 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -103,7 +103,7 @@ plan_elem_desc(StringInfo buf, void *plan, void *data)
* code, the latter of which is used in frontend (pg_waldump) code.
*/
void
-heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
OffsetNumber **frz_offsets,
int *nredirected, OffsetNumber **redirected,
@@ -287,6 +287,15 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, ", isCatalogRel: %c",
xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
+ if (xlrec->flags & XLHP_VM_ALL_VISIBLE)
+ {
+ uint8 vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+ if (xlrec->flags & XLHP_VM_ALL_FROZEN)
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ appendStringInfo(buf, ", vm_flags: 0x%02X", vmflags);
+ }
+
if (XLogRecHasBlockData(record, 0))
{
Size datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ea67fb83fbe..2de39ba0cd1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VIS (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,16 @@ typedef struct PruneFreezeParams
Relation relation; /* relation containing buffer to be pruned */
Buffer buffer; /* buffer to be pruned */
+ /*
+ * vmbuffer is the buffer that must already contain the required
+ * block of the visibility map if we are to update it. blk_known_av is the
+ * visibility status of the heap block as of the last call to
+ * find_next_unskippable_block().
+ */
+ Buffer vmbuffer;
+ bool blk_known_av;
+
/*
* The reason pruning was performed. It is used to set the WAL record
* opcode which is used for debugging and analysis purposes.
@@ -250,8 +261,9 @@ typedef struct PruneFreezeParams
* MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
* pruning.
*
- * FREEZE indicates that we will also freeze tuples, and will return
- * 'all_visible', 'all_frozen' flags to the caller.
+ * FREEZE indicates that we will also freeze tuples.
+ *
+ * UPDATE_VIS indicates that we will set the page's status in the VM.
*/
int options;
@@ -284,19 +296,15 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
- *
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
*
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set and
+ * we have attempted to update the VM.
*/
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
+ uint8 new_vmbits;
+ uint8 old_vmbits;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
@@ -420,8 +428,10 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer, uint8 vmflags,
TransactionId conflict_xid,
bool cleanup_lock,
+ bool set_pd_all_vis,
PruneReason reason,
HeapTupleFreeze *frozen, int nfrozen,
OffsetNumber *redirected, int nredirected,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d4c0625b632..16c2b2e3c9c 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -249,7 +249,7 @@ typedef struct xl_heap_update
* Main data section:
*
* xl_heap_prune
- * uint8 flags
+ * uint16 flags
* TransactionId snapshot_conflict_horizon
*
* Block 0 data section:
@@ -284,7 +284,7 @@ typedef struct xl_heap_update
*/
typedef struct xl_heap_prune
{
- uint8 flags;
+ uint16 flags;
/*
* If XLHP_HAS_CONFLICT_HORIZON is set, the conflict horizon XID follows,
@@ -292,7 +292,7 @@ typedef struct xl_heap_prune
*/
} xl_heap_prune;
-#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
/* to handle recovery conflict during logical decoding on standby */
#define XLHP_IS_CATALOG_REL (1 << 1)
@@ -330,6 +330,15 @@ typedef struct xl_heap_prune
#define XLHP_HAS_DEAD_ITEMS (1 << 6)
#define XLHP_HAS_NOW_UNUSED_ITEMS (1 << 7)
+/*
+ * The xl_heap_prune record's flags may also indicate which VM bits to set.
+ * Code writing or replaying xl_heap_prune should always use the
+ * XLHP_VM_ALL_VISIBLE and XLHP_VM_ALL_FROZEN flags and translate them to
+ * their visibilitymapdefs.h equivalents, VISIBILITYMAP_ALL_VISIBLE and
+ * VISIBILITYMAP_ALL_FROZEN.
+ */
+#define XLHP_VM_ALL_VISIBLE (1 << 8)
+#define XLHP_VM_ALL_FROZEN (1 << 9)
+
/*
* xlhp_freeze_plan describes how to freeze a group of one or more heap tuples
* (appears in xl_heap_prune's xlhp_freeze_plans sub-record)
@@ -497,7 +506,7 @@ extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
uint8 vmflags);
/* in heapdesc.c, so it can be shared between frontend/backend code */
-extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
OffsetNumber **frz_offsets,
int *nredirected, OffsetNumber **redirected,
--
2.43.0
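
In the patch above, the writer side translates the VISIBILITYMAP_* bits it just set into the new XLHP_VM_* record flags, and the redo/desc side translates them back before touching the VM page. A minimal sketch of that mapping, assuming the XLHP_VM_* definitions added by this patch (the helper names below are illustrative; the real translation is done inline in log_heap_prune_and_freeze() and the redo/desc code):

/* Illustrative only: mirrors the inline translation in the patch above. */
static inline uint16
vmflags_to_xlhp(uint8 vmflags)
{
	uint16		flags = 0;

	if (vmflags & VISIBILITYMAP_ALL_VISIBLE)
	{
		flags |= XLHP_VM_ALL_VISIBLE;
		if (vmflags & VISIBILITYMAP_ALL_FROZEN)
			flags |= XLHP_VM_ALL_FROZEN;
	}
	return flags;
}

static inline uint8
xlhp_to_vmflags(uint16 flags)
{
	uint8		vmflags = 0;

	if (flags & XLHP_VM_ALL_VISIBLE)
	{
		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
		if (flags & XLHP_VM_ALL_FROZEN)
			vmflags |= VISIBILITYMAP_ALL_FROZEN;
	}
	return vmflags;
}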
v16-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (text/x-patch; charset=US-ASCII)
From 95d94ee991ea163b4b7861a193b3a1a3497de73e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:54:38 -0400
Subject: [PATCH v16 07/14] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase III
Instead of emitting a separate XLOG_HEAP2_VISIBLE record for each page
that becomes all-visible in vacuum's third phase, record the
visibility map update in the already emitted
XLOG_HEAP2_PRUNE_VACUUM_CLEANUP record.
Visibility checks are now performed before marking dead items unused.
This is safe because the heap page is held under exclusive lock for the
entire operation.
This reduces the number of WAL records generated by VACUUM phase III by
up to 50%.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 174 +++++++++++++++++++--------
1 file changed, 124 insertions(+), 50 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 39526bf608f..cf1c2efc999 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,6 +463,13 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
+static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2685,8 +2692,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
OffsetNumber unused[MaxHeapTuplesPerPage];
int nunused = 0;
TransactionId visibility_cutoff_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
bool all_frozen;
LVSavedErrInfo saved_err_info;
+ uint8 vmflags = 0;
Assert(vacrel->do_index_vacuuming);
@@ -2697,6 +2706,31 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
InvalidOffsetNumber);
+ /*
+ * Before marking dead items unused, check whether the page will become
+ * all-visible once that change is applied. This lets us reap the tuples
+ * and mark the page all-visible within the same critical section,
+ * enabling both changes to be emitted in a single WAL record. Since the
+ * visibility checks may perform I/O and allocate memory, they must be
+ * done outside the critical section.
+ */
+ if (heap_page_would_be_all_visible(vacrel->rel, buffer,
+ vacrel->cutoffs.OldestXmin,
+ deadoffsets, num_offsets,
+ &all_frozen, &visibility_cutoff_xid,
+ &vacrel->offnum))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (all_frozen)
+ {
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ }
+
+ /* Take the lock on the vmbuffer before entering a critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ }
+
START_CRIT_SECTION();
for (int i = 0; i < num_offsets; i++)
@@ -2716,6 +2750,21 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
/* Attempt to truncate line pointer array now */
PageTruncateLinePointerArray(page);
+ /*
+ * The page is guaranteed to have had dead line pointers, so
+ * PD_ALL_VISIBLE cannot already be set. Therefore, whenever we set the VM
+ * bit, we must also set PD_ALL_VISIBLE. The heap page lock is held while
+ * updating the VM to ensure consistency.
+ */
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+ {
+ PageSetAllVisible(page);
+ visibilitymap_set_vmbits(blkno,
+ vmbuffer, vmflags,
+ RelationGetRelationName(vacrel->rel));
+ conflict_xid = visibility_cutoff_xid;
+ }
+
/*
* Mark buffer dirty before we write WAL.
*/
@@ -2725,11 +2774,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
- InvalidBuffer, /* vmbuffer */
- 0, /* vmflags */
- InvalidTransactionId, /* conflict_xid */
+ vmbuffer, vmflags,
+ conflict_xid,
false, /* no cleanup lock required */
- false, /* set_pd_all_vis */
+ (vmflags & VISIBILITYMAP_VALID_BITS) != 0,
PRUNE_VACUUM_CLEANUP,
NULL, 0, /* frozen */
NULL, 0, /* redirected */
@@ -2737,41 +2785,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
unused, nunused);
}
- /*
- * End critical section, so we safely can do visibility tests (which
- * possibly need to perform IO and allocate memory!). If we crash now the
- * page (including the corresponding vm bit) might not be marked all
- * visible, but that's fine. A later vacuum will fix that.
- */
END_CRIT_SECTION();
- /*
- * Now that we have removed the LP_DEAD items from the page, once again
- * check if the page has become all-visible. The page is already marked
- * dirty, exclusively locked, and, if needed, a full page image has been
- * emitted.
- */
- Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel->rel, buffer, vacrel->cutoffs.OldestXmin,
- &all_frozen,
- &visibility_cutoff_xid,
- &vacrel->offnum))
+ if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (all_frozen)
- {
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buffer,
- InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
- flags);
-
/* Count the newly set VM page for logging */
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
vacrel->vm_new_visible_pages++;
if (all_frozen)
vacrel->vm_new_visible_frozen_pages++;
@@ -3440,18 +3459,8 @@ dead_items_cleanup(LVRelState *vacrel)
}
/*
- * Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples. Set *all_frozen to true if every tuple
- * on this page is frozen.
- *
- * *logging_offnum will have the OffsetNumber of the current tuple being
- * processed for vacuum's error callback system.
- *
- * This is similar logic to that in heap_prune_record_unchanged_lp_normal() If
- * you change anything here, make sure that everything stays in sync. Note
- * that an assertion calls us to verify that everybody still agrees. Be sure
- * to avoid introducing new side-effects here.
+ * Wrapper for heap_page_would_be_all_visible() for use by callers that
+ * expect no LP_DEAD items on the page.
*/
bool
heap_page_is_all_visible(Relation rel, Buffer buf,
@@ -3460,15 +3469,74 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
+
+ return heap_page_would_be_all_visible(rel, buf,
+ OldestXmin,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid,
+ logging_offnum);
+}
+
+/*
+ * Check whether the heap page in buf is all-visible except for the dead
+ * tuples referenced in the deadoffsets array.
+ *
+ * The visibility checks may perform IO and allocate memory so they must not
+ * be done in a critical section. This function is used by vacuum to determine
+ * if the page will be all-visible once it reaps known dead tuples. That way
+ * it can do both in the same critical section and emit a single WAL record.
+ *
+ * Returns true if the page is all-visible other than the provided
+ * deadoffsets and false otherwise.
+ *
+ * OldestXmin is used to determine visibility.
+ *
+ * Output parameters:
+ *
+ * - *all_frozen: true if every tuple on the page is frozen
+ * - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
+ * - *logging_offnum: OffsetNumber of current tuple being processed;
+ * used by vacuum's error callback system.
+ *
+ * Callers looking to verify that the page is already all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This logic is closely related to heap_prune_record_unchanged_lp_normal().
+ * If you modify this function, ensure consistency with that code. An
+ * assertion cross-checks that both remain in agreement. Do not introduce new
+ * side-effects.
+ */
+static bool
+heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
+{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
OffsetNumber offnum,
maxoff;
bool all_visible = true;
+ int matched_dead_count = 0;
*visibility_cutoff_xid = InvalidTransactionId;
*all_frozen = true;
+ Assert(ndeadoffsets == 0 || deadoffsets);
+
+#ifdef USE_ASSERT_CHECKING
+ /* Confirm input deadoffsets[] is strictly sorted */
+ if (ndeadoffsets > 1)
+ {
+ for (int i = 1; i < ndeadoffsets; i++)
+ Assert(deadoffsets[i - 1] < deadoffsets[i]);
+ }
+#endif
+
maxoff = PageGetMaxOffsetNumber(page);
for (offnum = FirstOffsetNumber;
offnum <= maxoff && all_visible;
@@ -3496,9 +3564,15 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
if (ItemIdIsDead(itemid))
{
- all_visible = false;
- *all_frozen = false;
- break;
+ if (!deadoffsets ||
+ matched_dead_count >= ndeadoffsets ||
+ deadoffsets[matched_dead_count] != offnum)
+ {
+ *all_frozen = all_visible = false;
+ break;
+ }
+ matched_dead_count++;
+ continue;
}
Assert(ItemIdIsNormal(itemid));
--
2.43.0
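
An abbreviated sketch of the reworked ordering in lazy_vacuum_heap_page() from the patch above (error callback setup, FSM handling, and the logging counters are omitted; names follow the patch):

/* Sketch of the new ordering; see the full diff above for details. */
if (heap_page_would_be_all_visible(vacrel->rel, buffer,
								   vacrel->cutoffs.OldestXmin,
								   deadoffsets, num_offsets,
								   &all_frozen, &visibility_cutoff_xid,
								   &vacrel->offnum))
{
	vmflags |= VISIBILITYMAP_ALL_VISIBLE;
	if (all_frozen)
		vmflags |= VISIBILITYMAP_ALL_FROZEN;
	/* visibility checks are done, so it is safe to lock the VM page now */
	LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
}

START_CRIT_SECTION();

/* ... mark the known-dead items LP_UNUSED, truncate the LP array ... */

if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
{
	PageSetAllVisible(page);
	visibilitymap_set_vmbits(blkno, vmbuffer, vmflags,
							 RelationGetRelationName(vacrel->rel));
	conflict_xid = visibility_cutoff_xid;
}

MarkBufferDirty(buffer);

if (RelationNeedsWAL(vacrel->rel))
	log_heap_prune_and_freeze(vacrel->rel, buffer,
							  vmbuffer, vmflags, conflict_xid,
							  false,	/* no cleanup lock required */
							  (vmflags & VISIBILITYMAP_VALID_BITS) != 0,
							  PRUNE_VACUUM_CLEANUP,
							  NULL, 0,	/* frozen */
							  NULL, 0,	/* redirected */
							  NULL, 0,	/* dead */
							  unused, nunused);

END_CRIT_SECTION();

if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
	LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);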
v16-0008-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (text/x-patch; charset=US-ASCII)
From 3e79e84930ba110a0dbf4abe6b3c84f3c021c78a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v16 08/14] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum
As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
---
src/backend/access/heap/vacuumlazy.c | 36 +++++++++++++++++++++++-----
1 file changed, 30 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index cf1c2efc999..cf9de40ff3c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1877,9 +1877,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ /* Lock vmbuffer before entering critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ /* Mark buffer dirty before writing any WAL records */
MarkBufferDirty(buf);
/*
@@ -1896,13 +1899,34 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set_vmbits(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ RelationGetRelationName(vacrel->rel));
+
+ /*
+ * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+ * setting the VM.
+ */
+ if (RelationNeedsWAL(vacrel->rel))
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ InvalidTransactionId, /* conflict xid */
+ false, /* cleanup lock */
+ true, /* set_pd_all_vis */
+ PRUNE_VACUUM_SCAN, /* reason */
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+
END_CRIT_SECTION();
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
/* Count the newly all-frozen pages for logging */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
--
2.43.0
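
Both the empty-page case here and the phase III case in the previous patch follow the same locking pattern around the shared WAL record; condensed (names as in the patches):

/*
 * The VM buffer lock is taken before START_CRIT_SECTION(), since acquiring
 * a lock inside a critical section risks escalating an ERROR into a PANIC,
 * and it is released only after END_CRIT_SECTION(), once the heap page,
 * the VM page, and the WAL record are all consistent.
 */
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
START_CRIT_SECTION();
/* ... set PD_ALL_VISIBLE, set the VM bits, emit one XLOG_HEAP2_PRUNE* record ... */
END_CRIT_SECTION();
LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);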
v16-0009-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (text/x-patch; charset=US-ASCII)
From d32451ace53d97e8e11deb12c87655c6e937ee0d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v16 09/14] Remove XLOG_HEAP2_VISIBLE entirely
No remaining users emit XLOG_HEAP2_VISIBLE records.
This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
---
src/backend/access/common/bufmask.c | 4 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 155 ++---------------------
src/backend/access/heap/pruneheap.c | 18 ++-
src/backend/access/heap/vacuumlazy.c | 16 +--
src/backend/access/heap/visibilitymap.c | 110 +---------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +-
src/include/access/heapam_xlog.h | 28 +---
src/include/access/visibilitymap.h | 15 +--
src/include/access/visibilitymapdefs.h | 9 --
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 56 insertions(+), 377 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible. See heap_xlog_prune_freeze() for
+ * details.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 7f354caec31..14a2996b9ee 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2524,11 +2524,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
else if (all_frozen_set)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- RelationGetRelationName(relation));
+ visibilitymap_set(BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ RelationGetRelationName(relation));
}
/*
@@ -8798,50 +8798,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
-
/*
* Perform XLogInsert for a heap-update operation. Caller must already
* have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 911416bbc56..69d1f0b8633 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -258,7 +258,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* We don't have relation name during recovery, so use relfilenode */
relname = psprintf("%u", rlocator.relNumber);
- old_vmbits = visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, relname);
+ old_vmbits = visibilitymap_set(blkno, vmbuffer, vmflags, relname);
/* Only set VM page LSN if we modified the page */
if (old_vmbits != vmflags)
@@ -276,142 +276,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -789,8 +653,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
*
* During recovery, however, no concurrent writers exist. Therefore,
* updating the VM without holding the heap page lock is safe enough. This
- * same approach is taken when replaying xl_heap_visible records (see
- * heap_xlog_visible()).
+ * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+ * heap_xlog_prune_freeze()).
*/
if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -805,11 +669,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
/* We don't have relation name during recovery, so use relfilenode */
relname = psprintf("%u", rlocator.relNumber);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- relname);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ relname);
PageSetLSN(BufferGetPage(vmbuffer), lsn);
pfree(relname);
@@ -1390,9 +1254,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f384d74416a..142781d0008 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1030,9 +1030,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
{
Assert(PageIsAllVisible(page));
- old_vmbits = visibilitymap_set_vmbits(blockno,
- vmbuffer, new_vmbits,
- RelationGetRelationName(params->relation));
+ old_vmbits = visibilitymap_set(blockno,
+ vmbuffer, new_vmbits,
+ RelationGetRelationName(params->relation));
if (old_vmbits == new_vmbits)
{
LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
@@ -2309,14 +2309,18 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
*
* This is used for several different page maintenance operations:
*
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
* redirected, some marked dead, and some removed altogether.
*
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'
*
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ * marked as unused.
*
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
+ * all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
* all.
*
* If replaying the record requires a cleanup lock, pass cleanup_lock = true.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index cf9de40ff3c..bed77af23a2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1899,11 +1899,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- RelationGetRelationName(vacrel->rel));
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ RelationGetRelationName(vacrel->rel));
/*
* Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2783,9 +2783,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer, vmflags,
- RelationGetRelationName(vacrel->rel));
+ visibilitymap_set(blkno,
+ vmbuffer, vmflags,
+ RelationGetRelationName(vacrel->rel));
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 738105eb97e..dfa6113f0a9 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set bit(s) in a previously pinned page and log
- * visibilitymap_set_vmbits - set bit(s) in a pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,107 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
- flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
- Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- Assert(!BufferIsValid(heapBuf) || BufferIsExclusiveLocked(heapBuf));
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (XLogRecPtrIsInvalid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
-}
/*
* Set visibility map (VM) flags in the block referenced by vmBuf.
@@ -343,9 +241,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* heapRelname is used only for debugging.
*/
uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const char *heapRelname)
+visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const char *heapRelname)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 1cb44ca32d3..93505cb8c56 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -460,9 +453,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..e9e77bd678b 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -294,7 +293,13 @@ typedef struct xl_heap_prune
#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
#define XLHP_IS_CATALOG_REL (1 << 1)
/*
@@ -443,20 +448,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -500,11 +491,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 3dcf37ba03f..859e5795457 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "utils/relcache.h"
@@ -31,15 +30,11 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const char *heapRelname);
+
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const char *heapRelname);
+
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8a626d633d5..48eb3cf4466 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4272,7 +4272,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
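
To make the new calling convention concrete, a caller now does roughly the
following (simplified from the vacuumlazy.c and pruneheap.c hunks above; rel,
page, blkno, and vmbuffer stand in for whatever the caller has in scope, with
vmbuffer already pinned on the correct VM page):

    uint8    old_vmbits;

    /* set PD_ALL_VISIBLE on the heap page first */
    PageSetAllVisible(page);

    old_vmbits = visibilitymap_set(blkno, vmbuffer,
                                   VISIBILITYMAP_ALL_VISIBLE |
                                   VISIBILITYMAP_ALL_FROZEN,
                                   RelationGetRelationName(rel));

    /*
     * The return value is the VM bits as they stood before the call, so the
     * caller can tell whether anything actually changed.  WAL for the VM
     * update is emitted by the caller as part of its own record (e.g.
     * xl_heap_prune or xl_heap_multi_insert), not as a separate
     * xl_heap_visible record.
     */

The function no longer takes the heap buffer, a cutoff xid, or an LSN; all of
that is now the responsibility of the operation that rendered the page
all-visible.
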
v16-0010-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (text/x-patch; charset=UTF-8)
From 1e4108e0c5b007fe55f12c29f4a47247ba023ef9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v16 10/14] Rename GlobalVisTestIsRemovableXid() to
GlobalVisXidVisibleToAll()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The function is currently only used to check whether a tuple’s xmax is
visible to all transactions (and thus removable). Upcoming changes will
also use it to test whether a tuple’s xmin is visible to all to
decide if a page can be marked all-visible in the visibility map.
The new name, GlobalVisXidVisibleToAll(), better reflects this broader
purpose.
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
---
src/backend/access/heap/heapam_visibility.c | 6 +++---
src/backend/access/heap/pruneheap.c | 16 ++++++++--------
src/backend/access/spgist/spgvacuum.c | 2 +-
src/backend/storage/ipc/procarray.c | 17 ++++++++---------
src/include/utils/snapmgr.h | 4 ++--
5 files changed, 22 insertions(+), 23 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
{
Assert(TransactionIdIsValid(dead_after));
- if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
res = HEAPTUPLE_DEAD;
}
else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
return false;
/* Deleter committed, so tuple is dead if the XID is old enough. */
- return GlobalVisTestIsRemovableXid(vistest,
- HeapTupleHeaderGetRawXmax(tuple));
+ return GlobalVisXidVisibleToAll(vistest,
+ HeapTupleHeaderGetRawXmax(tuple));
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 142781d0008..78e04f1d17c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -233,7 +233,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
vistest = GlobalVisTestFor(relation);
- if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+ if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
return;
/*
@@ -729,9 +729,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* Determining HTSV only once for each tuple is required for correctness,
* to deal with cases where running HTSV twice could result in different
* results. For example, RECENTLY_DEAD can turn to DEAD if another
- * checked item causes GlobalVisTestIsRemovableFullXid() to update the
- * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
- * transaction aborts.
+ * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+ * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+ * aborts.
*
* It's also good for performance. Most commonly tuples within a page are
* stored at decreasing offsets (while the items are stored at increasing
@@ -1154,11 +1154,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* Determine whether or not the tuple is considered dead when compared
* with the provided GlobalVisState. On-access pruning does not provide
* VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
- * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
- * if the GlobalVisState has been updated since the beginning of vacuuming
+ * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+ * the GlobalVisState has been updated since the beginning of vacuuming
* the relation.
*/
- if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
return HEAPTUPLE_DEAD;
return res;
@@ -1616,7 +1616,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
/*
* For now always use prstate->cutoffs for this test, because
* we only update 'all_visible' when freezing is requested. We
- * could use GlobalVisTestIsRemovableXid instead, if a
+ * could use GlobalVisXidVisibleToAll() instead, if a
* non-freezing caller wanted to set the VM bit.
*/
Assert(prstate->cutoffs);
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 8f8a1ad7796..496cca69410 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
*/
if (dt->tupstate == SPGIST_REDIRECT &&
(!TransactionIdIsValid(dt->xid) ||
- GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+ GlobalVisXidVisibleToAll(vistest, dt->xid)))
{
dt->tupstate = SPGIST_PLACEHOLDER;
Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..235c3b584f6 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
* See comment for GlobalVisState for details.
*/
bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
- FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
{
/*
* If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4216,14 +4215,14 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
}
/*
- * Wrapper around GlobalVisTestIsRemovableFullXid() for 32bit xids.
+ * Wrapper around GlobalVisFullXidVisibleToAll() for 32bit xids.
*
* It is crucial that this only gets called for xids from a source that
* protects against xid wraparounds (e.g. from a table and thus protected by
* relfrozenxid).
*/
bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
{
FullTransactionId fxid;
@@ -4237,12 +4236,12 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
*/
fxid = FullXidRelativeTo(state->definitely_needed, xid);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableFullXid(), see their comments.
+ * GlobalVisFullXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableXid(state, xid);
+ return GlobalVisXidVisibleToAll(state, xid);
}
/*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
*/
typedef struct GlobalVisState GlobalVisState;
extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
--
2.43.0
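
To show how the rename reads at call sites, here are the two kinds of checks
side by side, lightly adapted from the hunks above (vistest, tuple, xmin, res,
and all_visible are whatever the caller has in scope):

    /* removal: a deleted tuple's xmax must be visible to all before it is dead */
    if (GlobalVisXidVisibleToAll(vistest, HeapTupleHeaderGetRawXmax(tuple)))
        res = HEAPTUPLE_DEAD;

    /* upcoming use: an inserted tuple's xmin must be visible to all before the
     * page can be marked all-visible in the VM */
    if (!GlobalVisXidVisibleToAll(vistest, xmin))
        all_visible = false;

Same primitive, two directions: an xid that is visible to all is old enough
both to justify removing a deleted tuple and to count an inserted tuple toward
page-level all-visibility.
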
v16-0011-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (text/x-patch; charset=UTF-8)
From a28aef72286f446c53614621ebe7f8b65ee4b59b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v16 11/14] Use GlobalVisState in vacuum to determine page
level visibility
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.
Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.
This also benefits vacuum directly: GlobalVisState may advance
during a vacuum, allowing more pages to be considered all-visible.
In the rare case that it moves backward, VACUUM falls back to OldestXmin
to ensure we don’t attempt to freeze a dead tuple that wasn’t yet
prunable according to the GlobalVisState.
Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.
This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.
---
src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++++++
src/backend/access/heap/pruneheap.c | 37 ++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 17 +++++-----
src/include/access/heapam.h | 7 ++--
4 files changed, 57 insertions(+), 32 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+ Buffer buffer)
+{
+ TransactionId dead_after = InvalidTransactionId;
+ HTSV_Result res;
+
+ res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+ if (res == HEAPTUPLE_RECENTLY_DEAD)
+ {
+ Assert(TransactionIdIsValid(dead_after));
+
+ if (GlobalVisXidVisibleToAll(vistest, dead_after))
+ res = HEAPTUPLE_DEAD;
+ }
+ else
+ Assert(!TransactionIdIsValid(dead_after));
+
+ return res;
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 78e04f1d17c..e5b16bd2b38 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -711,11 +711,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon
- * when updating the VM and/or freezing all the tuples on the page.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState. This field is only kept up-to-date if the page is
+ * all-visible. As soon as a tuple is encountered that is not visible to
+ * all, this field is unmaintained. As long as it is maintained, it can be
+ * used to calculate the snapshot conflict horizon when updating the VM
+ * and/or freezing all the tuples on the page.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -911,6 +912,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them is not visible to everyone, the page cannot be
+ * all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update the hint
@@ -1081,10 +1092,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool debug_all_frozen;
Assert(prstate.lpdead_items == 0);
- Assert(prstate.cutoffs);
if (!heap_page_is_all_visible(params->relation, buffer,
- prstate.cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc))
Assert(false);
@@ -1613,19 +1623,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' when freezing is requested. We
- * could use GlobalVisXidVisibleToAll() instead, if a
- * non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index bed77af23a2..3af8a359e42 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,7 +464,7 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -2739,7 +2739,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* done outside the critical section.
*/
if (heap_page_would_be_all_visible(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3488,14 +3488,13 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
- return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
+ return heap_page_would_be_all_visible(rel, buf, vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -3514,7 +3513,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
*
* Output parameters:
*
@@ -3533,7 +3532,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
static bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -3605,7 +3604,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
tuple.t_len = ItemIdGetLength(itemid);
tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+ switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest, buf))
{
case HEAPTUPLE_LIVE:
{
@@ -3624,7 +3623,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin, OldestXmin))
+ if (!GlobalVisXidVisibleToAll(vistest, xmin))
{
all_visible = false;
*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2de39ba0cd1..df0632aebc6 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -276,8 +276,7 @@ typedef struct PruneFreezeParams
/*
* cutoffs contains the freeze cutoffs, established by VACUUM at the
* beginning of vacuuming the relation. Required if HEAP_PRUNE_FREEZE
- * option is set. cutoffs->OldestXmin is also used to determine if dead
- * tuples are HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * option is set.
*/
struct VacuumCutoffs *cutoffs;
} PruneFreezeParams;
@@ -443,7 +442,7 @@ extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum);
@@ -455,6 +454,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+ GlobalVisState *vistest, Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
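
Condensed from the pruneheap.c hunks above, the deferred once-per-page check
amounts to the following sketch (prstate fields as in
heap_page_prune_and_freeze(); xmin is the xmin of the live tuple currently
being examined):

    /* while scanning the page: track the newest xmin of live, committed tuples */
    if (TransactionIdFollows(xmin, prstate.visibility_cutoff_xid) &&
        TransactionIdIsNormal(xmin))
        prstate.visibility_cutoff_xid = xmin;

    /* after scanning the whole page: one horizon check suffices, because if
     * the newest xmin is visible to all, every older xmin on the page is too */
    if (prstate.all_visible &&
        TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
        !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
        prstate.all_visible = prstate.all_frozen = false;
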
v16-0012-Inline-TransactionIdFollows-Precedes.patch (text/x-patch; charset=US-ASCII)
From 9ed00b821b89276c80382bc810e6a3368cc35521 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 09:57:13 -0400
Subject: [PATCH v16 12/14] Inline TransactionIdFollows/Precedes()
Calling these from on-access pruning code had noticeable overhead in a
profile. There does not seem to be a reason not to inline them.
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
---
src/backend/access/transam/transam.c | 64 -------------------------
src/include/access/transam.h | 70 ++++++++++++++++++++++++++--
2 files changed, 66 insertions(+), 68 deletions(-)
diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index 9a39451a29a..553d6756cb3 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -273,70 +273,6 @@ TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
}
-/*
- * TransactionIdPrecedes --- is id1 logically < id2?
- */
-bool
-TransactionIdPrecedes(TransactionId id1, TransactionId id2)
-{
- /*
- * If either ID is a permanent XID then we can just do unsigned
- * comparison. If both are normal, do a modulo-2^32 comparison.
- */
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 < id2);
-
- diff = (int32) (id1 - id2);
- return (diff < 0);
-}
-
-/*
- * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
- */
-bool
-TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 <= id2);
-
- diff = (int32) (id1 - id2);
- return (diff <= 0);
-}
-
-/*
- * TransactionIdFollows --- is id1 logically > id2?
- */
-bool
-TransactionIdFollows(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 > id2);
-
- diff = (int32) (id1 - id2);
- return (diff > 0);
-}
-
-/*
- * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
- */
-bool
-TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 >= id2);
-
- diff = (int32) (id1 - id2);
- return (diff >= 0);
-}
-
/*
* TransactionIdLatest --- get latest XID among a main xact and its children
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 7d82cd2eb56..c9e20418275 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -255,6 +255,72 @@ typedef struct TransamVariablesData
} TransamVariablesData;
+
+/*
+ * TransactionIdPrecedes --- is id1 logically < id2?
+ */
+static inline bool
+TransactionIdPrecedes(TransactionId id1, TransactionId id2)
+{
+ /*
+ * If either ID is a permanent XID then we can just do unsigned
+ * comparison. If both are normal, do a modulo-2^32 comparison.
+ */
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 < id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff < 0);
+}
+
+/*
+ * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
+ */
+static inline bool
+TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 <= id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff <= 0);
+}
+
+/*
+ * TransactionIdFollows --- is id1 logically > id2?
+ */
+static inline bool
+TransactionIdFollows(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 > id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff > 0);
+}
+
+/*
+ * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
+ */
+static inline bool
+TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 >= id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff >= 0);
+}
+
+
/* ----------------
* extern declarations
* ----------------
@@ -274,10 +340,6 @@ extern bool TransactionIdDidAbort(TransactionId transactionId);
extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
-extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
-extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2);
extern TransactionId TransactionIdLatest(TransactionId mainxid,
int nxids, const TransactionId *xids);
extern XLogRecPtr TransactionIdGetCommitLSN(TransactionId xid);
--
2.43.0
v16-0013-Allow-on-access-pruning-to-set-pages-all-visible.patch (text/x-patch; charset=UTF-8)
From 13ff9fd8071f9b7aea07cca603c51a9a3cd659f1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v16 13/14] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.
Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
---
src/backend/access/heap/heapam.c | 15 +++-
src/backend/access/heap/heapam_handler.c | 15 +++-
src/backend/access/heap/pruneheap.c | 71 +++++++++++++++----
src/backend/access/index/indexam.c | 46 ++++++++++++
src/backend/access/table/tableam.c | 39 ++++++++--
src/backend/executor/execMain.c | 4 ++
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeBitmapHeapscan.c | 7 +-
src/backend/executor/nodeIndexscan.c | 18 +++--
src/backend/executor/nodeSeqscan.c | 24 +++++--
src/include/access/genam.h | 11 +++
src/include/access/heapam.h | 24 ++++++-
src/include/access/relscan.h | 6 ++
src/include/access/tableam.h | 30 +++++++-
src/include/nodes/execnodes.h | 6 ++
.../t/035_standby_logical_decoding.pl | 3 +-
16 files changed, 282 insertions(+), 39 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 14a2996b9ee..6181e355aaf 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -555,6 +555,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -569,7 +570,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1246,6 +1249,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1284,6 +1288,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1316,6 +1326,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..f05b9e4968d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
return &hscan->xs_base;
}
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e5b16bd2b38..fa3b38cdadc 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -186,7 +186,9 @@ static bool heap_page_will_set_vis(Relation relation,
Buffer heap_buf,
Buffer vmbuffer,
bool blk_known_av,
- const PruneState *prstate,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ PruneState *prstate,
uint8 *vmflags,
bool *do_set_pd_vis);
@@ -201,9 +203,13 @@ static bool heap_page_will_set_vis(Relation relation,
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -269,12 +275,21 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
PruneFreezeParams params;
PruneFreezeResult presult;
+ params.options = 0;
+ params.vmbuffer = InvalidBuffer;
+
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ params.options = HEAP_PAGE_PRUNE_UPDATE_VIS;
+ params.vmbuffer = *vmbuffer;
+ }
+
params.relation = relation;
params.buffer = buffer;
params.reason = PRUNE_ON_ACCESS;
params.vistest = vistest;
params.cutoffs = NULL;
- params.vmbuffer = InvalidBuffer;
params.blk_known_av = false;
/*
@@ -455,6 +470,9 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* have examined this page’s VM bits (e.g., VACUUM in the previous
* heap_vac_scan_next_block() call) and can pass that along.
*
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
* Returns true if one or both VM bits should be set, along with the desired
* flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
* should be set on the heap page.
@@ -465,7 +483,9 @@ heap_page_will_set_vis(Relation relation,
Buffer heap_buf,
Buffer vmbuffer,
bool blk_known_av,
- const PruneState *prstate,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ PruneState *prstate,
uint8 *vmflags,
bool *do_set_pd_vis)
{
@@ -481,6 +501,23 @@ heap_page_will_set_vis(Relation relation,
return false;
}
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(heap_buf) || XLogCheckBufferNeedsBackup(heap_buf)))
+ {
+ prstate->all_visible = prstate->all_frozen = false;
+ return false;
+ }
+
if (prstate->all_visible && !PageIsAllVisible(heap_page))
*do_set_pd_vis = true;
@@ -504,6 +541,9 @@ heap_page_will_set_vis(Relation relation,
* page-level bit is clear. However, it's possible that in vacuum the bit
* got cleared after heap_vac_scan_next_block() was called, so we must
* recheck with buffer lock before concluding that the VM is corrupt.
+ *
+ * XXX: This will never trigger for on-access pruning because it passes
+ * blk_known_av as false. Should we remove that condition here?
*/
else if (blk_known_av && !PageIsAllVisible(heap_page) &&
visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
@@ -912,6 +952,14 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /*
+ * Even if we don't prune anything, if we found a new value for the
+ * pd_prune_xid field or the page was marked full, we will update the hint
+ * bit.
+ */
+ do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ PageIsFull(page);
+
/*
* After processing all the live tuples on the page, if the newest xmin
* amongst them is not visible to everyone, the page cannot be
@@ -922,14 +970,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
prstate.all_visible = prstate.all_frozen = false;
- /*
- * Even if we don't prune anything, if we found a new value for the
- * pd_prune_xid field or the page was marked full, we will update the hint
- * bit.
- */
- do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
- PageIsFull(page);
-
/*
* Decide if we want to go ahead with freezing according to the freeze
* plans we prepared, or not.
@@ -973,6 +1013,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
do_set_vm = heap_page_will_set_vis(params->relation,
blockno, buffer, vmbuffer, params->blk_known_av,
+ params->reason, do_prune, do_freeze,
&prstate, &new_vmbits, &do_set_pd_vis);
/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
@@ -2245,7 +2286,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
/*
* Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
- * record.
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
*/
static TransactionId
get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
@@ -2314,8 +2355,8 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
* - Reaping: During vacuum phase III, items that are already LP_DEAD are
* marked as unused.
*
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- * all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III and on-access, the heap page
+ * may be marked all-visible and all-frozen.
*
* These changes all happen together, so we use a single WAL record for them
* all.
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..8d582a8eafd 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -289,6 +289,32 @@ index_beginscan(Relation heapRelation,
return scan;
}
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan(heapRelation,
+ indexRelation,
+ snapshot,
+ instrument,
+ nkeys, norderbys);
+
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+ return scan;
+}
+
/*
* index_beginscan_bitmap - start a scan of an index with amgetbitmap
*
@@ -620,6 +646,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
return scan;
}
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan_parallel(heaprel, indexrel,
+ instrument,
+ nkeys, norderbys,
+ pscan);
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+ return scan;
+}
+
/* ----------------
* index_getnext_tid - get the next TID from a scan
*
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 5e41404937e..3e3a0f72a71 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
char *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
bool synchronize_seqscans = true;
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags);
+
/* ----------------------------------------------------------------------------
* Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
}
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
- SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
pscan, flags);
}
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+ bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
/* ----------------------------------------------------------------------------
* Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 831c55ce787..15be318fd41 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation is modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ modifies_rel);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..c2ffbd3b08e 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+
+ bool modifies_base_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
*/
- scandesc = index_beginscan(node->ss.ss_currentRelation,
- node->iss_RelationDesc,
- estate->es_snapshot,
- &node->iss_Instrument,
- node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+ node->iss_RelationDesc,
+ estate->es_snapshot,
+ &node->iss_Instrument,
+ node->iss_NumScanKeys,
+ node->iss_NumOrderByKeys,
+ modifies_base_rel);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..fd69275c181 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
- scandesc = table_beginscan(node->ss.ss_currentRelation,
- estate->es_snapshot,
- 0, NULL);
+ scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+ estate->es_snapshot,
+ 0, NULL, modifies_rel);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -366,6 +371,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
ParallelContext *pcxt)
{
EState *estate = node->ss.ps.state;
+ bool modifies_rel;
ParallelTableScanDesc pscan;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -373,8 +379,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+ modifies_rel);
}
/* ----------------------------------------------------------------
@@ -404,8 +413,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+ pscan,
+ modifies_rel);
}
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..aa2112c8e04 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -178,6 +178,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_heap_rel);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
@@ -204,6 +209,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys,
ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_rel);
extern ItemPointer index_getnext_tid(IndexScanDesc scan,
ScanDirection direction);
extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index df0632aebc6..59d8ce9ad42 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read in the current heap page's
+ * corresponding VM block to this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read in the current heap page's corresponding VM block to
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
@@ -415,7 +432,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
TM_IndexDeleteOp *delstate);
/* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
OffsetNumber *off_loc,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
typedef struct IndexFetchTableData
{
Relation rel;
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchTableData;
struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e16bf025692..f250d4e7aec 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* whether or not scan should attempt to set the VM */
+ SO_ALLOW_VM_SET = 1 << 10,
} ScanOptions;
/*
@@ -882,6 +884,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
}
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
/*
* Like table_beginscan(), but for scanning catalog. It'll automatically use a
* snapshot appropriate for scanning catalog relations.
@@ -919,10 +940,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, bool modifies_rel)
{
uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
}
@@ -1130,6 +1154,10 @@ extern void table_parallelscan_initialize(Relation rel,
extern TableScanDesc table_beginscan_parallel(Relation relation,
ParallelTableScanDesc pscan);
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+ ParallelTableScanDesc pscan,
+ bool modifies_rel);
+
/*
* Restart a parallel scan. Call this in the leader process. Caller is
* responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a36653c37f9..9c54fa06e4a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query either through
+ * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index c9c182892cf..f5c0c65b260 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -745,7 +746,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
--
2.43.0
On Mon, Oct 6, 2025 at 6:40 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
In attached v16, I’ve reverted to removing XLOG_HEAP2_VISIBLE
entirely, rather than first removing each caller's heap page from the
VM WAL chain. I reordered changes and squashed several refactoring
patches to improve patch-by-patch readability. This should make the
set read differently from earlier versions that removed
XLOG_HEAP2_VISIBLE and had more step-by-step mechanical refactoring.

I think if we plan to go all the way with removing XLOG_HEAP2_VISIBLE,
having intermediate patches that just set PD_ALL_VISIBLE while making
other heap page changes is more confusing than helpful. Also, I think
having separate flags for setting PD_ALL_VISIBLE in the WAL record
over-complicated the code.
I decided to reorder the patches to remove XLOG_HEAP2_VISIBLE from
vacuum phase III before removing it from vacuum phase I because
removing it from phase III doesn't require preliminary refactoring
patches. I've done that in the attached v17.
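
To make the phase III change easier to review without reading the whole
diff, here is a condensed sketch of the flow lazy_vacuum_heap_page() ends
up with. All function and variable names are the ones used in the
patches; declarations, the error callback, and the stats bookkeeping are
omitted, so treat this as illustrative rather than the literal code:

    /* Visibility checks may do I/O, so do them before the critical section */
    if (heap_page_would_be_all_visible(vacrel, buffer,
                                       deadoffsets, num_offsets,
                                       &all_frozen, &visibility_cutoff_xid))
    {
        vmflags |= VISIBILITYMAP_ALL_VISIBLE;
        if (all_frozen)
            vmflags |= VISIBILITYMAP_ALL_FROZEN;
        /* lock the VM buffer before entering the critical section */
        LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
    }

    START_CRIT_SECTION();

    /* ... mark the dead items LP_UNUSED, truncate the line pointer array ... */

    if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
    {
        /* heap page and VM page are updated under the same exclusive lock */
        PageSetAllVisible(page);
        visibilitymap_set_vmbits(blkno, vmbuffer, vmflags,
                                 RelationGetRelationName(vacrel->rel));
        /* conflict_xid started out InvalidTransactionId */
        conflict_xid = visibility_cutoff_xid;
    }

    MarkBufferDirty(buffer);

    if (RelationNeedsWAL(vacrel->rel))
        log_heap_prune_and_freeze(vacrel->rel, buffer,
                                  vmbuffer, vmflags, /* VM change rides along */
                                  conflict_xid,
                                  false,             /* no cleanup lock required */
                                  PRUNE_VACUUM_CLEANUP,
                                  NULL, 0,           /* frozen */
                                  NULL, 0,           /* redirected */
                                  NULL, 0,           /* dead */
                                  unused, nunused);

    END_CRIT_SECTION();

    if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
        LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);

So a page that becomes all-visible in phase III now produces exactly one
WAL record instead of a prune record plus an xl_heap_visible record.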
I've also added an experimental patch at the end that refactors large
chunks of heap_page_prune_and_freeze() into helpers. I got some
feedback off-list that heap_page_prune_and_freeze() is too unwieldy
now. I'm not sure how I feel about these helpers yet, so I haven't
documented them or moved them earlier in the patch set, before the
changes to heap_page_prune_and_freeze().
0001: Eliminate XLOG_HEAP2_VISIBLE from COPY FREEZE
0002: Eliminate XLOG_HEAP2_VISIBLE from phase III of vacuum
0003 - 0006: cleanup and refactoring to prepare for 0007
0007: Eliminate XLOG_HEAP2_VISIBLE from vacuum prune/freeze
0008 - 0009: Remove XLOG_HEAP2_VISIBLE
0010 - 0012: refactoring to prepare for 0013
0013: Set VM on-access
0014: Set pd_prune_xid on insert
0015: Experimental refactoring of heap_page_prune_and_freeze into helpers
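
A note on the redo side, since 0001 and 0002 share the same pattern
there: the heap block (block 0) is replayed first, and the VM block
(block 1) is then updated even if the heap block was skipped by the LSN
interlock, because one VM page covers many heap pages and later records
may rely on this record's FPI. Condensed from the patches (not verbatim;
names are the ones the patches use, and the fragment assumes the usual
redo-function locals):

    if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0 &&
        XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
                                      &vmbuffer) == BLK_NEEDS_REDO)
    {
        Page        vmpage = BufferGetPage(vmbuffer);
        char       *relname;

        /* the VM block may not exist yet; initialize it if read as zeros */
        if (PageIsNew(vmpage))
            PageInit(vmpage, BLCKSZ, 0);

        /* no relcache entry during recovery, so identify the rel by number */
        relname = psprintf("%u", rlocator.relNumber);
        visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, relname);
        PageSetLSN(vmpage, lsn);
        pfree(relname);
    }

    if (BufferIsValid(vmbuffer))
        UnlockReleaseBuffer(vmbuffer);

The prune variant additionally sets the VM page LSN only when it
actually changed the bits.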
- Melanie
Attachments:
v17-0001-Eliminate-COPY-FREEZE-use-of-XLOG_HEAP2_VISIBLE.patchtext/x-patch; charset=UTF-8; name=v17-0001-Eliminate-COPY-FREEZE-use-of-XLOG_HEAP2_VISIBLE.patchDownload
From 5c94a9cea77820235f62719b9e760adb6fbbc615 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v17 01/15] Eliminate COPY FREEZE use of XLOG_HEAP2_VISIBLE
Instead of emitting a separate WAL XLOG_HEAP2_VISIBLE record for setting
bits in the VM, specify the VM block changes in the
XLOG_HEAP2_MULTI_INSERT record.
This halves the number of WAL records emitted by COPY FREEZE.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam.c | 44 ++++++++++------
src/backend/access/heap/heapam_xlog.c | 52 ++++++++++++++++++-
src/backend/access/heap/visibilitymap.c | 68 ++++++++++++++++++++++++-
src/backend/access/rmgrdesc/heapdesc.c | 5 ++
src/include/access/visibilitymap.h | 3 ++
5 files changed, 154 insertions(+), 18 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ed0c0c2dc9f..7f354caec31 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2466,7 +2466,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
starting_with_empty_page = PageGetMaxOffsetNumber(page) == 0;
if (starting_with_empty_page && (options & HEAP_INSERT_FROZEN))
+ {
all_frozen_set = true;
+ /* Lock the vmbuffer before entering the critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ }
/* NO EREPORT(ERROR) from here till changes are logged */
START_CRIT_SECTION();
@@ -2506,7 +2510,8 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
* going to add further frozen rows to it.
*
* If we're only adding already frozen rows to a previously empty
- * page, mark it as all-visible.
+ * page, mark it as all-frozen and update the visibility map. We're
+ * already holding a pin on the vmbuffer.
*/
if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
@@ -2517,7 +2522,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
else if (all_frozen_set)
+ {
PageSetAllVisible(page);
+ visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ RelationGetRelationName(relation));
+ }
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2565,6 +2577,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
xlrec->flags = 0;
if (all_visible_cleared)
xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+ /*
+ * We don't have to worry about including a conflict xid in the
+ * WAL record as HEAP_INSERT_FROZEN intentionally violates
+ * visibility rules.
+ */
if (all_frozen_set)
xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
@@ -2627,7 +2645,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
XLogBeginInsert();
XLogRegisterData(xlrec, tupledata - scratch.data);
+
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+ if (all_frozen_set)
+ XLogRegisterBuffer(1, vmbuffer, 0);
XLogRegisterBufData(0, tupledata, totaldatalen);
@@ -2637,26 +2658,17 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
recptr = XLogInsert(RM_HEAP2_ID, info);
PageSetLSN(page, recptr);
+ if (all_frozen_set)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
+ }
}
END_CRIT_SECTION();
- /*
- * If we've frozen everything on the page, update the visibilitymap.
- * We're already holding pin on the vmbuffer.
- */
if (all_frozen_set)
- {
- /*
- * It's fine to use InvalidTransactionId here - this is only used
- * when HEAP_INSERT_FROZEN is specified, which intentionally
- * violates visibility rules.
- */
- visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
- InvalidXLogRecPtr, vmbuffer,
- InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
- }
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
UnlockReleaseBuffer(buffer);
ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index cf843277938..c2c7e6ab086 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -551,6 +551,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
int i;
bool isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
/*
* Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -571,11 +572,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(rlocator);
- Buffer vmbuffer = InvalidBuffer;
visibilitymap_pin(reln, blkno, &vmbuffer);
visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
ReleaseBuffer(vmbuffer);
+ vmbuffer = InvalidBuffer;
FreeFakeRelcacheEntry(reln);
}
@@ -662,6 +663,55 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (BufferIsValid(buffer))
UnlockReleaseBuffer(buffer);
+ buffer = InvalidBuffer;
+
+ /*
+ * Read and update the visibility map (VM) block.
+ *
+ * We must always redo VM changes, even if the corresponding heap page
+ * update was skipped due to the LSN interlock. Each VM block covers
+ * multiple heap pages, so later WAL records may update other bits in the
+ * same block. If this record includes a full-page image (FPI), subsequent
+ * WAL records may depend on it to guard against torn pages.
+ *
+ * Heap page changes are replayed first to preserve the invariant:
+ * PD_ALL_VISIBLE must be set on the heap page if the VM bit is set.
+ *
+ * Note that we released the heap page lock above. Under normal operation,
+ * this would be unsafe -- a concurrent modification could clear
+ * PD_ALL_VISIBLE while the VM bit remained set, violating the invariant.
+ *
+ * During recovery, however, no concurrent writers exist. Therefore,
+ * updating the VM without holding the heap page lock is safe enough. This
+ * same approach is taken when replaying xl_heap_visible records (see
+ * heap_xlog_visible()).
+ */
+ if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
+ XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ Page vmpage = BufferGetPage(vmbuffer);
+ char *relname;
+
+ /* initialize the page if it was read as zeros */
+ if (PageIsNew(vmpage))
+ PageInit(vmpage, BLCKSZ, 0);
+
+ /* We don't have relation name during recovery, so use relfilenode */
+ relname = psprintf("%u", rlocator.relNumber);
+ visibilitymap_set_vmbits(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ relname);
+
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
+ pfree(relname);
+ }
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
+
/*
* If the page is running low on free space, update the FSM as well.
* Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 7306c16f05c..2d43147ffb7 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,7 +14,8 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set a bit in a previously pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page and log
+ * visibilitymap_set_vmbits - set bit(s) in a pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -321,6 +322,71 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
return status;
}
+/*
+ * Set visibility map (VM) flags in the block referenced by vmBuf.
+ *
+ * This function is intended for callers that log VM changes together
+ * with the heap page modifications that rendered the page all-visible.
+ * Callers that log VM changes separately should use visibilitymap_set().
+ *
+ * vmBuf must be pinned and exclusively locked, and it must cover the VM bits
+ * corresponding to heapBlk.
+ *
+ * In normal operation (not recovery), this must be called inside a critical
+ * section that also applies the necessary heap page changes and, if
+ * applicable, emits WAL.
+ *
+ * The caller is responsible for ensuring consistency between the heap page
+ * and the VM page by holding a pin and exclusive lock on the buffer
+ * containing heapBlk.
+ *
+ * heapRelname is used only for debugging.
+ */
+uint8
+visibilitymap_set_vmbits(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const char *heapRelname)
+{
+ BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+ uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+ uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+ Page page;
+ uint8 *map;
+ uint8 status;
+
+#ifdef TRACE_VISIBILITYMAP
+ elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
+ flags, heapRelname, heapBlk);
+#endif
+
+ /* Call in same critical section where WAL is emitted. */
+ Assert(InRecovery || CritSectionCount > 0);
+
+ /* Flags should be valid. Also never clear bits with this function */
+ Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+ /* Must never set all_frozen bit without also setting all_visible bit */
+ Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+ /* Check that we have the right VM page pinned */
+ if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+ elog(ERROR, "wrong VM buffer passed to visibilitymap_set_vmbits");
+
+ Assert(BufferIsExclusiveLocked(vmBuf));
+
+ page = BufferGetPage(vmBuf);
+ map = (uint8 *) PageGetContents(page);
+
+ status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+ if (flags != status)
+ {
+ map[mapByte] |= (flags << mapOffset);
+ MarkBufferDirty(vmBuf);
+ }
+
+ return status;
+}
+
/*
* visibilitymap_get_status - get status of bits
*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
#include "access/heapam_xlog.h"
#include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
#include "storage/standbydefs.h"
/*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
xlrec->flags);
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+ appendStringInfo(buf, ", vm_flags: 0x%02X",
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
if (XLogRecHasBlockData(record, 0) && !isinit)
{
appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..3dcf37ba03f 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,9 @@ extern uint8 visibilitymap_set(Relation rel,
Buffer vmBuf,
TransactionId cutoff_xid,
uint8 flags);
+extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const char *heapRelname);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
--
2.43.0
v17-0002-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patchtext/x-patch; charset=UTF-8; name=v17-0002-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patchDownload
From bb2a4c2d6800cd06cc804847b5862f36d8080617 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 8 Oct 2025 15:38:53 -0400
Subject: [PATCH v17 02/15] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase III
Instead of emitting a separate XLOG_HEAP2_VISIBLE record for each page
that becomes all-visible in vacuum's third phase, record the
visibility map update in the already emitted
XLOG_HEAP2_PRUNE_VACUUM_CLEANUP record.
Visibility checks are now performed before marking dead items unused.
This is safe because the heap page is held under exclusive lock for the
entire operation.
This reduces the number of WAL records generated by VACUUM phase III by
up to 50%.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam_xlog.c | 139 +++++++++++++++++-----
src/backend/access/heap/pruneheap.c | 56 ++++++++-
src/backend/access/heap/vacuumlazy.c | 153 ++++++++++++++++++-------
src/backend/access/rmgrdesc/heapdesc.c | 11 +-
src/include/access/heapam.h | 1 +
src/include/access/heapam_xlog.h | 17 ++-
6 files changed, 302 insertions(+), 75 deletions(-)
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index c2c7e6ab086..aaf595e75d6 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Buffer buffer;
RelFileLocator rlocator;
BlockNumber blkno;
- XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
+ uint8 vmflags = 0;
+ Size freespace = 0;
XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -50,11 +52,22 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Assert((xlrec.flags & XLHP_CLEANUP_LOCK) != 0 ||
(xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
+ if (xlrec.flags & XLHP_VM_ALL_VISIBLE)
+ {
+ vmflags = VISIBILITYMAP_ALL_VISIBLE;
+ if (xlrec.flags & XLHP_VM_ALL_FROZEN)
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
/*
- * We are about to remove and/or freeze tuples. In Hot Standby mode,
- * ensure that there are no queries running for which the removed tuples
- * are still visible or which still consider the frozen xids as running.
- * The conflict horizon XID comes after xl_heap_prune.
+ * After xl_heap_prune is the optional snapshot conflict horizon.
+ *
+ * In Hot Standby mode, we must ensure that there are no running queries
+ * which would conflict with the changes in this record. That means we
+ * can't replay this record if it removes tuples that are still visible to
+ * transactions on the standby, freeze tuples with xids that are still
+ * considered running on the standby, or set a page as all-visible in the
+ * VM if it isn't all-visible to all transactions on the standby.
*/
if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
{
@@ -71,12 +84,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
}
/*
- * If we have a full-page image, restore it and we're done.
+ * If we have a full-page image of the heap block, restore it and we're
+ * done with the heap block.
*/
- action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
- (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
- &buffer);
- if (action == BLK_NEEDS_REDO)
+ if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+ (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+ &buffer) == BLK_NEEDS_REDO)
{
Page page = BufferGetPage(buffer);
OffsetNumber *redirected;
@@ -90,6 +103,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
xlhp_freeze_plan *plans;
OffsetNumber *frz_offsets;
char *dataptr = XLogRecGetBlockData(record, 0, &datalen);
+ bool do_prune;
heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
&nplans, &plans, &frz_offsets,
@@ -97,11 +111,16 @@ heap_xlog_prune_freeze(XLogReaderState *record)
&ndead, &nowdead,
&nunused, &nowunused);
+ do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+
+ /* Ensure the record does something */
+ Assert(do_prune || nplans > 0 || vmflags & VISIBILITYMAP_VALID_BITS);
+
/*
* Update all line pointers per the record, and repair fragmentation
* if needed.
*/
- if (nredirected > 0 || ndead > 0 || nunused > 0)
+ if (do_prune)
heap_page_prune_execute(buffer,
(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
redirected, nredirected,
@@ -138,36 +157,104 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* There should be no more data */
Assert((char *) frz_offsets == dataptr + datalen);
+ if ((vmflags & VISIBILITYMAP_VALID_BITS))
+ PageSetAllVisible(page);
+
+ MarkBufferDirty(buffer);
+
+ /*
+ * Always emit a WAL record when setting PD_ALL_VISIBLE but only emit
+ * an FPI if checksums/wal_log_hints are enabled. Advance the page LSN
+ * only if the record could include an FPI, since recovery skips
+ * records <= the stamped LSN. Otherwise it might skip an earlier FPI
+ * needed to repair a torn page.
+ */
+ if (do_prune || nplans > 0 ||
+ ((vmflags & VISIBILITYMAP_VALID_BITS) && XLogHintBitIsNeeded()))
+ PageSetLSN(page, lsn);
+
/*
* Note: we don't worry about updating the page's prunability hints.
* At worst this will cause an extra prune cycle to occur soon.
*/
-
- PageSetLSN(page, lsn);
- MarkBufferDirty(buffer);
}
/*
- * If we released any space or line pointers, update the free space map.
+ * If we released any space or line pointers or set PD_ALL_VISIBLE or the
+ * VM, update the freespace map.
+ *
+ * Even when no actual space is freed (e.g., when only marking the page
+ * all-visible or frozen), we still update the FSM. Because the FSM is
+ * unlogged and maintained heuristically, it often becomes stale on
+ * standbys. If such a standby is later promoted and runs VACUUM, it will
+ * skip recalculating free space for pages that were marked all-visible
+ * (or all-frozen, depending on the mode). FreeSpaceMapVacuum can then
+ * propagate overly optimistic free space values upward, causing future
+ * insertions to select pages that turn out to be unusable. In bulk, this
+ * can lead to long stalls.
+ *
+ * To prevent this, always refresh the FSM's view when a page becomes
+ * all-visible or all-frozen.
+ *
+ * Do this regardless of whether a full-page image is logged, since FSM
+ * data is not part of the page itself.
*
- * Do this regardless of a full-page image being applied, since the FSM
- * data is not in the page anyway.
*/
if (BufferIsValid(buffer))
{
- if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
- XLHP_HAS_DEAD_ITEMS |
- XLHP_HAS_NOW_UNUSED_ITEMS))
- {
- Size freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+ if ((xlrec.flags & (XLHP_HAS_REDIRECTIONS |
+ XLHP_HAS_DEAD_ITEMS |
+ XLHP_HAS_NOW_UNUSED_ITEMS)) ||
+ (vmflags & VISIBILITYMAP_VALID_BITS))
+ freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+
+ /*
+ * We want to avoid holding an exclusive lock on the heap buffer while
+ * doing IO (either of the FSM or the VM), so we'll release the lock
+ * on the heap buffer before doing either.
+ */
+ UnlockReleaseBuffer(buffer);
+ }
+
+ /*
+ * Now read and update the VM block.
+ *
+ * We must redo changes to the VM even if the heap page was skipped due to
+ * LSN interlock. See comment in heap_xlog_multi_insert() for more details
+ * on replaying changes to the VM.
+ */
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) &&
+ XLogReadBufferForRedoExtended(record, 1,
+ RBM_ZERO_ON_ERROR,
+ false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ Page vmpage = BufferGetPage(vmbuffer);
+ char *relname;
+ uint8 old_vmbits = 0;
+
+ /* initialize the page if it was read as zeros */
+ if (PageIsNew(vmpage))
+ PageInit(vmpage, BLCKSZ, 0);
- UnlockReleaseBuffer(buffer);
+ /* We don't have relation name during recovery, so use relfilenode */
+ relname = psprintf("%u", rlocator.relNumber);
+ old_vmbits = visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, relname);
- XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
+ /* Only set VM page LSN if we modified the page */
+ if (old_vmbits != vmflags)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
}
- else
- UnlockReleaseBuffer(buffer);
+ pfree(relname);
}
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
+
+ if (freespace > 0)
+ XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d8ea0c78f77..9052e1a584c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,6 +19,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/transam.h"
+#include "access/visibilitymapdefs.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
#include "commands/vacuum.h"
@@ -835,6 +836,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
+ InvalidBuffer, /* vmbuffer */
+ 0, /* vmflags */
conflict_xid,
true, reason,
prstate.frozen, prstate.nfrozen,
@@ -2045,12 +2048,17 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* replaying 'unused' items depends on whether they were all previously marked
* as dead.
*
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
* Note: This function scribbles on the 'frozen' array.
*
* Note: This is called in a critical section, so careful what you do here.
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer, uint8 vmflags,
TransactionId conflict_xid,
bool cleanup_lock,
PruneReason reason,
@@ -2062,6 +2070,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xl_heap_prune xlrec;
XLogRecPtr recptr;
uint8 info;
+ uint8 regbuf_flags;
/* The following local variables hold data registered in the WAL record: */
xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2070,8 +2079,24 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xlhp_prune_items dead_items;
xlhp_prune_items unused_items;
OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+ bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+ bool do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
xlrec.flags = 0;
+ regbuf_flags = REGBUF_STANDARD;
+
+ Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+
+ /*
+ * We can avoid an FPI if the only modification we are making to the heap
+ * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+ * Note that if we explicitly skip an FPI, we must not set the heap page
+ * LSN later.
+ */
+ if (!do_prune &&
+ nfrozen == 0 &&
+ (!do_set_vm || !XLogHintBitIsNeeded()))
+ regbuf_flags |= REGBUF_NO_IMAGE;
/*
* Prepare data for the buffer. The arrays are not actually in the
@@ -2079,7 +2104,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* page image, the arrays can be omitted.
*/
XLogBeginInsert();
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+ XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+ if (do_set_vm)
+ XLogRegisterBuffer(1, vmbuffer, 0);
+
if (nfrozen > 0)
{
int nplans;
@@ -2136,6 +2165,12 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* Prepare the main xl_heap_prune record. We already set the XLHP_HAS_*
* flag above.
*/
+ if (vmflags & VISIBILITYMAP_ALL_VISIBLE)
+ {
+ xlrec.flags |= XLHP_VM_ALL_VISIBLE;
+ if (vmflags & VISIBILITYMAP_ALL_FROZEN)
+ xlrec.flags |= XLHP_VM_ALL_FROZEN;
+ }
if (RelationIsAccessibleInLogicalDecoding(relation))
xlrec.flags |= XLHP_IS_CATALOG_REL;
if (TransactionIdIsValid(conflict_xid))
@@ -2168,5 +2203,22 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
}
recptr = XLogInsert(RM_HEAP2_ID, info);
- PageSetLSN(BufferGetPage(buffer), recptr);
+ if (do_set_vm)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
+ }
+
+ /*
+ * We must bump the page LSN if pruning or freezing. If we are only
+ * updating PD_ALL_VISIBLE, though, we can skip doing this unless
+ * wal_log_hints/checksums are enabled. Torn pages are possible if we
+ * update PD_ALL_VISIBLE without bumping the LSN, but this is deemed okay
+ * for page hint updates.
+ */
+ if (do_prune || nfrozen > 0 || (do_set_vm && XLogHintBitIsNeeded()))
+ {
+ Assert(BufferIsDirty(buffer));
+ PageSetLSN(BufferGetPage(buffer), recptr);
+ }
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ab6938d1da1..dfe617a914f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -465,6 +465,11 @@ static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2848,8 +2853,10 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
OffsetNumber unused[MaxHeapTuplesPerPage];
int nunused = 0;
TransactionId visibility_cutoff_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
bool all_frozen;
LVSavedErrInfo saved_err_info;
+ uint8 vmflags = 0;
Assert(vacrel->do_index_vacuuming);
@@ -2860,6 +2867,29 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
InvalidOffsetNumber);
+ /*
+ * Before marking dead items unused, check whether the page will become
+ * all-visible once that change is applied. This lets us reap the tuples
+ * and mark the page all-visible within the same critical section,
+ * enabling both changes to be emitted in a single WAL record. Since the
+ * visibility checks may perform I/O and allocate memory, they must be
+ * done outside the critical section.
+ */
+ if (heap_page_would_be_all_visible(vacrel, buffer,
+ deadoffsets, num_offsets,
+ &all_frozen, &visibility_cutoff_xid))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (all_frozen)
+ {
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ }
+
+ /* Take the lock on the vmbuffer before entering a critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ }
+
START_CRIT_SECTION();
for (int i = 0; i < num_offsets; i++)
@@ -2879,6 +2909,21 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
/* Attempt to truncate line pointer array now */
PageTruncateLinePointerArray(page);
+ /*
+ * The page is guaranteed to have had dead line pointers, so
+ * PD_ALL_VISIBLE cannot be already set. Therefore, whenever we set the VM
+ * bit, we must also set PD_ALL_VISIBLE. The heap page lock is held while
+ * updating the VM to ensure consistency.
+ */
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+ {
+ PageSetAllVisible(page);
+ visibilitymap_set_vmbits(blkno,
+ vmbuffer, vmflags,
+ RelationGetRelationName(vacrel->rel));
+ conflict_xid = visibility_cutoff_xid;
+ }
+
/*
* Mark buffer dirty before we write WAL.
*/
@@ -2888,7 +2933,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
- InvalidTransactionId,
+ vmbuffer, vmflags,
+ conflict_xid,
false, /* no cleanup lock required */
PRUNE_VACUUM_CLEANUP,
NULL, 0, /* frozen */
@@ -2897,39 +2943,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
unused, nunused);
}
- /*
- * End critical section, so we safely can do visibility tests (which
- * possibly need to perform IO and allocate memory!). If we crash now the
- * page (including the corresponding vm bit) might not be marked all
- * visible, but that's fine. A later vacuum will fix that.
- */
END_CRIT_SECTION();
- /*
- * Now that we have removed the LP_DEAD items from the page, once again
- * check if the page has become all-visible. The page is already marked
- * dirty, exclusively locked, and, if needed, a full page image has been
- * emitted.
- */
- Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
- &all_frozen))
+ if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (all_frozen)
- {
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buffer,
- InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
- flags);
-
/* Count the newly set VM page for logging */
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
vacrel->vm_new_visible_pages++;
if (all_frozen)
vacrel->vm_new_visible_frozen_pages++;
@@ -3598,30 +3617,74 @@ dead_items_cleanup(LVRelState *vacrel)
}
/*
- * Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples. Set *all_frozen to true if every tuple
- * on this page is frozen.
- *
- * This is a stripped down version of lazy_scan_prune(). If you change
- * anything here, make sure that everything stays in sync. Note that an
- * assertion calls us to verify that everybody still agrees. Be sure to avoid
- * introducing new side-effects here.
+ * Wrapper for heap_page_would_be_all_visible() which can be used for
+ * callers that expect no LP_DEAD on the page.
*/
static bool
heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid,
bool *all_frozen)
{
+
+ return heap_page_would_be_all_visible(vacrel, buf,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid);
+}
+
+/*
+ * Check whether the heap page in buf is all-visible except for the dead
+ * tuples referenced in the deadoffsets array.
+ *
+ * The visibility checks may perform IO and allocate memory so they must not
+ * be done in a critical section. This function is used by vacuum to determine
+ * if the page will be all-visible once it reaps known dead tuples. That way
+ * it can do both in the same critical section and emit a single WAL record.
+ *
+ * Returns true if the page is all-visible other than the provided
+ * deadoffsets and false otherwise.
+ *
+ * Output parameters:
+ *
+ * - *all_frozen: true if every tuple on the page is frozen
+ * - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
+ *
+ * Callers looking to verify that the page is already all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This logic is closely related to heap_prune_record_unchanged_lp_normal().
+ * If you modify this function, ensure consistency with that code. An
+ * assertion cross-checks that both remain in agreement. Do not introduce new
+ * side-effects.
+ */
+static bool
+heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid)
+{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
OffsetNumber offnum,
maxoff;
bool all_visible = true;
+ int matched_dead_count = 0;
*visibility_cutoff_xid = InvalidTransactionId;
*all_frozen = true;
+ Assert(ndeadoffsets == 0 || deadoffsets);
+
+#ifdef USE_ASSERT_CHECKING
+ /* Confirm input deadoffsets[] is strictly sorted */
+ if (ndeadoffsets > 1)
+ {
+ for (int i = 1; i < ndeadoffsets; i++)
+ Assert(deadoffsets[i - 1] < deadoffsets[i]);
+ }
+#endif
+
maxoff = PageGetMaxOffsetNumber(page);
for (offnum = FirstOffsetNumber;
offnum <= maxoff && all_visible;
@@ -3649,9 +3712,15 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
*/
if (ItemIdIsDead(itemid))
{
- all_visible = false;
- *all_frozen = false;
- break;
+ if (!deadoffsets ||
+ matched_dead_count >= ndeadoffsets ||
+ deadoffsets[matched_dead_count] != offnum)
+ {
+ *all_frozen = all_visible = false;
+ break;
+ }
+ matched_dead_count++;
+ continue;
}
Assert(ItemIdIsNormal(itemid));
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..1cb44ca32d3 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -103,7 +103,7 @@ plan_elem_desc(StringInfo buf, void *plan, void *data)
* code, the latter of which is used in frontend (pg_waldump) code.
*/
void
-heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
OffsetNumber **frz_offsets,
int *nredirected, OffsetNumber **redirected,
@@ -287,6 +287,15 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, ", isCatalogRel: %c",
xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
+ if (xlrec->flags & XLHP_VM_ALL_VISIBLE)
+ {
+ uint8 vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+ if (xlrec->flags & XLHP_VM_ALL_FROZEN)
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ appendStringInfo(buf, ", vm_flags: 0x%02X", vmflags);
+ }
+
if (XLogRecHasBlockData(record, 0))
{
Size datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e60d34dad25..8cbff6ab0eb 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -382,6 +382,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer, uint8 vmflags,
TransactionId conflict_xid,
bool cleanup_lock,
PruneReason reason,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d4c0625b632..16c2b2e3c9c 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -249,7 +249,7 @@ typedef struct xl_heap_update
* Main data section:
*
* xl_heap_prune
- * uint8 flags
+ * uint16 flags
* TransactionId snapshot_conflict_horizon
*
* Block 0 data section:
@@ -284,7 +284,7 @@ typedef struct xl_heap_update
*/
typedef struct xl_heap_prune
{
- uint8 flags;
+ uint16 flags;
/*
* If XLHP_HAS_CONFLICT_HORIZON is set, the conflict horizon XID follows,
@@ -292,7 +292,7 @@ typedef struct xl_heap_prune
*/
} xl_heap_prune;
-#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
/* to handle recovery conflict during logical decoding on standby */
#define XLHP_IS_CATALOG_REL (1 << 1)
@@ -330,6 +330,15 @@ typedef struct xl_heap_prune
#define XLHP_HAS_DEAD_ITEMS (1 << 6)
#define XLHP_HAS_NOW_UNUSED_ITEMS (1 << 7)
+/*
+ * The xl_heap_prune record's flags may also contain which VM bits to set.
+ * xl_heap_prune should always use the XLHP_VM_ALL_VISIBLE and
+ * XLHP_VM_ALL_FROZEN flags and translate them to their visibilitymapdefs.h
+ * equivalents, VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN.
+ */
+#define XLHP_VM_ALL_VISIBLE (1 << 8)
+#define XLHP_VM_ALL_FROZEN (1 << 9)
+
/*
* xlhp_freeze_plan describes how to freeze a group of one or more heap tuples
* (appears in xl_heap_prune's xlhp_freeze_plans sub-record)
@@ -497,7 +506,7 @@ extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
uint8 vmflags);
/* in heapdesc.c, so it can be shared between frontend/backend code */
-extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
OffsetNumber **frz_offsets,
int *nredirected, OffsetNumber **redirected,
--
2.43.0
v17-0003-Assorted-trivial-heap_page_prune_and_freeze-clea.patchtext/x-patch; charset=US-ASCII; name=v17-0003-Assorted-trivial-heap_page_prune_and_freeze-clea.patchDownload
From 6b5fc27f0d80bab1df86a2e6fb51b64fd20c3cbb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 15 Sep 2025 12:06:19 -0400
Subject: [PATCH v17 03/15] Assorted trivial heap_page_prune_and_freeze cleanup
Group heap_page_prune_and_freeze() input parameters in a struct and
clean up their documentation.
Rename a member of PruneState and disambiguate some local
heap_page_prune_and_freeze() variables.
---
src/backend/access/heap/pruneheap.c | 112 +++++++++++++--------------
src/backend/access/heap/vacuumlazy.c | 16 ++--
src/include/access/heapam.h | 62 ++++++++++++---
src/tools/pgindent/typedefs.list | 1 +
4 files changed, 114 insertions(+), 77 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9052e1a584c..be42d3c3272 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -43,7 +43,7 @@ typedef struct
/* whether or not dead items can be set LP_UNUSED during pruning */
bool mark_unused_now;
/* whether to attempt freezing tuples */
- bool freeze;
+ bool attempt_freeze;
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -254,15 +254,23 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
{
OffsetNumber dummy_off_loc;
+ PruneFreezeParams params;
PruneFreezeResult presult;
+ params.relation = relation;
+ params.buffer = buffer;
+ params.reason = PRUNE_ON_ACCESS;
+ params.vistest = vistest;
+ params.cutoffs = NULL;
+
/*
* For now, pass mark_unused_now as false regardless of whether or
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, vistest, 0,
- NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+ params.options = 0;
+
+ heap_page_prune_and_freeze(¶ms, &presult, &dummy_off_loc, NULL, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -304,60 +312,43 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* also need to account for a reduction in the length of the line pointer
* array following array truncation by us.
*
- * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
- * required in order to advance relfrozenxid / relminmxid, or if it's
- * considered advantageous for overall system performance to do so now. The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing. When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set. They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
- *
- * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
+ * params contains the input parameters used to control freezing and pruning
+ * behavior. See the definition of PruneFreezeParams for more on what each
+ * parameter does.
*
- * options:
- * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- * pruning.
- *
- * FREEZE indicates that we will also freeze tuples, and will return
- * 'all_visible', 'all_frozen' flags to the caller.
- *
- * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
- * of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
- * cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * If the HEAP_PRUNE_FREEZE option is set in params, we will freeze tuples if
+ * it's required in order to advance relfrozenxid / relminmxid, or if it's
+ * considered advantageous for overall system performance to do so now. The
+ * 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
+ * arguments are required when freezing. When HEAP_PRUNE_FREEZE option is
+ * passed, we also set presult->all_visible and presult->all_frozen on exit,
+ * to indicate if the VM bits can be set. They are always set to false when
+ * the HEAP_PRUNE_FREEZE option is not passed, because at the moment only
+ * callers that also freeze need that information.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
* heap_page_prune_and_freeze() is responsible for initializing it. Required
* by all callers.
*
- * reason indicates why the pruning is performed. It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
* off_loc is the offset location required by the caller to use in error
* callback.
*
* new_relfrozen_xid and new_relmin_mxid must provided by the caller if the
- * HEAP_PRUNE_FREEZE option is set. On entry, they contain the oldest XID and
- * multi-XID seen on the relation so far. They will be updated with oldest
- * values present on the page after pruning. After processing the whole
- * relation, VACUUM can use these values as the new relfrozenxid/relminmxid
- * for the relation.
+ * HEAP_PRUNE_FREEZE option is set in params. On entry, they contain the
+ * oldest XID and multi-XID seen on the relation so far. They will be updated
+ * with oldest values present on the page after pruning. After processing the
+ * whole relation, VACUUM can use these values as the new
+ * relfrozenxid/relminmxid for the relation.
*/
void
-heap_page_prune_and_freeze(Relation relation, Buffer buffer,
- GlobalVisState *vistest,
- int options,
- struct VacuumCutoffs *cutoffs,
+heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
- PruneReason reason,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid)
{
+ Buffer buffer = params->buffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
OffsetNumber offnum,
@@ -366,15 +357,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
HeapTupleData tup;
bool do_freeze;
bool do_prune;
- bool do_hint;
- bool hint_bit_fpi;
+ bool do_hint_prune;
+ bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
/* Copy parameters to prstate */
- prstate.vistest = vistest;
- prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
- prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
- prstate.cutoffs = cutoffs;
+ prstate.vistest = params->vistest;
+ prstate.mark_unused_now =
+ (params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
+ prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.cutoffs = params->cutoffs;
/*
* Our strategy is to scan the page and make lists of items to change,
@@ -395,7 +387,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* initialize page freezing working state */
prstate.pagefrz.freeze_required = false;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
Assert(new_relfrozen_xid && new_relmin_mxid);
prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
@@ -442,7 +434,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* function, when we return the value to the caller, so that the caller
* doesn't set the VM bit incorrectly.
*/
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
@@ -468,7 +460,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.visibility_cutoff_xid = InvalidTransactionId;
maxoff = PageGetMaxOffsetNumber(page);
- tup.t_tableOid = RelationGetRelid(relation);
+ tup.t_tableOid = RelationGetRelid(params->relation);
/*
* Determine HTSV for all tuples, and queue them up for processing as HOT
@@ -556,7 +548,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
* an FPI to be emitted.
*/
- hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+ did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
/*
* Process HOT chains.
@@ -664,7 +656,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* pd_prune_xid field or the page was marked full, we will update the hint
* bit.
*/
- do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
PageIsFull(page);
/*
@@ -672,7 +664,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* plans we prepared, or not.
*/
do_freeze = false;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
if (prstate.pagefrz.freeze_required)
{
@@ -703,16 +695,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* Freezing would make the page all-frozen. Have already
* emitted an FPI or will do so anyway?
*/
- if (RelationNeedsWAL(relation))
+ if (RelationNeedsWAL(params->relation))
{
- if (hint_bit_fpi)
+ if (did_tuple_hint_fpi)
do_freeze = true;
else if (do_prune)
{
if (XLogCheckBufferNeedsBackup(buffer))
do_freeze = true;
}
- else if (do_hint)
+ else if (do_hint_prune)
{
if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
do_freeze = true;
@@ -754,7 +746,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
- if (do_hint)
+ if (do_hint_prune)
{
/*
* Update the page's pd_prune_xid field to either zero, or the lowest
@@ -797,7 +789,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
*/
- if (RelationNeedsWAL(relation))
+ if (RelationNeedsWAL(params->relation))
{
/*
* The snapshotConflictHorizon for the whole record should be the
@@ -835,11 +827,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
else
conflict_xid = prstate.latest_xid_removed;
- log_heap_prune_and_freeze(relation, buffer,
+ log_heap_prune_and_freeze(params->relation, buffer,
InvalidBuffer, /* vmbuffer */
0, /* vmflags */
conflict_xid,
- true, reason,
+ true, params->reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
@@ -897,7 +889,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
if (presult->nfrozen > 0)
{
@@ -1479,7 +1471,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
}
/* Consider freezing any normal tuples which will not be removed */
- if (prstate->freeze)
+ if (prstate->attempt_freeze)
{
bool totally_frozen;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index dfe617a914f..b25050d6773 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1956,10 +1956,16 @@ lazy_scan_prune(LVRelState *vacrel,
{
Relation rel = vacrel->rel;
PruneFreezeResult presult;
- int prune_options = 0;
+ PruneFreezeParams params;
Assert(BufferGetBlockNumber(buf) == blkno);
+ params.relation = rel;
+ params.buffer = buf;
+ params.reason = PRUNE_VACUUM_SCAN;
+ params.cutoffs = &vacrel->cutoffs;
+ params.vistest = vacrel->vistest;
+
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
*
@@ -1975,12 +1981,12 @@ lazy_scan_prune(LVRelState *vacrel,
* tuples. Pruning will have determined whether or not the page is
* all-visible.
*/
- prune_options = HEAP_PAGE_PRUNE_FREEZE;
+ params.options = HEAP_PAGE_PRUNE_FREEZE;
if (vacrel->nindexes == 0)
- prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
+ params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
- &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+ heap_page_prune_and_freeze(&params,
+ &presult,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8cbff6ab0eb..74a5c24002b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -221,6 +221,55 @@ typedef struct HeapPageFreeze
} HeapPageFreeze;
+
+/* 'reason' codes for heap_page_prune_and_freeze() */
+typedef enum
+{
+ PRUNE_ON_ACCESS, /* on-access pruning */
+ PRUNE_VACUUM_SCAN, /* VACUUM 1st heap pass */
+ PRUNE_VACUUM_CLEANUP, /* VACUUM 2nd heap pass */
+} PruneReason;
+
+/*
+ * Input parameters to heap_page_prune_and_freeze()
+ */
+typedef struct PruneFreezeParams
+{
+ Relation relation; /* relation containing buffer to be pruned */
+ Buffer buffer; /* buffer to be pruned */
+
+ /*
+ * The reason pruning was performed. It is used to set the WAL record
+ * opcode which is used for debugging and analysis purposes.
+ */
+ PruneReason reason;
+
+ /*
+ * Contains flag bits:
+ *
+ * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+ * pruning.
+ *
+ * FREEZE indicates that we will also freeze tuples, and will return
+ * 'all_visible', 'all_frozen' flags to the caller.
+ */
+ int options;
+
+ /*
+ * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
+ * (see heap_prune_satisfies_vacuum).
+ */
+ GlobalVisState *vistest;
+
+ /*
+ * cutoffs contains the freeze cutoffs, established by VACUUM at the
+ * beginning of vacuuming the relation. Required if HEAP_PRUNE_FREEZE
+ * option is set. cutoffs->OldestXmin is also used to determine if dead
+ * tuples are HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ */
+ struct VacuumCutoffs *cutoffs;
+} PruneFreezeParams;
+
/*
* Per-page state returned by heap_page_prune_and_freeze()
*/
@@ -264,13 +313,6 @@ typedef struct PruneFreezeResult
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
} PruneFreezeResult;
-/* 'reason' codes for heap_page_prune_and_freeze() */
-typedef enum
-{
- PRUNE_ON_ACCESS, /* on-access pruning */
- PRUNE_VACUUM_SCAN, /* VACUUM 1st heap pass */
- PRUNE_VACUUM_CLEANUP, /* VACUUM 2nd heap pass */
-} PruneReason;
/* ----------------
* function prototypes for heap access method
@@ -367,12 +409,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
-extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
- GlobalVisState *vistest,
- int options,
- struct VacuumCutoffs *cutoffs,
+extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
- PruneReason reason,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 02b5b041c45..20f45232175 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2342,6 +2342,7 @@ ProjectionPath
PromptInterruptContext
ProtocolVersion
PrsStorage
+PruneFreezeParams
PruneFreezeResult
PruneReason
PruneState
--
2.43.0
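For reviewers skimming the parameter-struct conversion above, this is
roughly what a call site looks like once the patch is applied; it mirrors
the lazy_scan_prune() hunk and uses only names introduced by the patch
(the local variable names are illustrative):

PruneFreezeParams params;
PruneFreezeResult presult;

params.relation = rel;
params.buffer = buf;
params.reason = PRUNE_VACUUM_SCAN;
params.options = HEAP_PAGE_PRUNE_FREEZE;  /* optionally | HEAP_PAGE_PRUNE_MARK_UNUSED_NOW */
params.vistest = vistest;
params.cutoffs = &cutoffs;

heap_page_prune_and_freeze(&params, &presult, &off_loc,
                           &new_relfrozen_xid, &new_relmin_mxid);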
v17-0008-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patchtext/x-patch; charset=US-ASCII; name=v17-0008-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patchDownload
From f3dc6eda58a61482f36786dda6e2aaa22c0e0f0f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v17 08/15] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum
As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
---
src/backend/access/heap/vacuumlazy.c | 36 +++++++++++++++++++++++-----
1 file changed, 30 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2f719108ad2..941b989ec50 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1878,9 +1878,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ /* Lock vmbuffer before entering critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ /* Mark buffer dirty before writing any WAL records */
MarkBufferDirty(buf);
/*
@@ -1897,13 +1900,34 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set_vmbits(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ RelationGetRelationName(vacrel->rel));
+
+ /*
+ * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+ * setting the VM.
+ */
+ if (RelationNeedsWAL(vacrel->rel))
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ InvalidTransactionId, /* conflict xid */
+ false, /* cleanup lock */
+ true, /* set_pd_all_vis */
+ PRUNE_VACUUM_SCAN, /* reason */
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+
END_CRIT_SECTION();
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
/* Count the newly all-frozen pages for logging */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
--
2.43.0
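Condensed, the ordering this patch establishes for an empty page is
roughly the following (a simplified sketch of the lazy_scan_new_or_empty()
hunk above; the pre-existing log_newpage_buffer() call and its guard are
elided):

LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);  /* VM page locked before the critical section */
START_CRIT_SECTION();
MarkBufferDirty(buf);
PageSetAllVisible(page);
visibilitymap_set_vmbits(blkno, vmbuffer,
                         VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN,
                         RelationGetRelationName(vacrel->rel));
if (RelationNeedsWAL(vacrel->rel))
    log_heap_prune_and_freeze(vacrel->rel, buf, vmbuffer,
                              VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN,
                              InvalidTransactionId,  /* conflict xid */
                              false,                 /* cleanup lock */
                              true,                  /* set_pd_all_vis */
                              PRUNE_VACUUM_SCAN,
                              NULL, 0, NULL, 0, NULL, 0, NULL, 0);
END_CRIT_SECTION();
LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);

The single XLOG_HEAP2_PRUNE_VACUUM_SCAN record emitted here takes the place
of the xl_heap_visible record that visibilitymap_set() used to write for
this case.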
v17-0006-Make-heap_page_is_all_visible-independent-of-LVR.patchtext/x-patch; charset=US-ASCII; name=v17-0006-Make-heap_page_is_all_visible-independent-of-LVR.patchDownload
From 86193a71d2ff9649b5b1c1e6963bd610285ad369 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 3 Oct 2025 15:57:02 -0400
Subject: [PATCH v17 06/15] Make heap_page_is_all_visible independent of
LVRelState
Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need a few parameters from
the LVRelState, so just pass those in explicitly.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 59 ++++++++++++++++++----------
1 file changed, 38 insertions(+), 21 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 56a0286662b..c2618c6449c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,13 +463,19 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
- TransactionId *visibility_cutoff_xid, bool *all_frozen);
-static bool heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
+static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
- TransactionId *visibility_cutoff_xid);
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2019,8 +2025,9 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(presult.lpdead_items == 0);
- if (!heap_page_is_all_visible(vacrel, buf,
- &debug_cutoff, &debug_all_frozen))
+ if (!heap_page_is_all_visible(vacrel->rel, buf,
+ vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+ &debug_cutoff, &vacrel->offnum))
Assert(false);
Assert(presult.all_frozen == debug_all_frozen);
@@ -2880,9 +2887,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* visibility checks may perform I/O and allocate memory, they must be
* done outside the critical section.
*/
- if (heap_page_would_be_all_visible(vacrel, buffer,
+ if (heap_page_would_be_all_visible(vacrel->rel, buffer,
+ vacrel->cutoffs.OldestXmin,
deadoffsets, num_offsets,
- &all_frozen, &visibility_cutoff_xid))
+ &all_frozen, &visibility_cutoff_xid,
+ &vacrel->offnum))
{
vmflags |= VISIBILITYMAP_ALL_VISIBLE;
if (all_frozen)
@@ -3626,15 +3635,19 @@ dead_items_cleanup(LVRelState *vacrel)
* callers that expect no LP_DEAD on the page.
*/
static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
TransactionId *visibility_cutoff_xid,
- bool *all_frozen)
+ OffsetNumber *logging_offnum)
{
- return heap_page_would_be_all_visible(vacrel, buf,
+ return heap_page_would_be_all_visible(rel, buf,
+ OldestXmin,
NULL, 0,
all_frozen,
- visibility_cutoff_xid);
+ visibility_cutoff_xid,
+ logging_offnum);
}
/*
@@ -3649,10 +3662,14 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
+ * OldestXmin is used to determine visibility.
+ *
* Output parameters:
*
* - *all_frozen: true if every tuple on the page is frozen
* - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
+ * - *logging_offnum: OffsetNumber of current tuple being processed;
+ * used by vacuum's error callback system.
*
* Callers looking to verify that the page is already all-visible can call
* heap_page_is_all_visible().
@@ -3663,11 +3680,13 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* side-effects.
*/
static bool
-heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
- TransactionId *visibility_cutoff_xid)
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3702,7 +3721,7 @@ heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
* Set the offset number so that we can display it along with any
* error that occurred while processing this tuple.
*/
- vacrel->offnum = offnum;
+ *logging_offnum = offnum;
itemid = PageGetItemId(page, offnum);
/* Unused or redirect line pointers are of no interest */
@@ -3732,10 +3751,9 @@ heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
tuple.t_len = ItemIdGetLength(itemid);
- tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+ tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
- buf))
+ switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
{
case HEAPTUPLE_LIVE:
{
@@ -3754,8 +3772,7 @@ heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
+ if (!TransactionIdPrecedes(xmin, OldestXmin))
{
all_visible = false;
*all_frozen = false;
@@ -3790,7 +3807,7 @@ heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
} /* scan along page */
/* Clear the offset information once we have processed the given page. */
- vacrel->offnum = InvalidOffsetNumber;
+ *logging_offnum = InvalidOffsetNumber;
return all_visible;
}
--
2.43.0
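With the LVRelState dependency gone, a caller outside vacuumlazy.c (as
anticipated by the commit message) would invoke the check roughly like
this; only the parameter names from the new prototype are taken from the
patch, the surrounding variables are illustrative:

bool all_frozen;
TransactionId visibility_cutoff_xid;
OffsetNumber logging_offnum = InvalidOffsetNumber;

if (heap_page_is_all_visible(rel, buf, OldestXmin,
                             &all_frozen, &visibility_cutoff_xid,
                             &logging_offnum))
{
    /* page qualifies for VISIBILITYMAP_ALL_VISIBLE (plus ALL_FROZEN if all_frozen) */
}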
v17-0004-Add-helper-for-freeze-determination-to-heap_page.patchtext/x-patch; charset=US-ASCII; name=v17-0004-Add-helper-for-freeze-determination-to-heap_page.patchDownload
From c69a5219a9b792f3c9f6dc730b8810a88d088ae6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 16 Sep 2025 14:22:10 -0400
Subject: [PATCH v17 04/15] Add helper for freeze determination to
heap_page_prune_and_freeze
After scanning through the line pointers on the heap page during
vacuum's first phase, we use several pieces of status information we
collected to determine whether or not to use the freeze plans we
assembled.
Do this in a helper for better readability.
---
src/backend/access/heap/pruneheap.c | 196 +++++++++++++++++-----------
1 file changed, 117 insertions(+), 79 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index be42d3c3272..44214a57ecd 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -302,6 +302,118 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
}
}
+/*
+ * Decide if we want to go ahead with freezing according to the freeze plans
+ * we prepared for the given heap buffer or not. If the caller specified we
+ * should not freeze tuples, it exits early. Otherwise, it does a few
+ * pre-freeze checks.
+ *
+ * do_prune, do_hint_prune, and did_tuple_hint_fpi must all have been decided
+ * before calling this function.
+ *
+ * prstate is an input/output parameter.
+ *
+ * Returns true if we should use our freeze plans and freeze tuples on the page
+ * and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+ bool did_tuple_hint_fpi,
+ bool do_prune,
+ bool do_hint_prune,
+ PruneState *prstate)
+{
+ bool do_freeze = false;
+
+ /*
+ * If the caller specified we should not attempt to freeze any tuples,
+ * validate that everything is in the right state and exit.
+ */
+ if (!prstate->attempt_freeze)
+ {
+ Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+ Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+ return false;
+ }
+
+ if (prstate->pagefrz.freeze_required)
+ {
+ /*
+ * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+ * before FreezeLimit/MultiXactCutoff is present. Must freeze to
+ * advance relfrozenxid/relminmxid.
+ */
+ do_freeze = true;
+ }
+ else
+ {
+ /*
+ * Opportunistically freeze the page if we are generating an FPI
+ * anyway and if doing so means that we can set the page all-frozen
+ * afterwards (might not happen until VACUUM's final heap pass).
+ *
+ * XXX: Previously, we knew if pruning emitted an FPI by checking
+ * pgWalUsage.wal_fpi before and after pruning. Once the freeze and
+ * prune records were combined, this heuristic couldn't be used
+ * anymore. The opportunistic freeze heuristic must be improved;
+ * however, for now, try to approximate the old logic.
+ */
+ if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+ {
+ /*
+ * Freezing would make the page all-frozen. Have already emitted
+ * an FPI or will do so anyway?
+ */
+ if (RelationNeedsWAL(relation))
+ {
+ if (did_tuple_hint_fpi)
+ do_freeze = true;
+ else if (do_prune)
+ {
+ if (XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ else if (do_hint_prune)
+ {
+ if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ }
+ }
+ }
+
+ if (do_freeze)
+ {
+ /*
+ * Validate the tuples we will be freezing before entering the
+ * critical section.
+ */
+ heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+ }
+ else if (prstate->nfrozen > 0)
+ {
+ /*
+ * The page contained some tuples that were not already frozen, and we
+ * chose not to freeze them now. The page won't be all-frozen then.
+ */
+ Assert(!prstate->pagefrz.freeze_required);
+
+ prstate->all_frozen = false;
+ prstate->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+ else
+ {
+ /*
+ * We have no freeze plans to execute. The page might already be
+ * all-frozen (perhaps only following pruning), though. Such pages
+ * can be marked all-frozen in the VM by our caller, even though none
+ * of its tuples were newly frozen here.
+ */
+ }
+
+ return do_freeze;
+}
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
@@ -663,85 +775,11 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* Decide if we want to go ahead with freezing according to the freeze
* plans we prepared, or not.
*/
- do_freeze = false;
- if (prstate.attempt_freeze)
- {
- if (prstate.pagefrz.freeze_required)
- {
- /*
- * heap_prepare_freeze_tuple indicated that at least one XID/MXID
- * from before FreezeLimit/MultiXactCutoff is present. Must
- * freeze to advance relfrozenxid/relminmxid.
- */
- do_freeze = true;
- }
- else
- {
- /*
- * Opportunistically freeze the page if we are generating an FPI
- * anyway and if doing so means that we can set the page
- * all-frozen afterwards (might not happen until VACUUM's final
- * heap pass).
- *
- * XXX: Previously, we knew if pruning emitted an FPI by checking
- * pgWalUsage.wal_fpi before and after pruning. Once the freeze
- * and prune records were combined, this heuristic couldn't be
- * used anymore. The opportunistic freeze heuristic must be
- * improved; however, for now, try to approximate the old logic.
- */
- if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
- {
- /*
- * Freezing would make the page all-frozen. Have already
- * emitted an FPI or will do so anyway?
- */
- if (RelationNeedsWAL(params->relation))
- {
- if (did_tuple_hint_fpi)
- do_freeze = true;
- else if (do_prune)
- {
- if (XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- else if (do_hint_prune)
- {
- if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- }
- }
- }
- }
-
- if (do_freeze)
- {
- /*
- * Validate the tuples we will be freezing before entering the
- * critical section.
- */
- heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
- }
- else if (prstate.nfrozen > 0)
- {
- /*
- * The page contained some tuples that were not already frozen, and we
- * chose not to freeze them now. The page won't be all-frozen then.
- */
- Assert(!prstate.pagefrz.freeze_required);
-
- prstate.all_frozen = false;
- prstate.nfrozen = 0; /* avoid miscounts in instrumentation */
- }
- else
- {
- /*
- * We have no freeze plans to execute. The page might already be
- * all-frozen (perhaps only following pruning), though. Such pages
- * can be marked all-frozen in the VM by our caller, even though none
- * of its tuples were newly frozen here.
- */
- }
+ do_freeze = heap_page_will_freeze(params->relation, buffer,
+ did_tuple_hint_fpi,
+ do_prune,
+ do_hint_prune,
+ &prstate);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
--
2.43.0
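Ignoring the early exit for callers that did not request freezing, the
decision the new helper encodes can be compressed into a single
expression, which may be easier to eyeball than the nested branches
(condensed from heap_page_will_freeze() above; not the literal code):

do_freeze = prstate->pagefrz.freeze_required ||
    (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0 &&
     RelationNeedsWAL(relation) &&
     (did_tuple_hint_fpi ||
      (do_prune && XLogCheckBufferNeedsBackup(buffer)) ||
      (do_hint_prune && XLogHintBitIsNeeded() &&
       XLogCheckBufferNeedsBackup(buffer))));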
v17-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-prune-f.patchtext/x-patch; charset=UTF-8; name=v17-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-prune-f.patchDownload
From dde0dfc578137f7c93f9a0e34af38dcdb841b080 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 8 Oct 2025 15:39:01 -0400
Subject: [PATCH v17 07/15] Eliminate XLOG_HEAP2_VISIBLE from vacuum
prune/freeze
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.
Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.
This change applies only to vacuum’s prune/freeze work, not to pruning
performed during normal page access.
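At a high level, the resulting order of operations inside
heap_page_prune_and_freeze() is sketched below (simplified from the hunks
that follow; the shortcut taken when the VM bits turn out to be already
set is omitted):

do_set_vm = heap_page_will_set_vis(params->relation, blockno, buffer,
                                   vmbuffer, params->blk_known_av,
                                   &prstate, &new_vmbits, &do_set_pd_vis);
conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
                                prstate.latest_xid_removed, frz_conflict_horizon,
                                prstate.visibility_cutoff_xid, params->blk_known_av,
                                (do_set_vm && (new_vmbits & VISIBILITYMAP_ALL_FROZEN)));

if (do_set_vm)
    LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);  /* lock VM page before the critical section */

START_CRIT_SECTION();
/* apply pruning and freezing, set PD_ALL_VISIBLE if needed, mark the heap buffer dirty */
if (do_set_vm)
    old_vmbits = visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
                                          RelationGetRelationName(params->relation));
if (RelationNeedsWAL(params->relation) && (do_prune || do_freeze || do_set_vm))
    log_heap_prune_and_freeze(params->relation, buffer,
                              do_set_vm ? vmbuffer : InvalidBuffer,
                              do_set_vm ? new_vmbits : 0,
                              conflict_xid,
                              true,             /* cleanup lock */
                              do_set_pd_vis,
                              params->reason,
                              prstate.frozen, prstate.nfrozen,
                              prstate.redirected, prstate.nredirected,
                              prstate.nowdead, prstate.ndead,
                              prstate.nowunused, prstate.nunused);
END_CRIT_SECTION();

if (do_set_vm)
    LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);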
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
---
src/backend/access/heap/heapam_xlog.c | 41 ++-
src/backend/access/heap/pruneheap.c | 429 ++++++++++++++++++++------
src/backend/access/heap/vacuumlazy.c | 205 +-----------
src/include/access/heapam.h | 41 ++-
4 files changed, 414 insertions(+), 302 deletions(-)
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index aaf595e75d6..f6624bc98d0 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -104,6 +104,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
OffsetNumber *frz_offsets;
char *dataptr = XLogRecGetBlockData(record, 0, &datalen);
bool do_prune;
+ bool set_lsn = false;
+ bool mark_buffer_dirty = false;
heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
&nplans, &plans, &frz_offsets,
@@ -157,20 +159,37 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* There should be no more data */
Assert((char *) frz_offsets == dataptr + datalen);
- if ((vmflags & VISIBILITYMAP_VALID_BITS))
- PageSetAllVisible(page);
-
- MarkBufferDirty(buffer);
+ if (do_prune || nplans > 0)
+ mark_buffer_dirty = set_lsn = true;
/*
- * Always emit a WAL record when setting PD_ALL_VISIBLE but only emit
- * an FPI if checksums/wal_log_hints are enabled. Advance the page LSN
- * only if the record could include an FPI, since recovery skips
- * records <= the stamped LSN. Otherwise it might skip an earlier FPI
- * needed to repair a torn page.
+ * The critical integrity requirement here is that we must never end
+ * up with the visibility map bit set and the page-level
+ * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+ * modification would fail to clear the visibility map bit.
+ *
+ * If this record only sets the VM, no need to dirty the heap page.
*/
- if (do_prune || nplans > 0 ||
- ((vmflags & VISIBILITYMAP_VALID_BITS) && XLogHintBitIsNeeded()))
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+ {
+ PageSetAllVisible(page);
+ mark_buffer_dirty = true;
+
+ /*
+ * Always emit a WAL record when setting PD_ALL_VISIBLE but only
+ * emit an FPI if checksums/wal_log_hints are enabled. Advance the
+ * page LSN only if the record could include an FPI, since
+ * recovery skips records <= the stamped LSN. Otherwise it might
+ * skip an earlier FPI needed to repair a torn page.
+ */
+ if (XLogHintBitIsNeeded())
+ set_lsn = true;
+ }
+
+ if (mark_buffer_dirty)
+ MarkBufferDirty(buffer);
+
+ if (set_lsn)
PageSetLSN(page, lsn);
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5892ed5a07e..f70563008e1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
#include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool attempt_freeze;
+ /* whether or not to attempt updating the VM */
+ bool attempt_update_vm;
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -133,17 +135,17 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon when setting the VM or when
+ * freezing all the tuples on the page. It is only valid when all the live
+ * tuples on the page are all-visible.
*
* NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
* That's convenient for heap_page_prune_and_freeze(), to use them to
- * decide whether to freeze the page or not. The all_visible and
- * all_frozen values returned to the caller are adjusted to include
- * LP_DEAD items after we determine whether or not to opportunistically
- * freeze.
+ * decide whether to opportunistically freeze the page or not. The
+ * all_visible and all_frozen values ultimately used to set the VM are
+ * adjusted to include LP_DEAD items after we determine whether or not to
+ * opportunistically freeze.
*/
bool all_visible;
bool all_frozen;
@@ -174,6 +176,19 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
static void page_verify_redirects(Page page);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid, bool blk_already_av,
+ bool set_blk_all_frozen);
+
+static bool heap_page_will_set_vis(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buf,
+ Buffer vmbuffer,
+ bool blk_known_av,
+ const PruneState *prstate,
+ uint8 *vmflags,
+ bool *do_set_pd_vis);
/*
* Optionally prune and repair fragmentation in the specified page.
@@ -259,6 +274,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
params.reason = PRUNE_ON_ACCESS;
params.vistest = vistest;
params.cutoffs = NULL;
+ params.vmbuffer = InvalidBuffer;
+ params.blk_known_av = false;
/*
* For now, pass mark_unused_now as false regardless of whether or
@@ -431,10 +448,108 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Decide whether to set the visibility map bits for heap_blk, using
+ * information from PruneState and blk_known_av. Some callers may already
+ * have examined this page’s VM bits (e.g., VACUUM in the previous
+ * heap_vac_scan_next_block() call) and can pass that along.
+ *
+ * Returns true if one or both VM bits should be set, along with the desired
+ * flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
+ * should be set on the heap page.
+ */
+static bool
+heap_page_will_set_vis(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buf,
+ Buffer vmbuffer,
+ bool blk_known_av,
+ const PruneState *prstate,
+ uint8 *vmflags,
+ bool *do_set_pd_vis)
+{
+ Page heap_page = BufferGetPage(heap_buf);
+ bool do_set_vm = false;
+
+ *do_set_pd_vis = false;
+
+ if (!prstate->attempt_update_vm)
+ {
+ Assert(!prstate->all_visible && !prstate->all_frozen);
+ Assert(*vmflags == 0);
+ return false;
+ }
+
+ if (prstate->all_visible && !PageIsAllVisible(heap_page))
+ *do_set_pd_vis = true;
+
+ if ((prstate->all_visible && !blk_known_av) ||
+ (prstate->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+ {
+ *vmflags = VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate->all_frozen)
+ *vmflags |= VISIBILITYMAP_ALL_FROZEN;
+
+ do_set_vm = true;
+ }
+
+ /*
+ * Now handle two potential corruption cases:
+ *
+ * These do not need to happen in a critical section and are not
+ * WAL-logged.
+ *
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that in vacuum the bit
+ * got cleared after heap_vac_scan_next_block() was called, so we must
+ * recheck with buffer lock before concluding that the VM is corrupt.
+ */
+ else if (blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (prstate->lpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buf);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ return do_set_vm;
+}
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -449,12 +564,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* it's required in order to advance relfrozenxid / relminmxid, or if it's
* considered advantageous for overall system performance to do so now. The
* 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
- * arguments are required when freezing. When HEAP_PRUNE_FREEZE option is
- * passed, we also set presult->all_visible and presult->all_frozen after
- * determining whether or not to opporunistically freeze, to indicate if the
- * VM bits can be set. They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not passed, because at the moment only callers
- * that also freeze need that information.
+ * arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
@@ -479,6 +595,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
MultiXactId *new_relmin_mxid)
{
Buffer buffer = params->buffer;
+ Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
OffsetNumber offnum,
@@ -488,15 +605,22 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool do_freeze;
bool do_prune;
bool do_hint_prune;
+ bool do_set_vm;
+ bool do_set_pd_vis;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
TransactionId frz_conflict_horizon = InvalidTransactionId;
+ TransactionId conflict_xid = InvalidTransactionId;
+ uint8 new_vmbits = 0;
+ uint8 old_vmbits = 0;
/* Copy parameters to prstate */
prstate.vistest = params->vistest;
prstate.mark_unused_now =
(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.attempt_update_vm =
+ (params->options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0;
prstate.cutoffs = params->cutoffs;
/*
@@ -543,50 +667,54 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.deadoffsets = presult->deadoffsets;
/*
- * Caller may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Track whether the page could be marked all-visible and/or all-frozen.
+ * This information is used for opportunistic freezing and for updating
+ * the visibility map (VM) if requested by the caller.
+ *
+ * Currently, only VACUUM performs freezing, but other callers may in the
+ * future. Visibility bookkeeping is required not just for setting the VM
+ * bits, but also for opportunistic freezing: we only consider freezing if
+ * the page would become all-frozen, or if it would be all-frozen except
+ * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+ * we will not set the VM bit even if the page is found to be all-visible.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is passed without HEAP_PAGE_PRUNE_FREEZE,
+ * prstate.all_frozen must be initialized to false, since we will not call
+ * heap_prepare_freeze_tuple() for each tuple.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * Dead tuples that will be removed by the end of vacuum should not
+ * prevent opportunistic freezing. Therefore, we do not clear all_visible
+ * when we encounter LP_DEAD items. Instead, we correct all_visible after
+ * deciding whether to freeze, but before updating the VM, to avoid
+ * setting the VM bit incorrectly.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not immediately
- * clear all_visible when we see LP_DEAD items. We fix that after
- * scanning the line pointers, before we return the value to the caller,
- * so that the caller doesn't set the VM bit incorrectly.
+ * If neither freezing nor VM updates are requested, we skip the extra
+ * bookkeeping. In this case, initializing all_visible to false allows
+ * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
*/
if (prstate.attempt_freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
}
+ else if (prstate.attempt_update_vm)
+ {
+ prstate.all_visible = true;
+ prstate.all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate.all_visible = false;
prstate.all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is no longer maintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon
+ * when updating the VM and/or freezing all the tuples on the page.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -818,6 +946,35 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.all_visible = prstate.all_frozen = false;
Assert(!prstate.all_frozen || prstate.all_visible);
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ /*
+ * Decide whether to set the page-level PD_ALL_VISIBLE bit and the VM bits
+ * based on information from the VM and the all_visible/all_frozen flags.
+ *
+ * While it is valid for PD_ALL_VISIBLE to be set when the corresponding
+ * VM bit is clear, we strongly prefer to keep them in sync.
+ *
+ * Accordingly, we also allow updating only the VM when PD_ALL_VISIBLE has
+ * already been set. Setting only the VM is most common when setting an
+ * already all-visible page all-frozen.
+ */
+ do_set_vm = heap_page_will_set_vis(params->relation,
+ blockno, buffer, vmbuffer, params->blk_known_av,
+ &prstate, &new_vmbits, &do_set_pd_vis);
+
+ /* We should only set the VM if PD_ALL_VISIBLE is set or will be */
+ Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
+
+ conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+ prstate.latest_xid_removed, frz_conflict_horizon,
+ prstate.visibility_cutoff_xid, params->blk_known_av,
+ (do_set_vm && (new_vmbits & VISIBILITYMAP_ALL_FROZEN)));
+
+ /* Lock vmbuffer before entering a critical section */
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -838,14 +995,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
/*
* If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * hint. If we are going to freeze or prune the page or set
+ * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+ *
+ * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+ * for the VM to be set and PD_ALL_VISIBLE to be clear.
*/
- if (!do_freeze && !do_prune)
+ if (!do_freeze && !do_prune && !do_set_pd_vis)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -859,66 +1019,91 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
- MarkBufferDirty(buffer);
+ if (do_set_pd_vis)
+ PageSetAllVisible(page);
- /*
- * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
- */
- if (RelationNeedsWAL(params->relation))
+ if (do_prune || do_freeze || do_set_pd_vis)
+ MarkBufferDirty(buffer);
+
+ if (do_set_vm)
{
- /*
- * The snapshotConflictHorizon for the whole record should be the
- * most conservative of all the horizons calculated for any of the
- * possible modifications. If this record will prune tuples, any
- * transactions on the standby older than the youngest xmax of the
- * most recently removed tuple this record will prune will
- * conflict. If this record will freeze tuples, any transactions
- * on the standby with xids older than the youngest tuple this
- * record will freeze will conflict.
- */
- TransactionId conflict_xid;
+ Assert(PageIsAllVisible(page));
- if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
- conflict_xid = frz_conflict_horizon;
- else
- conflict_xid = prstate.latest_xid_removed;
+ old_vmbits = visibilitymap_set_vmbits(blockno,
+ vmbuffer, new_vmbits,
+ RelationGetRelationName(params->relation));
+ if (old_vmbits == new_vmbits)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+ /* Unset so we don't emit WAL since no change occurred */
+ do_set_vm = false;
+ }
+ }
+ /*
+ * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
+ * only updating the VM and it turns out it was already set, we will
+ * have unset do_set_vm earlier. As such, check it again before
+ * emitting the record.
+ */
+ if (RelationNeedsWAL(params->relation) &&
+ (do_prune || do_freeze || do_set_vm))
log_heap_prune_and_freeze(params->relation, buffer,
- InvalidBuffer, /* vmbuffer */
- 0, /* vmflags */
+ do_set_vm ? vmbuffer : InvalidBuffer,
+ do_set_vm ? new_vmbits : 0,
conflict_xid,
- true, params->reason,
+ true, /* cleanup lock */
+ do_set_pd_vis,
+ params->reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
prstate.nowunused, prstate.nunused);
- }
}
END_CRIT_SECTION();
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+ /*
+ * During its second pass over the heap, VACUUM calls
+ * heap_page_would_be_all_visible() to determine whether a page is
+ * all-visible and all-frozen. The logic here is similar. After completing
+ * pruning and freezing, use an assertion to verify that our results
+ * remain consistent with heap_page_would_be_all_visible().
+ */
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
+ {
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
+
+ Assert(prstate.lpdead_items == 0);
+ Assert(prstate.cutoffs);
+
+ if (!heap_page_is_all_visible(params->relation, buffer,
+ prstate.cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc))
+ Assert(false);
+
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == prstate.visibility_cutoff_xid);
+ }
+#endif
+
/* Copy information back for caller */
presult->ndeleted = prstate.ndeleted;
presult->nnewlpdead = prstate.ndead;
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
-
- /*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ presult->new_vmbits = new_vmbits;
+ presult->old_vmbits = old_vmbits;
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
@@ -2060,6 +2245,64 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
return nplans;
}
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid, bool blk_already_av,
+ bool set_blk_all_frozen)
+{
+
+ /*
+ * The snapshotConflictHorizon for the whole record should be the most
+ * conservative of all the horizons calculated for any of the possible
+ * modifications. If this record will prune tuples, any transactions on
+ * the standby older than the youngest xmax of the most recently removed
+ * tuple this record will prune will conflict. If this record will freeze
+ * tuples, any transactions on the standby with xids older than the
+ * youngest tuple this record will freeze will conflict.
+ */
+ TransactionId conflict_xid = InvalidTransactionId;
+
+ /*
+ * If we are updating the VM, the conflict horizon is almost always the
+ * visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization, we can
+ * use the visibility_cutoff_xid as the conflict horizon if the page will
+ * be all-frozen. This is true even if there are LP_DEAD line pointers
+ * because we ignored those when maintaining the visibility_cutoff_xid.
+ * This will have been calculated earlier as the frz_conflict_horizon when
+ * we determined we would freeze.
+ */
+ if (do_set_vm)
+ conflict_xid = visibility_cutoff_xid;
+ else if (do_freeze)
+ conflict_xid = frz_conflict_horizon;
+
+ /*
+ * If we are removing tuples with a younger xmax than our so far
+ * calculated conflict_xid, we must use this as our horizon.
+ */
+ if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+ conflict_xid = latest_xid_removed;
+
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning or
+ * freezing any tuples and are setting an already all-visible page
+ * all-frozen in the VM. In this case, all of the tuples on the page must
+ * already be visible to all MVCC snapshots on the standby.
+ */
+ if (!do_prune && !do_freeze &&
+ do_set_vm && blk_already_av && set_blk_all_frozen)
+ conflict_xid = InvalidTransactionId;
+
+ return conflict_xid;
+}
+
/*
* Write an XLOG_HEAP2_PRUNE* WAL record
*
@@ -2084,6 +2327,10 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* case, vmbuffer should already have been updated and marked dirty and should
* still be pinned and locked.
*
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
* Note: This function scribbles on the 'frozen' array.
*
* Note: This is called in a critical section, so careful what you do here.
@@ -2093,6 +2340,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
Buffer vmbuffer, uint8 vmflags,
TransactionId conflict_xid,
bool cleanup_lock,
+ bool set_pd_all_vis,
PruneReason reason,
HeapTupleFreeze *frozen, int nfrozen,
OffsetNumber *redirected, int nredirected,
@@ -2127,7 +2375,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (!do_prune &&
nfrozen == 0 &&
- (!do_set_vm || !XLogHintBitIsNeeded()))
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
regbuf_flags |= REGBUF_NO_IMAGE;
/*
@@ -2248,7 +2496,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* update PD_ALL_VISIBLE without bumping the LSN, but this is deemed okay
* for page hint updates.
*/
- if (do_prune || nfrozen > 0 || (do_set_vm && XLogHintBitIsNeeded()))
+ if (do_prune || nfrozen > 0 ||
+ (set_pd_all_vis && XLogHintBitIsNeeded()))
{
Assert(BufferIsDirty(buffer));
PageSetLSN(BufferGetPage(buffer), recptr);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c2618c6449c..2f719108ad2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,11 +464,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
TransactionId OldestXmin,
OffsetNumber *deadoffsets,
@@ -1971,6 +1966,8 @@ lazy_scan_prune(LVRelState *vacrel,
params.reason = PRUNE_VACUUM_SCAN;
params.cutoffs = &vacrel->cutoffs;
params.vistest = vacrel->vistest;
+ params.vmbuffer = vmbuffer;
+ params.blk_known_av = all_visible_according_to_vm;
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
@@ -1987,7 +1984,7 @@ lazy_scan_prune(LVRelState *vacrel,
* tuples. Pruning will have determined whether or not the page is
* all-visible.
*/
- params.options = HEAP_PAGE_PRUNE_FREEZE;
+ params.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VIS;
if (vacrel->nindexes == 0)
params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
@@ -2010,33 +2007,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- if (!heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum))
- Assert(false);
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -2070,168 +2040,26 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
- Assert(!presult.all_frozen || presult.all_visible);
-
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
+ * For the purposes of logging, count whether or not the page was newly
+ * set all-visible and, potentially, all-frozen.
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ vacrel->vm_new_visible_pages++;
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- vacrel->vm_new_frozen_pages++;
+ vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
-
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen.
- */
- else if (all_visible_according_to_vm && presult.all_frozen &&
- !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
}
return presult.ndeleted;
@@ -2950,6 +2778,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
vmbuffer, vmflags,
conflict_xid,
false, /* no cleanup lock required */
+ (vmflags & VISIBILITYMAP_VALID_BITS) != 0,
PRUNE_VACUUM_CLEANUP,
NULL, 0, /* frozen */
NULL, 0, /* redirected */
@@ -3634,7 +3463,7 @@ dead_items_cleanup(LVRelState *vacrel)
* Wrapper for heap_page_would_be_all_visible() which can be used for
* callers that expect no LP_DEAD on the page.
*/
-static bool
+bool
heap_page_is_all_visible(Relation rel, Buffer buf,
TransactionId OldestXmin,
bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 74a5c24002b..2de39ba0cd1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VIS (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,16 @@ typedef struct PruneFreezeParams
Relation relation; /* relation containing buffer to be pruned */
Buffer buffer; /* buffer to be pruned */
+ /*
+ *
+ * vmbuffer is the buffer that must already contain the required
+ * block of the visibility map if we are to update it. blk_known_av is the
+ * visibility status of the heap block as of the last call to
+ * find_next_unskippable_block().
+ */
+ Buffer vmbuffer;
+ bool blk_known_av;
+
/*
* The reason pruning was performed. It is used to set the WAL record
* opcode which is used for debugging and analysis purposes.
@@ -250,8 +261,9 @@ typedef struct PruneFreezeParams
* MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
* pruning.
*
- * FREEZE indicates that we will also freeze tuples, and will return
- * 'all_visible', 'all_frozen' flags to the caller.
+ * FREEZE indicates that we will also freeze tuples
+ *
+ * UPDATE_VIS indicates that we will set the page's status in the VM.
*/
int options;
@@ -284,19 +296,15 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
- *
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
*
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set and
+ * we have attempted to update the VM.
*/
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
+ uint8 new_vmbits;
+ uint8 old_vmbits;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
@@ -423,6 +431,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
Buffer vmbuffer, uint8 vmflags,
TransactionId conflict_xid,
bool cleanup_lock,
+ bool set_pd_all_vis,
PruneReason reason,
HeapTupleFreeze *frozen, int nfrozen,
OffsetNumber *redirected, int nredirected,
@@ -433,6 +442,12 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
+
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
Buffer buffer);
--
2.43.0
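With old_vmbits and new_vmbits returned from pruning, the logging bookkeeping in lazy_scan_prune() reduces to a comparison of the before and after VM bit states. A minimal standalone sketch of that comparison, with hypothetical out-parameters standing in for the LVRelState counters (the flag values match visibilitymap.h):

#include <stdbool.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/*
 * Derive the "newly set" VM counters from the bits before and after the
 * prune/freeze operation. The counter arguments are hypothetical stand-ins
 * for vacrel->vm_new_visible_pages and friends.
 */
static void
count_newly_set_vm_bits(uint8_t old_vmbits, uint8_t new_vmbits,
                        int *new_visible_pages,
                        int *new_visible_frozen_pages,
                        int *new_frozen_pages,
                        bool *vm_page_frozen)
{
    bool now_visible = (new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
    bool now_frozen = (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0;

    if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 && now_visible)
    {
        /* Page is newly all-visible; it may also be newly all-frozen. */
        (*new_visible_pages)++;
        if (now_frozen)
        {
            (*new_visible_frozen_pages)++;
            *vm_page_frozen = true;
        }
    }
    else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 && now_frozen)
    {
        /* Page was already all-visible and is newly all-frozen. */
        (*new_frozen_pages)++;
        *vm_page_frozen = true;
    }
}

The two branches correspond to a page that is newly all-visible versus one that was already all-visible and is newly all-frozen.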
v17-0005-Update-PruneState.all_-visible-frozen-earlier-in.patch (text/x-patch; charset=UTF-8)
From d4a4be3eed25853fc1ea84ebc2cbe0226afd823a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 15 Sep 2025 16:25:44 -0400
Subject: [PATCH v17 05/15] Update PruneState.all_[visible|frozen] earlier in
pruning
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In the prune/freeze path, we currently delay clearing all_visible and
all_frozen when dead items are present. This allows opportunistic
freezing if the page would otherwise be fully frozen, since those dead
items are later removed in vacuum’s third phase.
However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags promptly avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). At present this has no runtime effect
because all callers that consider setting the VM also attempt freezing,
but future callers (e.g. on-access pruning) may want to set the VM
without preparing freeze plans.
We also used to defer clearing all_visible and all_frozen until after
computing the visibility cutoff XID. By determining the cutoff earlier,
we can update these flags immediately after deciding whether to
opportunistically freeze. This is necessary if we want to set the VM in
the same WAL record that prunes and freezes tuples on the page.
While we are at it, unset all_frozen whenever we unset all_visible.
Previously we could only use all_frozen in combination with all_visible
as all_frozen was not unset when tuples not visible to all were encountered.
It is best to keep them both up-to-date to avoid mistakes when using
all_frozen.
---
src/backend/access/heap/pruneheap.c | 144 ++++++++++++++-------------
src/backend/access/heap/vacuumlazy.c | 9 +-
2 files changed, 77 insertions(+), 76 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 44214a57ecd..5892ed5a07e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -138,15 +138,12 @@ typedef struct
* bits. It is only valid if we froze some tuples, and all_frozen is
* true.
*
- * NOTE: all_visible and all_frozen don't include LP_DEAD items. That's
- * convenient for heap_page_prune_and_freeze(), to use them to decide
- * whether to freeze the page or not. The all_visible and all_frozen
- * values returned to the caller are adjusted to include LP_DEAD items at
- * the end.
- *
- * all_frozen should only be considered valid if all_visible is also set;
- * we don't bother to clear the all_frozen flag every time we clear the
- * all_visible flag.
+ * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
+ * That's convenient for heap_page_prune_and_freeze(), to use them to
+ * decide whether to freeze the page or not. The all_visible and
+ * all_frozen values returned to the caller are adjusted to include
+ * LP_DEAD items after we determine whether or not to opportunistically
+ * freeze.
*/
bool all_visible;
bool all_frozen;
@@ -309,7 +306,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* pre-freeze checks.
*
* do_prune, do_hint_prune, and did_tuple_hint_fpi must all have been decided
- * before calling this function.
+ * before calling this function. *frz_conflict_horizon is set to the snapshot
+ * conflict horizon to use in the WAL record should we decide to freeze tuples.
*
* prstate is an input/output parameter.
*
@@ -321,7 +319,8 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi,
bool do_prune,
bool do_hint_prune,
- PruneState *prstate)
+ PruneState *prstate,
+ TransactionId *frz_conflict_horizon)
{
bool do_freeze = false;
@@ -358,8 +357,10 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* anymore. The opportunistic freeze heuristic must be improved;
* however, for now, try to approximate the old logic.
*/
- if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+ if (prstate->all_frozen && prstate->nfrozen > 0)
{
+ Assert(prstate->all_visible);
+
/*
* Freezing would make the page all-frozen. Have already emitted
* an FPI or will do so anyway?
@@ -389,6 +390,22 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* critical section.
*/
heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+
+ /*
+ * Calculate what the snapshot conflict horizon should be for a record
+ * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
+ * for conflicts when the whole page is eligible to become all-frozen
+ * in the VM once we're done with it. Otherwise we generate a
+ * conservative cutoff by stepping back from OldestXmin.
+ */
+ if (prstate->all_frozen)
+ *frz_conflict_horizon = prstate->visibility_cutoff_xid;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ *frz_conflict_horizon = prstate->cutoffs->OldestXmin;
+ TransactionIdRetreat(*frz_conflict_horizon);
+ }
}
else if (prstate->nfrozen > 0)
{
@@ -433,10 +450,11 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* considered advantageous for overall system performance to do so now. The
* 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
* arguments are required when freezing. When HEAP_PRUNE_FREEZE option is
- * passed, we also set presult->all_visible and presult->all_frozen on exit,
- * to indicate if the VM bits can be set. They are always set to false when
- * the HEAP_PRUNE_FREEZE option is not passed, because at the moment only
- * callers that also freeze need that information.
+ * passed, we also set presult->all_visible and presult->all_frozen after
+ * determining whether or not to opportunistically freeze, to indicate if the
+ * VM bits can be set. They are always set to false when the
+ * HEAP_PRUNE_FREEZE option is not passed, because at the moment only callers
+ * that also freeze need that information.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
@@ -472,6 +490,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool do_hint_prune;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ TransactionId frz_conflict_horizon = InvalidTransactionId;
/* Copy parameters to prstate */
prstate.vistest = params->vistest;
@@ -541,10 +560,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* are tuples present that are not visible to everyone or if there are
* dead tuples which are not yet removable. However, dead tuples which
* will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not clear
- * all_visible when we see LP_DEAD items. We fix that at the end of the
- * function, when we return the value to the caller, so that the caller
- * doesn't set the VM bit incorrectly.
+ * opportunistically freezing. Because of that, we do not immediately
+ * clear all_visible when we see LP_DEAD items. We fix that after
+ * scanning the line pointers, before we return the value to the caller,
+ * so that the caller doesn't set the VM bit incorrectly.
*/
if (prstate.attempt_freeze)
{
@@ -779,8 +798,26 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
did_tuple_hint_fpi,
do_prune,
do_hint_prune,
- &prstate);
+ &prstate,
+ &frz_conflict_horizon);
+ /*
+ * While scanning the line pointers, we did not clear
+ * all_visible/all_frozen when encountering LP_DEAD items because we
+ * wanted the decision whether or not to freeze the page to be unaffected
+ * by the short-term presence of LP_DEAD items. These LP_DEAD items are
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that we finished determining whether or not to freeze the page,
+ * update all_visible and all_frozen so that they reflect the true state
+ * of the page for setting PD_ALL_VISIBLE and VM bits.
+ */
+ if (prstate.lpdead_items > 0)
+ prstate.all_visible = prstate.all_frozen = false;
+
+ Assert(!prstate.all_frozen || prstate.all_visible);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -839,27 +876,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* on the standby with xids older than the youngest tuple this
* record will freeze will conflict.
*/
- TransactionId frz_conflict_horizon = InvalidTransactionId;
TransactionId conflict_xid;
- /*
- * We can use the visibility_cutoff_xid as our cutoff for
- * conflicts when the whole page is eligible to become all-frozen
- * in the VM once we're done with it. Otherwise we generate a
- * conservative cutoff by stepping back from OldestXmin.
- */
- if (do_freeze)
- {
- if (prstate.all_visible && prstate.all_frozen)
- frz_conflict_horizon = prstate.visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = prstate.cutoffs->OldestXmin;
- TransactionIdRetreat(frz_conflict_horizon);
- }
- }
-
if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
conflict_xid = frz_conflict_horizon;
else
@@ -885,30 +903,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected by our caller.
- */
- if (prstate.all_visible && prstate.lpdead_items == 0)
- {
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
- }
- else
- {
- presult->all_visible = false;
- presult->all_frozen = false;
- }
-
+ presult->all_visible = prstate.all_visible;
+ presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
/*
@@ -1288,8 +1284,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible until later during pruning.
- * Removable dead tuples shouldn't preclude freezing the page.
+ * Removable dead tuples shouldn't preclude freezing the page. If we won't
+ * attempt freezing, just unset all-visible now, though.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1415,7 +1414,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
if (!HeapTupleHeaderXminCommitted(htup))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1437,7 +1436,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
Assert(prstate->cutoffs);
if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1450,7 +1449,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
case HEAPTUPLE_RECENTLY_DEAD:
prstate->recently_dead_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple will soon become DEAD. Update the hint field so
@@ -1469,7 +1468,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* assumption is a bit shaky, but it is what acquire_sample_rows()
* does, so be consistent.
*/
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* If we wanted to optimize for aborts, we might consider marking
@@ -1487,7 +1486,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* will commit and update the counters after we report.
*/
prstate->live_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple may soon become DEAD. Update the hint field so that
@@ -1555,8 +1554,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible until later, at the end of
* heap_page_prune_and_freeze(). This will allow us to attempt to freeze
* the page after pruning. As long as we unset it before updating the
- * visibility map, this will be correct.
+ * visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all-visible now.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b25050d6773..56a0286662b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2012,7 +2012,6 @@ lazy_scan_prune(LVRelState *vacrel,
* agreement with heap_page_is_all_visible() using an assertion.
*/
#ifdef USE_ASSERT_CHECKING
- /* Note that all_frozen value does not matter when !all_visible */
if (presult.all_visible)
{
TransactionId debug_cutoff;
@@ -2065,6 +2064,7 @@ lazy_scan_prune(LVRelState *vacrel,
*has_lpdead_items = (presult.lpdead_items > 0);
Assert(!presult.all_visible || !(*has_lpdead_items));
+ Assert(!presult.all_frozen || presult.all_visible);
/*
* Handle setting visibility map bit based on information from the VM (as
@@ -2170,11 +2170,10 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen. Note that all_frozen is only valid if all_visible is
- * true, so we must check both all_visible and all_frozen.
+ * it as all-frozen.
*/
- else if (all_visible_according_to_vm && presult.all_visible &&
- presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ else if (all_visible_according_to_vm && presult.all_frozen &&
+ !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
uint8 old_vmbits;
--
2.43.0
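The conflict-horizon choice that this patch moves into heap_page_will_freeze() comes down to one rule: when the page will be all-frozen, the page's visibility cutoff XID is a precise horizon; otherwise, step back one XID from OldestXmin. A standalone sketch of that rule, assuming a simplified stand-in for the TransactionIdRetreat() macro and for the PostgreSQL TransactionId typedef:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;                 /* stand-in for the real typedef */
#define FirstNormalTransactionId ((TransactionId) 3)

/*
 * Simplified stand-in for the TransactionIdRetreat() macro: step back one
 * XID, skipping over the reserved special XIDs at wraparound.
 */
static void
xid_retreat(TransactionId *xid)
{
    do
    {
        (*xid)--;
    } while (*xid < FirstNormalTransactionId);
}

/*
 * When the page will be all-frozen, the newest xmin of its live tuples
 * (visibility_cutoff_xid) is a precise conflict horizon. Otherwise, fall
 * back to one XID before OldestXmin, which avoids false conflicts when
 * hot_standby_feedback is in use.
 */
static TransactionId
choose_frz_conflict_horizon(bool page_all_frozen,
                            TransactionId visibility_cutoff_xid,
                            TransactionId oldest_xmin)
{
    TransactionId horizon;

    if (page_all_frozen)
        horizon = visibility_cutoff_xid;
    else
    {
        horizon = oldest_xmin;
        xid_retreat(&horizon);
    }
    return horizon;
}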
v17-0010-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (text/x-patch; charset=UTF-8)
From 078c1a636f208dee878fa4d78b6e05006513008a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v17 10/15] Rename GlobalVisTestIsRemovableXid() to
GlobalVisXidVisibleToAll()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The function is currently only used to check whether a tuple’s xmax is
visible to all transactions (and thus removable). Upcoming changes will
also use it to test whether a tuple’s xmin is visible to all to
decide if a page can be marked all-visible in the visibility map.
The new name, GlobalVisXidVisibleToAll(), better reflects this broader
purpose.
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
---
src/backend/access/heap/heapam_visibility.c | 6 +++---
src/backend/access/heap/pruneheap.c | 16 ++++++++--------
src/backend/access/spgist/spgvacuum.c | 2 +-
src/backend/storage/ipc/procarray.c | 17 ++++++++---------
src/include/utils/snapmgr.h | 4 ++--
5 files changed, 22 insertions(+), 23 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
{
Assert(TransactionIdIsValid(dead_after));
- if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
res = HEAPTUPLE_DEAD;
}
else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
return false;
/* Deleter committed, so tuple is dead if the XID is old enough. */
- return GlobalVisTestIsRemovableXid(vistest,
- HeapTupleHeaderGetRawXmax(tuple));
+ return GlobalVisXidVisibleToAll(vistest,
+ HeapTupleHeaderGetRawXmax(tuple));
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 21b24f3992e..f1e137a387d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -233,7 +233,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
vistest = GlobalVisTestFor(relation);
- if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+ if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
return;
/*
@@ -728,9 +728,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* Determining HTSV only once for each tuple is required for correctness,
* to deal with cases where running HTSV twice could result in different
* results. For example, RECENTLY_DEAD can turn to DEAD if another
- * checked item causes GlobalVisTestIsRemovableFullXid() to update the
- * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
- * transaction aborts.
+ * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+ * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+ * aborts.
*
* It's also good for performance. Most commonly tuples within a page are
* stored at decreasing offsets (while the items are stored at increasing
@@ -1153,11 +1153,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* Determine whether or not the tuple is considered dead when compared
* with the provided GlobalVisState. On-access pruning does not provide
* VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
- * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
- * if the GlobalVisState has been updated since the beginning of vacuuming
+ * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+ * the GlobalVisState has been updated since the beginning of vacuuming
* the relation.
*/
- if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
return HEAPTUPLE_DEAD;
return res;
@@ -1615,7 +1615,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
/*
* For now always use prstate->cutoffs for this test, because
* we only update 'all_visible' when freezing is requested. We
- * could use GlobalVisTestIsRemovableXid instead, if a
+ * could use GlobalVisXidVisibleToAll() instead, if a
* non-freezing caller wanted to set the VM bit.
*/
Assert(prstate->cutoffs);
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 8f8a1ad7796..496cca69410 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
*/
if (dt->tupstate == SPGIST_REDIRECT &&
(!TransactionIdIsValid(dt->xid) ||
- GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+ GlobalVisXidVisibleToAll(vistest, dt->xid)))
{
dt->tupstate = SPGIST_PLACEHOLDER;
Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..235c3b584f6 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
* See comment for GlobalVisState for details.
*/
bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
- FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
{
/*
* If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4216,14 +4215,14 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
}
/*
- * Wrapper around GlobalVisTestIsRemovableFullXid() for 32bit xids.
+ * Wrapper around GlobalVisFullXidVisibleToAll() for 32bit xids.
*
* It is crucial that this only gets called for xids from a source that
* protects against xid wraparounds (e.g. from a table and thus protected by
* relfrozenxid).
*/
bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
{
FullTransactionId fxid;
@@ -4237,12 +4236,12 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
*/
fxid = FullXidRelativeTo(state->definitely_needed, xid);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableFullXid(), see their comments.
+ * GlobalVisFullXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableXid(state, xid);
+ return GlobalVisXidVisibleToAll(state, xid);
}
/*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
*/
typedef struct GlobalVisState GlobalVisState;
extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
--
2.43.0
v17-0011-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (text/x-patch; charset=UTF-8)
From 783f1f53b90bc12ac025b68125e3cd85706c71fb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v17 11/15] Use GlobalVisState in vacuum to determine page
level visibility
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.
Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.
This also benefits vacuum directly: GlobalVisState may advance
during a vacuum, allowing more pages to be considered all-visible.
In the rare case that it moves backward, VACUUM falls back to OldestXmin
to ensure we don’t attempt to freeze a dead tuple that wasn’t yet
prunable according to the GlobalVisState.
Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.
This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.
---
src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++++++
src/backend/access/heap/pruneheap.c | 37 ++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 17 +++++-----
src/include/access/heapam.h | 7 ++--
4 files changed, 57 insertions(+), 32 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+ Buffer buffer)
+{
+ TransactionId dead_after = InvalidTransactionId;
+ HTSV_Result res;
+
+ res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+ if (res == HEAPTUPLE_RECENTLY_DEAD)
+ {
+ Assert(TransactionIdIsValid(dead_after));
+
+ if (GlobalVisXidVisibleToAll(vistest, dead_after))
+ res = HEAPTUPLE_DEAD;
+ }
+ else
+ Assert(!TransactionIdIsValid(dead_after));
+
+ return res;
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f1e137a387d..671236ee23f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -710,11 +710,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon
- * when updating the VM and/or freezing all the tuples on the page.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState. This field is only kept up-to-date if the page is
+ * all-visible. As soon as a tuple is encountered that is not visible to
+ * all, this field is unmaintained. As long as it is maintained, it can be
+ * used to calculate the snapshot conflict horizon when updating the VM
+ * and/or freezing all the tuples on the page.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -910,6 +911,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them is not visible to everyone, the page cannot be
+ * all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update the hint
@@ -1080,10 +1091,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool debug_all_frozen;
Assert(prstate.lpdead_items == 0);
- Assert(prstate.cutoffs);
if (!heap_page_is_all_visible(params->relation, buffer,
- prstate.cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc))
Assert(false);
@@ -1612,19 +1622,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' when freezing is requested. We
- * could use GlobalVisXidVisibleToAll() instead, if a
- * non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1b20c96033e..3e9cf2f15a4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -465,7 +465,7 @@ static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -2740,7 +2740,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* done outside the critical section.
*/
if (heap_page_would_be_all_visible(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3489,14 +3489,13 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
- return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
+ return heap_page_would_be_all_visible(rel, buf, vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -3515,7 +3514,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
*
* Output parameters:
*
@@ -3534,7 +3533,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
static bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -3606,7 +3605,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
tuple.t_len = ItemIdGetLength(itemid);
tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+ switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest, buf))
{
case HEAPTUPLE_LIVE:
{
@@ -3625,7 +3624,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin, OldestXmin))
+ if (!GlobalVisXidVisibleToAll(vistest, xmin))
{
all_visible = false;
*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2de39ba0cd1..df0632aebc6 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -276,8 +276,7 @@ typedef struct PruneFreezeParams
/*
* cutoffs contains the freeze cutoffs, established by VACUUM at the
* beginning of vacuuming the relation. Required if HEAP_PRUNE_FREEZE
- * option is set. cutoffs->OldestXmin is also used to determine if dead
- * tuples are HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * option is set.
*/
struct VacuumCutoffs *cutoffs;
} PruneFreezeParams;
@@ -443,7 +442,7 @@ extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum);
@@ -455,6 +454,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+ GlobalVisState *vistest, Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
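The deferred check in this patch relies on visibility against a horizon being monotone in the XID: if the newest xmin recorded in visibility_cutoff_xid is visible to all, then so is every other xmin on the page, so one GlobalVisState check per page suffices. A toy illustration of that property, using a plain 64-bit counter as a stand-in for GlobalVisState (real XIDs compare modulo 2^32):

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for GlobalVisXidVisibleToAll(): a plain monotone horizon. */
static bool
xid_visible_to_all(uint64_t horizon, uint64_t xid)
{
    return xid < horizon;
}

int
main(void)
{
    uint64_t horizon = 1000;
    uint64_t page_xmins[] = {612, 745, 998, 420};
    uint64_t cutoff = 0;        /* newest xmin seen, like visibility_cutoff_xid */
    bool     page_all_visible;

    /* Track the newest xmin while scanning the page, as pruning does. */
    for (int i = 0; i < 4; i++)
        if (page_xmins[i] > cutoff)
            cutoff = page_xmins[i];

    /* One horizon check per page replaces one check per tuple. */
    page_all_visible = xid_visible_to_all(horizon, cutoff);

    /* If the newest xmin is visible to all, every xmin on the page is. */
    for (int i = 0; i < 4; i++)
        assert(!page_all_visible || xid_visible_to_all(horizon, page_xmins[i]));

    return 0;
}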
v17-0012-Inline-TransactionIdFollows-Precedes.patch (text/x-patch; charset=US-ASCII)
From e412f9298b0735d1091f4769ace4d2d1a7e62312 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 09:57:13 -0400
Subject: [PATCH v17 12/15] Inline TransactionIdFollows/Precedes()
Calling these from on-access pruning code had noticeable overhead in a
profile. There does not seem to be a reason not to inline them.
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
---
src/backend/access/transam/transam.c | 64 -------------------------
src/include/access/transam.h | 70 ++++++++++++++++++++++++++--
2 files changed, 66 insertions(+), 68 deletions(-)
diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index 9a39451a29a..553d6756cb3 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -273,70 +273,6 @@ TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
}
-/*
- * TransactionIdPrecedes --- is id1 logically < id2?
- */
-bool
-TransactionIdPrecedes(TransactionId id1, TransactionId id2)
-{
- /*
- * If either ID is a permanent XID then we can just do unsigned
- * comparison. If both are normal, do a modulo-2^32 comparison.
- */
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 < id2);
-
- diff = (int32) (id1 - id2);
- return (diff < 0);
-}
-
-/*
- * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
- */
-bool
-TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 <= id2);
-
- diff = (int32) (id1 - id2);
- return (diff <= 0);
-}
-
-/*
- * TransactionIdFollows --- is id1 logically > id2?
- */
-bool
-TransactionIdFollows(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 > id2);
-
- diff = (int32) (id1 - id2);
- return (diff > 0);
-}
-
-/*
- * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
- */
-bool
-TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 >= id2);
-
- diff = (int32) (id1 - id2);
- return (diff >= 0);
-}
-
/*
* TransactionIdLatest --- get latest XID among a main xact and its children
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 7d82cd2eb56..c9e20418275 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -255,6 +255,72 @@ typedef struct TransamVariablesData
} TransamVariablesData;
+
+/*
+ * TransactionIdPrecedes --- is id1 logically < id2?
+ */
+static inline bool
+TransactionIdPrecedes(TransactionId id1, TransactionId id2)
+{
+ /*
+ * If either ID is a permanent XID then we can just do unsigned
+ * comparison. If both are normal, do a modulo-2^32 comparison.
+ */
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 < id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff < 0);
+}
+
+/*
+ * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
+ */
+static inline bool
+TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 <= id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff <= 0);
+}
+
+/*
+ * TransactionIdFollows --- is id1 logically > id2?
+ */
+static inline bool
+TransactionIdFollows(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 > id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff > 0);
+}
+
+/*
+ * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
+ */
+static inline bool
+TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 >= id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff >= 0);
+}
+
+
/* ----------------
* extern declarations
* ----------------
@@ -274,10 +340,6 @@ extern bool TransactionIdDidAbort(TransactionId transactionId);
extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
-extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
-extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2);
extern TransactionId TransactionIdLatest(TransactionId mainxid,
int nxids, const TransactionId *xids);
extern XLogRecPtr TransactionIdGetCommitLSN(TransactionId xid);
--
2.43.0
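For reference, the ordering these now-inlined functions implement is a modulo-2^32 comparison via a signed 32-bit difference, which keeps XIDs ordered correctly across wraparound. A standalone illustration of a TransactionIdPrecedes()-style comparison with a wrapped-around counter (the permanent-XID special case follows the function bodies above):

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/* Same shape as the inlined TransactionIdPrecedes(): is id1 logically < id2? */
static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
    int32_t diff;

    /* Permanent XIDs compare unsigned; normal XIDs compare modulo 2^32. */
    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return (id1 < id2);

    diff = (int32_t) (id1 - id2);
    return (diff < 0);
}

int
main(void)
{
    /*
     * 4000000000 was assigned before 100 once the XID counter wrapped
     * around, so it logically precedes 100 despite being numerically larger.
     */
    assert(xid_precedes(4000000000u, 100));
    assert(!xid_precedes(100, 4000000000u));
    assert(xid_precedes(100, 200));
    return 0;
}

The same signed-difference trick underlies the other three comparison functions moved into the header.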
v17-0013-Allow-on-access-pruning-to-set-pages-all-visible.patch (text/x-patch; charset=UTF-8)
From 54fcba140e515eba0eb1f9d48e7d5875b92e7e39 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v17 13/15] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.
Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
---
src/backend/access/heap/heapam.c | 15 +++-
src/backend/access/heap/heapam_handler.c | 15 +++-
src/backend/access/heap/pruneheap.c | 71 +++++++++++++++----
src/backend/access/index/indexam.c | 46 ++++++++++++
src/backend/access/table/tableam.c | 39 ++++++++--
src/backend/executor/execMain.c | 4 ++
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeBitmapHeapscan.c | 7 +-
src/backend/executor/nodeIndexscan.c | 18 +++--
src/backend/executor/nodeSeqscan.c | 24 +++++--
src/include/access/genam.h | 11 +++
src/include/access/heapam.h | 24 ++++++-
src/include/access/relscan.h | 6 ++
src/include/access/tableam.h | 30 +++++++-
src/include/nodes/execnodes.h | 6 ++
.../t/035_standby_logical_decoding.pl | 3 +-
16 files changed, 282 insertions(+), 39 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 14a2996b9ee..6181e355aaf 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -555,6 +555,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -569,7 +570,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1246,6 +1249,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1284,6 +1288,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1316,6 +1326,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..f05b9e4968d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
return &hscan->xs_base;
}
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 671236ee23f..05e6b902069 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -186,7 +186,9 @@ static bool heap_page_will_set_vis(Relation relation,
Buffer heap_buf,
Buffer vmbuffer,
bool blk_known_av,
- const PruneState *prstate,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ PruneState *prstate,
uint8 *vmflags,
bool *do_set_pd_vis);
@@ -201,9 +203,13 @@ static bool heap_page_will_set_vis(Relation relation,
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -269,12 +275,21 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
PruneFreezeParams params;
PruneFreezeResult presult;
+ params.options = 0;
+ params.vmbuffer = InvalidBuffer;
+
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ params.options = HEAP_PAGE_PRUNE_UPDATE_VIS;
+ params.vmbuffer = *vmbuffer;
+ }
+
params.relation = relation;
params.buffer = buffer;
params.reason = PRUNE_ON_ACCESS;
params.vistest = vistest;
params.cutoffs = NULL;
- params.vmbuffer = InvalidBuffer;
params.blk_known_av = false;
/*
@@ -454,6 +469,9 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* have examined this page’s VM bits (e.g., VACUUM in the previous
* heap_vac_scan_next_block() call) and can pass that along.
*
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
* Returns true if one or both VM bits should be set, along with the desired
* flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
* should be set on the heap page.
@@ -464,7 +482,9 @@ heap_page_will_set_vis(Relation relation,
Buffer heap_buf,
Buffer vmbuffer,
bool blk_known_av,
- const PruneState *prstate,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ PruneState *prstate,
uint8 *vmflags,
bool *do_set_pd_vis)
{
@@ -480,6 +500,23 @@ heap_page_will_set_vis(Relation relation,
return false;
}
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(heap_buf) || XLogCheckBufferNeedsBackup(heap_buf)))
+ {
+ prstate->all_visible = prstate->all_frozen = false;
+ return false;
+ }
+
if (prstate->all_visible && !PageIsAllVisible(heap_page))
*do_set_pd_vis = true;
@@ -503,6 +540,9 @@ heap_page_will_set_vis(Relation relation,
* page-level bit is clear. However, it's possible that in vacuum the bit
* got cleared after heap_vac_scan_next_block() was called, so we must
* recheck with buffer lock before concluding that the VM is corrupt.
+ *
+ * XXX: This will never trigger for on-access pruning because it passes
+ * blk_known_av as false. Should we remove that condition here?
*/
else if (blk_known_av && !PageIsAllVisible(heap_page) &&
visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
@@ -911,6 +951,14 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /*
+ * Even if we don't prune anything, if we found a new value for the
+ * pd_prune_xid field or the page was marked full, we will update the hint
+ * bit.
+ */
+ do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ PageIsFull(page);
+
/*
* After processing all the live tuples on the page, if the newest xmin
* amongst them is not visible to everyone, the page cannot be
@@ -921,14 +969,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
prstate.all_visible = prstate.all_frozen = false;
- /*
- * Even if we don't prune anything, if we found a new value for the
- * pd_prune_xid field or the page was marked full, we will update the hint
- * bit.
- */
- do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
- PageIsFull(page);
-
/*
* Decide if we want to go ahead with freezing according to the freeze
* plans we prepared, or not.
@@ -972,6 +1012,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
do_set_vm = heap_page_will_set_vis(params->relation,
blockno, buffer, vmbuffer, params->blk_known_av,
+ params->reason, do_prune, do_freeze,
&prstate, &new_vmbits, &do_set_pd_vis);
/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
@@ -2244,7 +2285,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
/*
* Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
- * record.
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
*/
static TransactionId
get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
@@ -2313,8 +2354,8 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
* - Reaping: During vacuum phase III, items that are already LP_DEAD are
* marked as unused.
*
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- * all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III and on-access, the heap page
+ * may be marked all-visible and all-frozen.
*
* These changes all happen together, so we use a single WAL record for them
* all.
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..8d582a8eafd 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -289,6 +289,32 @@ index_beginscan(Relation heapRelation,
return scan;
}
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan(heapRelation,
+ indexRelation,
+ snapshot,
+ instrument,
+ nkeys, norderbys);
+
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+ return scan;
+}
+
/*
* index_beginscan_bitmap - start a scan of an index with amgetbitmap
*
@@ -620,6 +646,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
return scan;
}
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan_parallel(heaprel, indexrel,
+ instrument,
+ nkeys, norderbys,
+ pscan);
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+ return scan;
+}
+
/* ----------------
* index_getnext_tid - get the next TID from a scan
*
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 5e41404937e..3e3a0f72a71 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
char *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
bool synchronize_seqscans = true;
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags);
+
/* ----------------------------------------------------------------------------
* Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
}
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
- SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
pscan, flags);
}
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+ bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
/* ----------------------------------------------------------------------------
* Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 831c55ce787..15be318fd41 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation is modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ modifies_rel);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..c2ffbd3b08e 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+
+ bool modifies_base_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
*/
- scandesc = index_beginscan(node->ss.ss_currentRelation,
- node->iss_RelationDesc,
- estate->es_snapshot,
- &node->iss_Instrument,
- node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+ node->iss_RelationDesc,
+ estate->es_snapshot,
+ &node->iss_Instrument,
+ node->iss_NumScanKeys,
+ node->iss_NumOrderByKeys,
+ modifies_base_rel);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..fd69275c181 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
- scandesc = table_beginscan(node->ss.ss_currentRelation,
- estate->es_snapshot,
- 0, NULL);
+ scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+ estate->es_snapshot,
+ 0, NULL, modifies_rel);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -366,6 +371,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
ParallelContext *pcxt)
{
EState *estate = node->ss.ps.state;
+ bool modifies_rel;
ParallelTableScanDesc pscan;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -373,8 +379,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+ modifies_rel);
}
/* ----------------------------------------------------------------
@@ -404,8 +413,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+ pscan,
+ modifies_rel);
}
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..aa2112c8e04 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -178,6 +178,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_heap_rel);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
@@ -204,6 +209,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys,
ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_rel);
extern ItemPointer index_getnext_tid(IndexScanDesc scan,
ScanDirection direction);
extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index df0632aebc6..59d8ce9ad42 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read in the current heap page's
+ * corresponding VM block to this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read in the current heap page's corresponding VM block to
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
@@ -415,7 +432,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
TM_IndexDeleteOp *delstate);
/* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
OffsetNumber *off_loc,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
typedef struct IndexFetchTableData
{
Relation rel;
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchTableData;
struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e16bf025692..f250d4e7aec 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* whether or not scan should attempt to set the VM */
+ SO_ALLOW_VM_SET = 1 << 10,
} ScanOptions;
/*
@@ -882,6 +884,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
}
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
/*
* Like table_beginscan(), but for scanning catalog. It'll automatically use a
* snapshot appropriate for scanning catalog relations.
@@ -919,10 +940,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, bool modifies_rel)
{
uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
}
@@ -1130,6 +1154,10 @@ extern void table_parallelscan_initialize(Relation rel,
extern TableScanDesc table_beginscan_parallel(Relation relation,
ParallelTableScanDesc pscan);
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+ ParallelTableScanDesc pscan,
+ bool modifies_rel);
+
/*
* Restart a parallel scan. Call this in the leader process. Caller is
* responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a36653c37f9..9c54fa06e4a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query either through
+ * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index c9c182892cf..f5c0c65b260 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -745,7 +746,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
--
2.43.0
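Pulling the executor plumbing above together in one place, the shape of every serial call site is roughly the following (a condensed sketch of the nodeSeqscan hunk, not an additional call site; the parallel paths differ only in calling table_beginscan_parallel_vmset() with the shared scan descriptor):

    /*
     * Let on-access pruning set VM bits only when the scanned relation is
     * not modified by the query (tracked in es_modified_relids).
     */
    bool        modifies_rel;

    modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
                                 estate->es_modified_relids);

    node->ss.ss_currentScanDesc =
        table_beginscan_vmset(node->ss.ss_currentRelation,
                              estate->es_snapshot,
                              0, NULL,
                              modifies_rel);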
Attachment: v17-0009-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (text/x-patch; charset=US-ASCII)
From 3f8b38eec729ebe3711cdb850bb768f14029795a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v17 09/15] Remove XLOG_HEAP2_VISIBLE entirely
As no remaining users emit XLOG_HEAP2_VISIBLE records.
This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
---
src/backend/access/common/bufmask.c | 4 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 155 ++---------------------
src/backend/access/heap/pruneheap.c | 18 ++-
src/backend/access/heap/vacuumlazy.c | 16 +--
src/backend/access/heap/visibilitymap.c | 110 +---------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +-
src/include/access/heapam_xlog.h | 28 +---
src/include/access/visibilitymap.h | 15 +--
src/include/access/visibilitymapdefs.h | 9 --
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 56 insertions(+), 377 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
+ * for more details.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 7f354caec31..14a2996b9ee 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2524,11 +2524,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
else if (all_frozen_set)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- RelationGetRelationName(relation));
+ visibilitymap_set(BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ RelationGetRelationName(relation));
}
/*
@@ -8798,50 +8798,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
-
/*
* Perform XLogInsert for a heap-update operation. Caller must already
* have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index f6624bc98d0..aeb97cc3cea 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -258,7 +258,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* We don't have relation name during recovery, so use relfilenode */
relname = psprintf("%u", rlocator.relNumber);
- old_vmbits = visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, relname);
+ old_vmbits = visibilitymap_set(blkno, vmbuffer, vmflags, relname);
/* Only set VM page LSN if we modified the page */
if (old_vmbits != vmflags)
@@ -276,142 +276,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -789,8 +653,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
*
* During recovery, however, no concurrent writers exist. Therefore,
* updating the VM without holding the heap page lock is safe enough. This
- * same approach is taken when replaying xl_heap_visible records (see
- * heap_xlog_visible()).
+ * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+ * heap_xlog_prune_and_freeze()).
*/
if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -805,11 +669,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
/* We don't have relation name during recovery, so use relfilenode */
relname = psprintf("%u", rlocator.relNumber);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- relname);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ relname);
PageSetLSN(BufferGetPage(vmbuffer), lsn);
pfree(relname);
@@ -1390,9 +1254,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f70563008e1..21b24f3992e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1029,9 +1029,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
{
Assert(PageIsAllVisible(page));
- old_vmbits = visibilitymap_set_vmbits(blockno,
- vmbuffer, new_vmbits,
- RelationGetRelationName(params->relation));
+ old_vmbits = visibilitymap_set(blockno,
+ vmbuffer, new_vmbits,
+ RelationGetRelationName(params->relation));
if (old_vmbits == new_vmbits)
{
LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
@@ -2308,14 +2308,18 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
*
* This is used for several different page maintenance operations:
*
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
* redirected, some marked dead, and some removed altogether.
*
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'.
*
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ * marked as unused.
*
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
+ * all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
* all.
*
* If replaying the record requires a cleanup lock, pass cleanup_lock = true.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 941b989ec50..1b20c96033e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1900,11 +1900,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- RelationGetRelationName(vacrel->rel));
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ RelationGetRelationName(vacrel->rel));
/*
* Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2784,9 +2784,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer, vmflags,
- RelationGetRelationName(vacrel->rel));
+ visibilitymap_set(blkno,
+ vmbuffer, vmflags,
+ RelationGetRelationName(vacrel->rel));
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 2d43147ffb7..51d206e517d 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set bit(s) in a previously pinned page and log
- * visibilitymap_set_vmbits - set bit(s) in a pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,107 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
- flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
- Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- Assert(!BufferIsValid(heapBuf) || BufferIsExclusiveLocked(heapBuf));
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (XLogRecPtrIsInvalid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
-}
/*
* Set visibility map (VM) flags in the block referenced by vmBuf.
@@ -343,9 +241,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* heapRelname is used only for debugging.
*/
uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const char *heapRelname)
+visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const char *heapRelname)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 1cb44ca32d3..93505cb8c56 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -460,9 +453,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..e9e77bd678b 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -294,7 +293,13 @@ typedef struct xl_heap_prune
#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
#define XLHP_IS_CATALOG_REL (1 << 1)
/*
@@ -443,20 +448,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -500,11 +491,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 3dcf37ba03f..859e5795457 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "utils/relcache.h"
@@ -31,15 +30,11 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const char *heapRelname);
+
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const char *heapRelname);
+
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 20f45232175..885f9acff39 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4275,7 +4275,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
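With xl_heap_visible gone, the surviving visibilitymap_set() takes only the heap block number, the pinned VM buffer, the flag bits, and the relation name; WAL logging and recovery conflicts become the responsibility of the record covering the heap-page change. A condensed sketch of the new call-site shape, modeled on the heap_multi_insert() hunk above (buffer pinning, locking, and the caller's WAL record are elided):

    /* PD_ALL_VISIBLE must be set before or together with the VM bits */
    PageSetAllVisible(page);

    /* Returns the VM bits that were already set for this block */
    old_vmbits = visibilitymap_set(BufferGetBlockNumber(buffer),
                                   vmbuffer,
                                   VISIBILITYMAP_ALL_VISIBLE |
                                   VISIBILITYMAP_ALL_FROZEN,
                                   RelationGetRelationName(relation));

    /* ... the enclosing prune/multi-insert WAL record covers this change ... */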
Attachment: v17-0014-Set-pd_prune_xid-on-insert.patch (text/x-patch; charset=UTF-8)
From 58ba42d63128085051847ac1c9d7a88702657c23 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v17 14/15] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.
This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.
Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.
ci-os-only:
---
src/backend/access/heap/heapam.c | 25 +++++++++++++------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++-
.../isolation/expected/index-killtuples.out | 6 ++---
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 6181e355aaf..1704269715e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2104,6 +2104,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2163,15 +2164,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2181,7 +2186,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2545,8 +2549,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index aeb97cc3cea..dbbc4a16bd8 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -475,6 +475,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -624,9 +630,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
--
2.43.0
Attachment: v17-0015-Split-heap_page_prune_and_freeze-into-helpers.patch (text/x-patch; charset=US-ASCII)
From bd3b416719d53c0fa904d0aaae9540b1cce84ec2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 8 Oct 2025 18:45:45 -0400
Subject: [PATCH v17 15/15] Split heap_page_prune_and_freeze into helpers
---
src/backend/access/heap/pruneheap.c | 316 +++++++++++++++-------------
1 file changed, 170 insertions(+), 146 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 05e6b902069..51674733eaf 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -586,82 +586,20 @@ heap_page_will_set_vis(Relation relation,
return do_set_vm;
}
-/*
- * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page. If the page's visibility status has changed, update it in
- * the VM.
- *
- * Caller must have pin and buffer cleanup lock on the page. Note that we
- * don't update the FSM information for page on caller's behalf. Caller might
- * also need to account for a reduction in the length of the line pointer
- * array following array truncation by us.
- *
- * params contains the input parameters used to control freezing and pruning
- * behavior. See the definition of PruneFreezeParams for more on what each
- * parameter does.
- *
- * If the HEAP_PRUNE_FREEZE option is set in params, we will freeze tuples if
- * it's required in order to advance relfrozenxid / relminmxid, or if it's
- * considered advantageous for overall system performance to do so now. The
- * 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
- * arguments are required when freezing.
- *
- * If HEAP_PAGE_PRUNE_UPDATE_VIS is set in params and the visibility status of
- * the page has changed, we will update the VM at the same time as pruning and
- * freezing the heap page. We will also update presult->old_vmbits and
- * presult->new_vmbits with the state of the VM before and after updating it
- * for the caller to use in bookkeeping.
- *
- * presult contains output parameters needed by callers, such as the number of
- * tuples removed and the offsets of dead items on the page after pruning.
- * heap_page_prune_and_freeze() is responsible for initializing it. Required
- * by all callers.
- *
- * off_loc is the offset location required by the caller to use in error
- * callback.
- *
- * new_relfrozen_xid and new_relmin_mxid must provided by the caller if the
- * HEAP_PRUNE_FREEZE option is set in params. On entry, they contain the
- * oldest XID and multi-XID seen on the relation so far. They will be updated
- * with oldest values present on the page after pruning. After processing the
- * whole relation, VACUUM can use these values as the new
- * relfrozenxid/relminmxid for the relation.
- */
-void
-heap_page_prune_and_freeze(PruneFreezeParams *params,
- PruneFreezeResult *presult,
- OffsetNumber *off_loc,
- TransactionId *new_relfrozen_xid,
- MultiXactId *new_relmin_mxid)
+static void
+prune_freeze_setup(PruneFreezeParams *params, PruneState *prstate,
+ TransactionId *new_relfrozen_xid,
+ MultiXactId *new_relmin_mxid,
+ PruneFreezeResult *presult)
{
- Buffer buffer = params->buffer;
- Buffer vmbuffer = params->vmbuffer;
- Page page = BufferGetPage(buffer);
- BlockNumber blockno = BufferGetBlockNumber(buffer);
- OffsetNumber offnum,
- maxoff;
- PruneState prstate;
- HeapTupleData tup;
- bool do_freeze;
- bool do_prune;
- bool do_hint_prune;
- bool do_set_vm;
- bool do_set_pd_vis;
- bool did_tuple_hint_fpi;
- int64 fpi_before = pgWalUsage.wal_fpi;
- TransactionId frz_conflict_horizon = InvalidTransactionId;
- TransactionId conflict_xid = InvalidTransactionId;
- uint8 new_vmbits = 0;
- uint8 old_vmbits = 0;
-
/* Copy parameters to prstate */
- prstate.vistest = params->vistest;
- prstate.mark_unused_now =
+ prstate->vistest = params->vistest;
+ prstate->mark_unused_now =
(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
- prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
- prstate.attempt_update_vm =
+ prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate->attempt_update_vm =
(params->options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0;
- prstate.cutoffs = params->cutoffs;
+ prstate->cutoffs = params->cutoffs;
/*
* Our strategy is to scan the page and make lists of items to change,
@@ -674,37 +612,37 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* prunable, we will save the lowest relevant XID in new_prune_xid. Also
* initialize the rest of our working state.
*/
- prstate.new_prune_xid = InvalidTransactionId;
- prstate.latest_xid_removed = InvalidTransactionId;
- prstate.nredirected = prstate.ndead = prstate.nunused = prstate.nfrozen = 0;
- prstate.nroot_items = 0;
- prstate.nheaponly_items = 0;
+ prstate->new_prune_xid = InvalidTransactionId;
+ prstate->latest_xid_removed = InvalidTransactionId;
+ prstate->nredirected = prstate->ndead = prstate->nunused = prstate->nfrozen = 0;
+ prstate->nroot_items = 0;
+ prstate->nheaponly_items = 0;
/* initialize page freezing working state */
- prstate.pagefrz.freeze_required = false;
- if (prstate.attempt_freeze)
+ prstate->pagefrz.freeze_required = false;
+ if (prstate->attempt_freeze)
{
Assert(new_relfrozen_xid && new_relmin_mxid);
- prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
- prstate.pagefrz.NoFreezePageRelfrozenXid = *new_relfrozen_xid;
- prstate.pagefrz.FreezePageRelminMxid = *new_relmin_mxid;
- prstate.pagefrz.NoFreezePageRelminMxid = *new_relmin_mxid;
+ prstate->pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
+ prstate->pagefrz.NoFreezePageRelfrozenXid = *new_relfrozen_xid;
+ prstate->pagefrz.FreezePageRelminMxid = *new_relmin_mxid;
+ prstate->pagefrz.NoFreezePageRelminMxid = *new_relmin_mxid;
}
else
{
Assert(new_relfrozen_xid == NULL && new_relmin_mxid == NULL);
- prstate.pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
- prstate.pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
- prstate.pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
- prstate.pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
+ prstate->pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
+ prstate->pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
+ prstate->pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
+ prstate->pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
}
- prstate.ndeleted = 0;
- prstate.live_tuples = 0;
- prstate.recently_dead_tuples = 0;
- prstate.hastup = false;
- prstate.lpdead_items = 0;
- prstate.deadoffsets = presult->deadoffsets;
+ prstate->ndeleted = 0;
+ prstate->live_tuples = 0;
+ prstate->recently_dead_tuples = 0;
+ prstate->hastup = false;
+ prstate->lpdead_items = 0;
+ prstate->deadoffsets = presult->deadoffsets;
/*
* Track whether the page could be marked all-visible and/or all-frozen.
@@ -732,20 +670,20 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* bookkeeping. In this case, initializing all_visible to false allows
* heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
*/
- if (prstate.attempt_freeze)
+ if (prstate->attempt_freeze)
{
- prstate.all_visible = true;
- prstate.all_frozen = true;
+ prstate->all_visible = true;
+ prstate->all_frozen = true;
}
- else if (prstate.attempt_update_vm)
+ else if (prstate->attempt_update_vm)
{
- prstate.all_visible = true;
- prstate.all_frozen = false;
+ prstate->all_visible = true;
+ prstate->all_frozen = false;
}
else
{
- prstate.all_visible = false;
- prstate.all_frozen = false;
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
}
/*
@@ -757,10 +695,14 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* used to calculate the snapshot conflict horizon when updating the VM
* and/or freezing all the tuples on the page.
*/
- prstate.visibility_cutoff_xid = InvalidTransactionId;
+ prstate->visibility_cutoff_xid = InvalidTransactionId;
+}
- maxoff = PageGetMaxOffsetNumber(page);
- tup.t_tableOid = RelationGetRelid(params->relation);
+static void
+prune_freeze_plan(PruneState *prstate, BlockNumber blockno, Buffer buffer, Page page,
+ OffsetNumber maxoff, OffsetNumber *off_loc, HeapTuple tup)
+{
+ OffsetNumber offnum;
/*
* Determine HTSV for all tuples, and queue them up for processing as HOT
@@ -795,13 +737,13 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
*off_loc = offnum;
- prstate.processed[offnum] = false;
- prstate.htsv[offnum] = -1;
+ prstate->processed[offnum] = false;
+ prstate->htsv[offnum] = -1;
/* Nothing to do if slot doesn't contain a tuple */
if (!ItemIdIsUsed(itemid))
{
- heap_prune_record_unchanged_lp_unused(page, &prstate, offnum);
+ heap_prune_record_unchanged_lp_unused(page, prstate, offnum);
continue;
}
@@ -811,17 +753,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* If the caller set mark_unused_now true, we can set dead line
* pointers LP_UNUSED now.
*/
- if (unlikely(prstate.mark_unused_now))
- heap_prune_record_unused(&prstate, offnum, false);
+ if (unlikely(prstate->mark_unused_now))
+ heap_prune_record_unused(prstate, offnum, false);
else
- heap_prune_record_unchanged_lp_dead(page, &prstate, offnum);
+ heap_prune_record_unchanged_lp_dead(page, prstate, offnum);
continue;
}
if (ItemIdIsRedirected(itemid))
{
/* This is the start of a HOT chain */
- prstate.root_items[prstate.nroot_items++] = offnum;
+ prstate->root_items[prstate->nroot_items++] = offnum;
continue;
}
@@ -831,25 +773,19 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* Get the tuple's visibility status and queue it up for processing.
*/
htup = (HeapTupleHeader) PageGetItem(page, itemid);
- tup.t_data = htup;
- tup.t_len = ItemIdGetLength(itemid);
- ItemPointerSet(&tup.t_self, blockno, offnum);
+ tup->t_data = htup;
+ tup->t_len = ItemIdGetLength(itemid);
+ ItemPointerSet(&tup->t_self, blockno, offnum);
- prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
- buffer);
+ prstate->htsv[offnum] = heap_prune_satisfies_vacuum(prstate, tup,
+ buffer);
if (!HeapTupleHeaderIsHeapOnly(htup))
- prstate.root_items[prstate.nroot_items++] = offnum;
+ prstate->root_items[prstate->nroot_items++] = offnum;
else
- prstate.heaponly_items[prstate.nheaponly_items++] = offnum;
+ prstate->heaponly_items[prstate->nheaponly_items++] = offnum;
}
- /*
- * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
- * an FPI to be emitted.
- */
- did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
-
/*
* Process HOT chains.
*
@@ -861,30 +797,30 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* the page instead of using the root_items array, also did it in
* ascending offset number order.)
*/
- for (int i = prstate.nroot_items - 1; i >= 0; i--)
+ for (int i = prstate->nroot_items - 1; i >= 0; i--)
{
- offnum = prstate.root_items[i];
+ offnum = prstate->root_items[i];
/* Ignore items already processed as part of an earlier chain */
- if (prstate.processed[offnum])
+ if (prstate->processed[offnum])
continue;
/* see preceding loop */
*off_loc = offnum;
/* Process this item or chain of items */
- heap_prune_chain(page, blockno, maxoff, offnum, &prstate);
+ heap_prune_chain(page, blockno, maxoff, offnum, prstate);
}
/*
* Process any heap-only tuples that were not already processed as part of
* a HOT chain.
*/
- for (int i = prstate.nheaponly_items - 1; i >= 0; i--)
+ for (int i = prstate->nheaponly_items - 1; i >= 0; i--)
{
- offnum = prstate.heaponly_items[i];
+ offnum = prstate->heaponly_items[i];
- if (prstate.processed[offnum])
+ if (prstate->processed[offnum])
continue;
/* see preceding loop */
@@ -903,7 +839,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* return true for an XMIN_INVALID tuple, so this code will work even
* when there were sequential updates within the aborted transaction.)
*/
- if (prstate.htsv[offnum] == HEAPTUPLE_DEAD)
+ if (prstate->htsv[offnum] == HEAPTUPLE_DEAD)
{
ItemId itemid = PageGetItemId(page, offnum);
HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, itemid);
@@ -911,8 +847,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
if (likely(!HeapTupleHeaderIsHotUpdated(htup)))
{
HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate.latest_xid_removed);
- heap_prune_record_unused(&prstate, offnum, true);
+ &prstate->latest_xid_removed);
+ heap_prune_record_unused(prstate, offnum, true);
}
else
{
@@ -929,7 +865,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
}
}
else
- heap_prune_record_unchanged_lp_normal(page, &prstate, offnum);
+ heap_prune_record_unchanged_lp_normal(page, prstate, offnum);
}
/* We should now have processed every tuple exactly once */
@@ -940,12 +876,110 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
{
*off_loc = offnum;
- Assert(prstate.processed[offnum]);
+ Assert(prstate->processed[offnum]);
}
#endif
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them is not visible to everyone, the page cannot be
+ * all-visible.
+ */
+ if (prstate->all_visible &&
+ TransactionIdIsNormal(prstate->visibility_cutoff_xid) &&
+ !GlobalVisXidVisibleToAll(prstate->vistest, prstate->visibility_cutoff_xid))
+ prstate->all_visible = prstate->all_frozen = false;
+
/* Clear the offset information once we have processed the given page. */
*off_loc = InvalidOffsetNumber;
+}
+
+/*
+ * Prune and repair fragmentation and potentially freeze tuples on the
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
+ *
+ * Caller must have pin and buffer cleanup lock on the page. Note that we
+ * don't update the FSM information for page on caller's behalf. Caller might
+ * also need to account for a reduction in the length of the line pointer
+ * array following array truncation by us.
+ *
+ * params contains the input parameters used to control freezing and pruning
+ * behavior. See the definition of PruneFreezeParams for more on what each
+ * parameter does.
+ *
+ * If the HEAP_PRUNE_FREEZE option is set in params, we will freeze tuples if
+ * it's required in order to advance relfrozenxid / relminmxid, or if it's
+ * considered advantageous for overall system performance to do so now. The
+ * 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
+ * arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
+ *
+ * presult contains output parameters needed by callers, such as the number of
+ * tuples removed and the offsets of dead items on the page after pruning.
+ * heap_page_prune_and_freeze() is responsible for initializing it. Required
+ * by all callers.
+ *
+ * off_loc is the offset location required by the caller to use in error
+ * callback.
+ *
+ * new_relfrozen_xid and new_relmin_mxid must be provided by the caller if the
+ * HEAP_PRUNE_FREEZE option is set in params. On entry, they contain the
+ * oldest XID and multi-XID seen on the relation so far. They will be updated
+ * with oldest values present on the page after pruning. After processing the
+ * whole relation, VACUUM can use these values as the new
+ * relfrozenxid/relminmxid for the relation.
+ */
+void
+heap_page_prune_and_freeze(PruneFreezeParams *params,
+ PruneFreezeResult *presult,
+ OffsetNumber *off_loc,
+ TransactionId *new_relfrozen_xid,
+ MultiXactId *new_relmin_mxid)
+{
+ Buffer buffer = params->buffer;
+ Buffer vmbuffer = params->vmbuffer;
+ Page page = BufferGetPage(buffer);
+ BlockNumber blockno = BufferGetBlockNumber(buffer);
+ OffsetNumber maxoff;
+ PruneState prstate;
+ HeapTupleData tup;
+ bool do_freeze;
+ bool do_prune;
+ bool do_hint_prune;
+ bool do_set_vm;
+ bool do_set_pd_vis;
+ bool did_tuple_hint_fpi;
+ int64 fpi_before = pgWalUsage.wal_fpi;
+ TransactionId frz_conflict_horizon = InvalidTransactionId;
+ TransactionId conflict_xid = InvalidTransactionId;
+ uint8 new_vmbits = 0;
+ uint8 old_vmbits = 0;
+
+ maxoff = PageGetMaxOffsetNumber(page);
+ tup.t_tableOid = RelationGetRelid(params->relation);
+
+ /* Initialize needed state in prstate */
+ prune_freeze_setup(params, &prstate, new_relfrozen_xid, new_relmin_mxid, presult);
+
+ /*
+ * Examine all line pointers and tuple visibility information to determine
+ * which line pointers should change state and which tuples may be frozen.
+ * Prepare queue of state changes to later be executed in a critical
+ * section.
+ */
+ prune_freeze_plan(&prstate, blockno, buffer, page, maxoff, off_loc, &tup);
+
+ /*
+ * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
+ * an FPI to be emitted.
+ */
+ did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
do_prune = prstate.nredirected > 0 ||
prstate.ndead > 0 ||
@@ -959,16 +993,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
PageIsFull(page);
- /*
- * After processing all the live tuples on the page, if the newest xmin
- * amongst them is not visible to everyone, the page cannot be
- * all-visible.
- */
- if (prstate.all_visible &&
- TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
- !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
- prstate.all_visible = prstate.all_frozen = false;
-
/*
* Decide if we want to go ahead with freezing according to the freeze
* plans we prepared, or not.
--
2.43.0
Hi,
On 2025-10-08 18:54:25 -0400, Melanie Plageman wrote:
+uint8
+visibilitymap_set_vmbits(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const char *heapRelname)
+{
+ BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+ uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+ uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+ Page page;
+ uint8 *map;
+ uint8 status;
+
+#ifdef TRACE_VISIBILITYMAP
+ elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
+ flags, heapRelname, heapBlk);
+#endif
I like that it doesn't take a Relation anymore, but I'd just pass the smgrrelation
instead, then you don't need to allocate the string in the caller, when it's
approximately never used.
Otherwise this looks pretty close to me.
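For concreteness, here's a rough sketch of what that suggested signature could
look like; the smgr parameter name and the trace message format are my
assumptions, not code from the patch:

/*
 * Sketch only: same body as the quoted hunk, but taking the SMgrRelation so
 * the caller never has to psprintf() a name that is only needed under
 * TRACE_VISIBILITYMAP.
 */
uint8
visibilitymap_set_vmbits(BlockNumber heapBlk, Buffer vmBuf, uint8 flags,
                         SMgrRelation smgr)
{
#ifdef TRACE_VISIBILITYMAP
	elog(DEBUG1, "vm_set flags 0x%02X for rel %u block %u",
		 flags, smgr->smgr_rlocator.locator.relNumber, heapBlk);
#endif

	/* ... remainder of the body unchanged from the quoted hunk ... */
}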
@@ -71,12 +84,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 }

 /*
- * If we have a full-page image, restore it and we're done.
+ * If we have a full-page image of the heap block, restore it and we're
+ * done with the heap block.
 */
- action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
- (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
- &buffer);
- if (action == BLK_NEEDS_REDO)
+ if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+ (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+ &buffer) == BLK_NEEDS_REDO)
 {
 Page page = BufferGetPage(buffer);
 OffsetNumber *redirected;
Why move it around this way?
@@ -138,36 +157,104 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* There should be no more data */
 Assert((char *) frz_offsets == dataptr + datalen);

+ if ((vmflags & VISIBILITYMAP_VALID_BITS))
+ PageSetAllVisible(page);
+
+ MarkBufferDirty(buffer);
+
+ /*
+ * Always emit a WAL record when setting PD_ALL_VISIBLE but only emit
+ * an FPI if checksums/wal_log_hints are enabled.
This comment reads as if we're WAL logging here, but this is a
Wendy's^Wrecovery.
Advance the page LSN
+ * only if the record could include an FPI, since recovery skips
+ * records <= the stamped LSN. Otherwise it might skip an earlier FPI
+ * needed to repair a torn page.
+ */
This is confusing, should probably just reference the stuff we did in the
!recovery case.
+ if (do_prune || nplans > 0 ||
+ ((vmflags & VISIBILITYMAP_VALID_BITS) && XLogHintBitIsNeeded()))
+ PageSetLSN(page, lsn);
+
 /*
 * Note: we don't worry about updating the page's prunability hints.
 * At worst this will cause an extra prune cycle to occur soon.
 */
Not your fault, but that seems odd? Why aren't we just doing the right thing?
 /*
- * If we released any space or line pointers, update the free space map.
+ * If we released any space or line pointers or set PD_ALL_VISIBLE or the
+ * VM, update the freespace map.
I'd replace the first "or" with a "," ;)
+ * Even when no actual space is freed (e.g., when only marking the page
+ * all-visible or frozen), we still update the FSM. Because the FSM is
+ * unlogged and maintained heuristically, it often becomes stale on
+ * standbys. If such a standby is later promoted and runs VACUUM, it will
+ * skip recalculating free space for pages that were marked all-visible
+ * (or all-frozen, depending on the mode). FreeSpaceMapVacuum can then
+ * propagate overly optimistic free space values upward, causing future
+ * insertions to select pages that turn out to be unusable. In bulk, this
+ * can lead to long stalls.
+ *
+ * To prevent this, always refresh the FSM’s view when a page becomes
+ * all-visible or all-frozen.
I'd s/refresh/update/, because refresh sounds more like rereading the current
state of the FSM, rather than changing the FSM.
+ /* We don't have relation name during recovery, so use relfilenode */
+ relname = psprintf("%u", rlocator.relNumber);
+ old_vmbits = visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, relname);

- XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
+ /* Only set VM page LSN if we modified the page */
+ if (old_vmbits != vmflags)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
 }
- else
- UnlockReleaseBuffer(buffer);
+ pfree(relname);
Hm. When can we actually enter the old_vmbits == vmflags case? It might also
be fine to just say that we don't expect it to change but are mirroring the
code in visibilitymap_set().
I wonder if the VM specific redo portion should be in a common helper? Might
not be enough code to worry though...
@@ -2070,8 +2079,24 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 xlhp_prune_items dead_items;
 xlhp_prune_items unused_items;
 OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+ bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+ bool do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;

 xlrec.flags = 0;
+ regbuf_flags = REGBUF_STANDARD;
+
+ Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+
+ /*
+ * We can avoid an FPI if the only modification we are making to the heap
+ * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
Maybe s/an FPI/an FPI for the heap page/?
+ * Note that if we explicitly skip an FPI, we must not set the heap page
+ * LSN later.
+ */
+ if (!do_prune &&
+ nfrozen == 0 &&
+ (!do_set_vm || !XLogHintBitIsNeeded()))
+ regbuf_flags |= REGBUF_NO_IMAGE;
 /*
 * Prepare data for the buffer. The arrays are not actually in the
@@ -2079,7 +2104,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 * page image, the arrays can be omitted.
 */
 XLogBeginInsert();
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+ XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+ if (do_set_vm)
+ XLogRegisterBuffer(1, vmbuffer, 0);
Seems a bit confusing that it's named regbuf_flags but isn't used all the
XLogRegisterBuffer calls. Maybe make the name more specific
(regbuf_flags_heap?)...
}
 recptr = XLogInsert(RM_HEAP2_ID, info);

- PageSetLSN(BufferGetPage(buffer), recptr);
+ if (do_set_vm)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
+ }
+ /*
+ * We must bump the page LSN if pruning or freezing. If we are only
+ * updating PD_ALL_VISIBLE, though, we can skip doing this unless
+ * wal_log_hints/checksums are enabled. Torn pages are possible if we
+ * update PD_ALL_VISIBLE without bumping the LSN, but this is deemed okay
+ * for page hint updates.
+ */
Arguably it's not a torn page if we only modified something as narrow as a
hint bit, and are redoing that change after recovery. But that's extremely
nitpicky.
I wonder if the comment explaining this should be put in one place and
referenced from all the different places.
@@ -2860,6 +2867,29 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
 InvalidOffsetNumber);

+ /*
+ * Before marking dead items unused, check whether the page will become
+ * all-visible once that change is applied.
So the function is named _would_ but here you say will :)
This lets us reap the tuples
+ * and mark the page all-visible within the same critical section,
+ * enabling both changes to be emitted in a single WAL record. Since the
+ * visibility checks may perform I/O and allocate memory, they must be
+ * done outside the critical section.
+ */
+ if (heap_page_would_be_all_visible(vacrel, buffer,
+ deadoffsets, num_offsets,
+ &all_frozen, &visibility_cutoff_xid))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (all_frozen)
+ {
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ }
+
+ /* Take the lock on the vmbuffer before entering a critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
It sure would be nice if we had documented the lock order between the heap
page and the corresponding VM page anywhere. This is just doing what we did
before, so it's not this patch's fault, but I did get worried about it for a
moment.
+/*
+ * Check whether the heap page in buf is all-visible except for the dead
+ * tuples referenced in the deadoffsets array.
+ *
+ * The visibility checks may perform IO and allocate memory so they must not
+ * be done in a critical section. This function is used by vacuum to determine
+ * if the page will be all-visible once it reaps known dead tuples. That way
+ * it can do both in the same critical section and emit a single WAL record.
+ *
+ * Returns true if the page is all-visible other than the provided
+ * deadoffsets and false otherwise.
+ *
+ * Output parameters:
+ *
+ * - *all_frozen: true if every tuple on the page is frozen
+ * - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
+ * Callers looking to verify that the page is already all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This logic is closely related to heap_prune_record_unchanged_lp_normal().
+ * If you modify this function, ensure consistency with that code. An
+ * assertion cross-checks that both remain in agreement. Do not introduce new
+ * side-effects.
+ */
+static bool
+heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid)
+{
 Page page = BufferGetPage(buf);
 BlockNumber blockno = BufferGetBlockNumber(buf);
 OffsetNumber offnum, maxoff;
 bool all_visible = true;
+ int matched_dead_count = 0;

 *visibility_cutoff_xid = InvalidTransactionId;
 *all_frozen = true;

+ Assert(ndeadoffsets == 0 || deadoffsets);
+
+#ifdef USE_ASSERT_CHECKING
+ /* Confirm input deadoffsets[] is strictly sorted */
+ if (ndeadoffsets > 1)
+ {
+ for (int i = 1; i < ndeadoffsets; i++)
+ Assert(deadoffsets[i - 1] < deadoffsets[i]);
+ }
+#endif
+
 maxoff = PageGetMaxOffsetNumber(page);
 for (offnum = FirstOffsetNumber; offnum <= maxoff && all_visible;
@@ -3649,9 +3712,15 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
 */
 if (ItemIdIsDead(itemid))
 {
- all_visible = false;
- *all_frozen = false;
- break;
+ if (!deadoffsets ||
+ matched_dead_count >= ndeadoffsets ||
+ deadoffsets[matched_dead_count] != offnum)
+ {
+ *all_frozen = all_visible = false;
+ break;
+ }
+ matched_dead_count++;
+ continue;
 }

 Assert(ItemIdIsNormal(itemid));
Hm, what about an assert checking that matched_dead_count == ndeadoffsets at
the end?
From 6b5fc27f0d80bab1df86a2e6fb51b64fd20c3cbb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 15 Sep 2025 12:06:19 -0400
Subject: [PATCH v17 03/15] Assorted trivial heap_page_prune_and_freeze cleanup
Seems like a good idea, but I'm too lazy to go through this in detail.
From c69a5219a9b792f3c9f6dc730b8810a88d088ae6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 16 Sep 2025 14:22:10 -0400
Subject: [PATCH v17 04/15] Add helper for freeze determination to
 heap_page_prune_and_freeze

After scanning through the line pointers on the heap page during
vacuum's first phase, we use several statuses and information we
collected to determine whether or not we will use the freeze plans we
assembled.

Do this in a helper for better readability.
@@ -663,85 +775,11 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 * Decide if we want to go ahead with freezing according to the freeze
 * plans we prepared, or not.
 */
- do_freeze = false;
- ...
+ do_freeze = heap_page_will_freeze(params->relation, buffer,
+ did_tuple_hint_fpi,
+ do_prune,
+ do_hint_prune,
+ &prstate);
Assuming this is just moving the code, I like this quite a bit.
From d4a4be3eed25853fc1ea84ebc2cbe0226afd823a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 15 Sep 2025 16:25:44 -0400
Subject: [PATCH v17 05/15] Update PruneState.all_[visible|frozen] earlier in
pruning
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In the prune/freeze path, we currently delay clearing all_visible and
all_frozen when dead items are present. This allows opportunistic
freezing if the page would otherwise be fully frozen, since those dead
items are later removed in vacuum’s third phase.

However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags promptly avoids extra bookkeeping in
heap_prune_unchanged_lp_normal(). At present this has no runtime effect
because all callers that consider setting the VM also attempt freezing,
but future callers (e.g. on-access pruning) may want to set the VM
without preparing freeze plans.
s/heap_prune_unchanged_lp_normal/heap_prune_record_unchanged_lp_normal/
I think this should make it clearer that this is about reducing overhead for
future use of this code in on-access-pruning.
We also used to defer clearing all_visible and all_frozen until after
computing the visibility cutoff XID. By determining the cutoff earlier,
we can update these flags immediately after deciding whether to
opportunistically freeze. This is necessary if we want to set the VM in
the same WAL record that prunes and freezes tuples on the page.
I think this last sentence needs to be first. This is the only really
important thing in this patch, afaict.
From 86193a71d2ff9649b5b1c1e6963bd610285ad369 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 3 Oct 2025 15:57:02 -0400
Subject: [PATCH v17 06/15] Make heap_page_is_all_visible independent of
 LVRelState

Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need a few parameters from
the LVRelState, so just pass those in explicitly.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: /messages/by-id/flat/CAAKRu_ZMw6Npd_qm2KM+FwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g@mail.gmail.com
Makes sense. I don't think we need to wait for other stuff to be committed to
commit this.
From dde0dfc578137f7c93f9a0e34af38dcdb841b080 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 8 Oct 2025 15:39:01 -0400
Subject: [PATCH v17 07/15] Eliminate XLOG_HEAP2_VISIBLE from vacuum
prune/freeze
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Seems very mildly odd that 0002 references phase III in the subject, but this
doesn't...
(I'm just very lightly skimming from this point on)
During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum directly: GlobalVisState may advance
during a vacuum, allowing more pages to become considered all-visible.
In the rare case that it moves backward, VACUUM falls back to OldestXmin
to ensure we don’t attempt to freeze a dead tuple that wasn’t yet
prunable according to the GlobalVisState.
It could, but it currently won't advance in vacuum, right?
From e412f9298b0735d1091f4769ace4d2d1a7e62312 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 09:57:13 -0400
Subject: [PATCH v17 12/15] Inline TransactionIdFollows/Precedes()

Calling these from on-access pruning code had noticeable overhead in a
profile. There does not seem to be a reason not to inline them.
Makes sense, just commit this ahead of the more complicated rest.
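For reference, the inlined form is presumably just the existing modulo-2^32
comparison moved into the header; a sketch (not the actual patch):

/* Sketch: the current comparison logic, made static inline in the header. */
static inline bool
TransactionIdPrecedes(TransactionId id1, TransactionId id2)
{
	/* Permanent XIDs can use a plain unsigned comparison */
	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
		return (id1 < id2);

	/* Normal XIDs compare circularly, modulo 2^32 */
	return (int32) (id1 - id2) < 0;
}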
From 54fcba140e515eba0eb1f9d48e7d5875b92e7e39 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v17 13/15] Allow on-access pruning to set pages all-visible
Sorry, will have to look at this another time...
Greetings,
Andres Freund
On Thu, 9 Oct 2025 at 03:54, Melanie Plageman <melanieplageman@gmail.com> wrote:
On Mon, Oct 6, 2025 at 6:40 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

In attached v16, I’ve reverted to removing XLOG_HEAP2_VISIBLE
entirely, rather than first removing each caller's heap page from the
VM WAL chain. I reordered changes and squashed several refactoring
patches to improve patch-by-patch readability. This should make the
set read differently from earlier versions that removed
XLOG_HEAP2_VISIBLE and had more step-by-step mechanical refactoring.

I think if we plan to go all the way with removing XLOG_HEAP2_VISIBLE,
having intermediate patches that just set PD_ALL_VISIBLE when making
other heap pages are more confusing than helpful. Also, I think having
separate flags for setting PD_ALL_VISIBLE in the WAL record
over-complicated the code.

I decided to reorder the patches to remove XLOG_HEAP2_VISIBLE from
vacuum phase III before removing it from vacuum phase I because
removing it from phase III doesn't require preliminary refactoring
patches. I've done that in the attached v17.

I've also added an experimental patch on the end that refactors large
chunks of heap_page_prune_and_freeze() into helpers. I got some
feedback off-list that heap_page_prune_and_freeze() is too unwieldy
now. I'm not sure how I feel about them yet, so I haven't documented
them or moved them up in the patch set to before changes to
heap_page_prune_and_freeze().

0001: Eliminate XLOG_HEAP2_VISIBLE from COPY FREEZE
0002: Eliminate XLOG_HEAP2_VISIBLE from phase III of vacuum
0003 - 0006: cleanup and refactoring to prepare for 0007
0007: Eliminate XLOG_HEAP2_VISIBLE from vacuum prune/freeze
0008 - 0009: Remove XLOG_HEAP2_VISIBLE
0010 - 0012: refactoring to prepare for 0013
0013: Set VM on-access
0014: Set pd_prune_xid on insert
0015: Experimental refactoring of heap_page_prune_and_freeze into helpers

- Melanie
Hi! Should we also bump XLOG_PAGE_MAGIC after d96f87332 & add323da40a
or do we wait for full set to be committed?
--
Best regards,
Kirill Reshke
On Tue, Oct 14, 2025 at 08:31:04AM +0500, Kirill Reshke wrote:
Hi! Should we also bump XLOG_PAGE_MAGIC after d96f87332 & add323da40a
or do we wait for full set to be committed?
I may be missing something, of course, but d96f87332 has not changed
the WAL format, VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN
existing before that. The change in xl_heap_prune as done in
add323da40a6 should have bumped the format number.
--
Michael
On Mon, Oct 13, 2025 at 11:43 PM Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Oct 14, 2025 at 08:31:04AM +0500, Kirill Reshke wrote:
Hi! Should we also bump XLOG_PAGE_MAGIC after d96f87332 & add323da40a
or do we wait for full set to be committed?

I may be missing something, of course, but d96f87332 has not changed
the WAL format, VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN
existing before that. The change in xl_heap_prune as done in
add323da40a6 should have bumped the format number.
Oops! Thanks for reporting.
I messed up and forgot to do this. And, if I'm not misunderstanding
the criteria, I did the same thing at the beginning of September with
4b5f206de2bb. I've committed the bump. Hopefully I learned my lesson.
- Melanie
Thanks so much for the review! I've addressed all your feedback except
what is commented on inline below.
I've gone ahead and committed the preliminary patches that you thought
were ready to commit.
Attached v18 is what remains.
0001 - 0003: refactoring
0004 - 0006: finish eliminating XLOG_HEAP2_VISIBLE
0007 - 0009: refactoring
0010: Set VM on-access
0011: Set prune xid on insert
0012: Some refactoring for discussion
For 0001, I got feedback heap_page_prune_and_freeze() has too many
arguments, so I tried to address that. I'm interested to know if folks
like this more.
0011 still needs a bit of investigation to understand fully if
anything else in the index-killtuples test needs to be changed to make
sure we have the same coverage.
0012 is sort of WIP. I got feedback heap_page_prune_and_freeze() was
too long and should be split up into helpers. I want to know if this
split makes sense. I can pull it down the patch stack if so.
Only 0001 and 0012 are optional amongst the refactoring patches. The
others are required to make on-access VM-setting possible or viable.
On Thu, Oct 9, 2025 at 2:18 PM Andres Freund <andres@anarazel.de> wrote:
@@ -71,12 +84,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 }

- action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
- (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
- &buffer);
- if (action == BLK_NEEDS_REDO)
+ if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+ (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+ &buffer) == BLK_NEEDS_REDO)
 {
 Page page = BufferGetPage(buffer);
 OffsetNumber *redirected;

Why move it around this way?
Because there will be a second XLogReadBufferForRedoExtended() action for
the visibility map. I could have renamed it heap_action, but it is used in
only one place, so I preferred to just cut it to avoid any confusion.
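Roughly, the shape after the change looks like this; the block IDs and read
modes here are illustrative assumptions on my part, not copied from the patch:

/* Heap block: its result is consumed exactly once, so no "action" local. */
if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
                                  (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
                                  &buffer) == BLK_NEEDS_REDO)
{
	/* redo pruning/freezing changes on the heap page */
}

/* VM block: a second, independent redo action for the same record. */
if (XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
                                  &vmbuffer) == BLK_NEEDS_REDO)
{
	/* redo the visibility map update */
}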
Advance the page LSN
+ * only if the record could include an FPI, since recovery skips
+ * records <= the stamped LSN. Otherwise it might skip an earlier FPI
+ * needed to repair a torn page.
+ */

This is confusing, should probably just reference the stuff we did in the
!recovery case.
I fixed this and addressed all your feedback related to this before committing.
+ if (do_prune || nplans > 0 ||
+ ((vmflags & VISIBILITYMAP_VALID_BITS) && XLogHintBitIsNeeded()))
+ PageSetLSN(page, lsn);
+
 /*
 * Note: we don't worry about updating the page's prunability hints.
 * At worst this will cause an extra prune cycle to occur soon.
 */

Not your fault, but that seems odd? Why aren't we just doing the right thing?
The comment dates back to 6f10eb2. I imagine no one ever bothered to
fuss with extracting the XID. You could change
heap_page_prune_execute() to return the right value -- though that's a
bit ugly since it is used in normal operation as well as recovery.
I wonder if the VM specific redo portion should be in a common helper? Might
not be enough code to worry though...
I think it might be more code as a helper at this point.
@@ -2860,6 +2867,29 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
 InvalidOffsetNumber);

+ /*
+ * Before marking dead items unused, check whether the page will become
+ * all-visible once that change is applied.

So the function is named _would_ but here you say will :)
I thought about it more and still feel that this function name should
contain "would". From vacuum's perspective it is "will" -- because it
knows it will remove those dead items, but from the function's
perspective it is hypothetical. I changed the comment though.
+ if (heap_page_would_be_all_visible(vacrel, buffer,
+ deadoffsets, num_offsets,
+ &all_frozen, &visibility_cutoff_xid))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (all_frozen)
+ {
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ }
+
+ /* Take the lock on the vmbuffer before entering a critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);

It sure would be nice if we had documented the lock order between the heap
page and the corresponding VM page anywhere. This is just doing what we did
before, so it's not this patch's fault, but I did get worried about it for a
moment.
Well, the comment above the visibilitymap_set* functions says what
expectations they have for the heap page being locked.
+static bool
+heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid)
+{
 Page page = BufferGetPage(buf);
Hm, what about an assert checking that matched_dead_count == ndeadoffsets at
the end?
I was going to put an Assert(ndeadoffsets <= matched_dead_count), but
then I started wondering if there is a way we could end up with fewer
dead items than we collected during phase I.
I had thought about if we dropped an index and then did on-access
pruning -- but we don't allow setting LP_DEAD items LP_UNUSED in
on-access pruning. So, maybe this is safe... I can do a follow-on
commit to add the assert. But I'm just not 100% sure I've thought of
all the cases where we might end up with fewer dead items.
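If it does turn out to be safe, I'd imagine the follow-on commit would add
something like this at the end of the function (a sketch only; the
all_visible guard keeps an early bail-out from tripping it):

	/*
	 * Sketch of the assertion under discussion, not part of the posted
	 * patch: if the scan completed without clearing all_visible, every
	 * entry of the strictly-sorted deadoffsets[] should have been consumed
	 * by exactly one LP_DEAD item.
	 */
	if (all_visible)
		Assert(matched_dead_count == ndeadoffsets);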
During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum directly: GlobalVisState may advance
during a vacuum, allowing more pages to become considered all-visible.
In the rare case that it moves backward, VACUUM falls back to OldestXmin
to ensure we don’t attempt to freeze a dead tuple that wasn’t yet
prunable according to the GlobalVisState.

It could, but it currently won't advance in vacuum, right?
I thought it was possible for it to advance when calling
heap_prune_satisfies_vacuum() ->
GlobalVisTestIsRemovableXid()->...GlobalVisUpdate(). This case isn't
going to be common, but some things can cause us to update it.
We have talked about explicitly updating GlobalVisState more often
during vacuums of large tables. But I was under the impression that it
was at least possible for it to advance during vacuum now.
- Melanie
Attachments:
v18-0001-Refactor-heap_page_prune_and_freeze-parameters-i.patchtext/x-patch; charset=UTF-8; name=v18-0001-Refactor-heap_page_prune_and_freeze-parameters-i.patchDownload
From d385615495305be4d42aeee0422dfeef8d26f3a9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 14 Oct 2025 11:10:25 -0400
Subject: [PATCH v18 01/12] Refactor heap_page_prune_and_freeze() parameters
into a struct
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
heap_page_prune_and_freeze() had accumulated an unwieldy number of input
parameters, and upcoming work to handle VM updates in this function will
add even more.
Introduce a new PruneFreezeParams struct to group the function’s input
parameters, improving readability and maintainability.
Discussion: https://postgr.es/m/yn4zp35kkdsjx6wf47zcfmxgexxt4h2og47pvnw2x5ifyrs3qc%407uw6jyyxuyf7
---
src/backend/access/heap/pruneheap.c | 86 +++++++++++++---------------
src/backend/access/heap/vacuumlazy.c | 16 ++++--
src/include/access/heapam.h | 62 ++++++++++++++++----
src/tools/pgindent/typedefs.list | 1 +
4 files changed, 101 insertions(+), 64 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 231bea679c6..450b2eb6494 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -258,15 +258,23 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
{
OffsetNumber dummy_off_loc;
+ PruneFreezeParams params;
PruneFreezeResult presult;
+ params.relation = relation;
+ params.buffer = buffer;
+ params.reason = PRUNE_ON_ACCESS;
+ params.vistest = vistest;
+ params.cutoffs = NULL;
+
/*
* For now, pass mark_unused_now as false regardless of whether or
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, vistest, 0,
- NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+ params.options = 0;
+
+ heap_page_prune_and_freeze(¶ms, &presult, &dummy_off_loc, NULL, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -419,60 +427,43 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* also need to account for a reduction in the length of the line pointer
* array following array truncation by us.
*
- * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
- * required in order to advance relfrozenxid / relminmxid, or if it's
- * considered advantageous for overall system performance to do so now. The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing. When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set. They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
- *
- * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
+ * params contains the input parameters used to control freezing and pruning
+ * behavior. See the definition of PruneFreezeParams for more on what each
+ * parameter does.
*
- * options:
- * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- * pruning.
- *
- * FREEZE indicates that we will also freeze tuples, and will return
- * 'all_visible', 'all_frozen' flags to the caller.
- *
- * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
- * of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
- * cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * If the HEAP_PRUNE_FREEZE option is set in params, we will freeze tuples if
+ * it's required in order to advance relfrozenxid / relminmxid, or if it's
+ * considered advantageous for overall system performance to do so now. The
+ * 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
+ * arguments are required when freezing. When HEAP_PRUNE_FREEZE option is
+ * passed, we also set presult->all_visible and presult->all_frozen on exit,
+ * to indicate if the VM bits can be set. They are always set to false when
+ * the HEAP_PRUNE_FREEZE option is not passed, because at the moment only
+ * callers that also freeze need that information.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
* heap_page_prune_and_freeze() is responsible for initializing it. Required
* by all callers.
*
- * reason indicates why the pruning is performed. It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
* off_loc is the offset location required by the caller to use in error
* callback.
*
 * new_relfrozen_xid and new_relmin_mxid must be provided by the caller if the
- * HEAP_PRUNE_FREEZE option is set. On entry, they contain the oldest XID and
- * multi-XID seen on the relation so far. They will be updated with oldest
- * values present on the page after pruning. After processing the whole
- * relation, VACUUM can use these values as the new relfrozenxid/relminmxid
- * for the relation.
+ * HEAP_PRUNE_FREEZE option is set in params. On entry, they contain the
+ * oldest XID and multi-XID seen on the relation so far. They will be updated
+ * with oldest values present on the page after pruning. After processing the
+ * whole relation, VACUUM can use these values as the new
+ * relfrozenxid/relminmxid for the relation.
*/
void
-heap_page_prune_and_freeze(Relation relation, Buffer buffer,
- GlobalVisState *vistest,
- int options,
- struct VacuumCutoffs *cutoffs,
+heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
- PruneReason reason,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid)
{
+ Buffer buffer = params->buffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
OffsetNumber offnum,
@@ -486,10 +477,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
int64 fpi_before = pgWalUsage.wal_fpi;
/* Copy parameters to prstate */
- prstate.vistest = vistest;
- prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
- prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
- prstate.cutoffs = cutoffs;
+ prstate.vistest = params->vistest;
+ prstate.mark_unused_now =
+ (params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
+ prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.cutoffs = params->cutoffs;
/*
* Our strategy is to scan the page and make lists of items to change,
@@ -583,7 +575,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.visibility_cutoff_xid = InvalidTransactionId;
maxoff = PageGetMaxOffsetNumber(page);
- tup.t_tableOid = RelationGetRelid(relation);
+ tup.t_tableOid = RelationGetRelid(params->relation);
/*
* Determine HTSV for all tuples, and queue them up for processing as HOT
@@ -786,7 +778,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* Decide if we want to go ahead with freezing according to the freeze
* plans we prepared, or not.
*/
- do_freeze = heap_page_will_freeze(relation, buffer,
+ do_freeze = heap_page_will_freeze(params->relation, buffer,
did_tuple_hint_fpi,
do_prune,
do_hint_prune,
@@ -838,7 +830,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
*/
- if (RelationNeedsWAL(relation))
+ if (RelationNeedsWAL(params->relation))
{
/*
* The snapshotConflictHorizon for the whole record should be the
@@ -876,11 +868,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
else
conflict_xid = prstate.latest_xid_removed;
- log_heap_prune_and_freeze(relation, buffer,
+ log_heap_prune_and_freeze(params->relation, buffer,
InvalidBuffer, /* vmbuffer */
0, /* vmflags */
conflict_xid,
- true, reason,
+ true, params->reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 71fbd68c8ea..7db7c56311b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1964,10 +1964,16 @@ lazy_scan_prune(LVRelState *vacrel,
{
Relation rel = vacrel->rel;
PruneFreezeResult presult;
- int prune_options = 0;
+ PruneFreezeParams params;
Assert(BufferGetBlockNumber(buf) == blkno);
+ params.relation = rel;
+ params.buffer = buf;
+ params.reason = PRUNE_VACUUM_SCAN;
+ params.cutoffs = &vacrel->cutoffs;
+ params.vistest = vacrel->vistest;
+
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
*
@@ -1983,12 +1989,12 @@ lazy_scan_prune(LVRelState *vacrel,
* tuples. Pruning will have determined whether or not the page is
* all-visible.
*/
- prune_options = HEAP_PAGE_PRUNE_FREEZE;
+ params.options = HEAP_PAGE_PRUNE_FREEZE;
if (vacrel->nindexes == 0)
- prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
+ params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
- &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+ heap_page_prune_and_freeze(¶ms,
+ &presult,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8cbff6ab0eb..74a5c24002b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -221,6 +221,55 @@ typedef struct HeapPageFreeze
} HeapPageFreeze;
+
+/* 'reason' codes for heap_page_prune_and_freeze() */
+typedef enum
+{
+ PRUNE_ON_ACCESS, /* on-access pruning */
+ PRUNE_VACUUM_SCAN, /* VACUUM 1st heap pass */
+ PRUNE_VACUUM_CLEANUP, /* VACUUM 2nd heap pass */
+} PruneReason;
+
+/*
+ * Input parameters to heap_page_prune_and_freeze()
+ */
+typedef struct PruneFreezeParams
+{
+ Relation relation; /* relation containing buffer to be pruned */
+ Buffer buffer; /* buffer to be pruned */
+
+ /*
+ * The reason pruning was performed. It is used to set the WAL record
+ * opcode which is used for debugging and analysis purposes.
+ */
+ PruneReason reason;
+
+ /*
+ * Contains flag bits:
+ *
+ * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+ * pruning.
+ *
+ * FREEZE indicates that we will also freeze tuples, and will return
+ * 'all_visible', 'all_frozen' flags to the caller.
+ */
+ int options;
+
+ /*
+ * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
+ * (see heap_prune_satisfies_vacuum).
+ */
+ GlobalVisState *vistest;
+
+ /*
+ * cutoffs contains the freeze cutoffs, established by VACUUM at the
+ * beginning of vacuuming the relation. Required if HEAP_PRUNE_FREEZE
+ * option is set. cutoffs->OldestXmin is also used to determine if dead
+ * tuples are HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ */
+ struct VacuumCutoffs *cutoffs;
+} PruneFreezeParams;
+
/*
* Per-page state returned by heap_page_prune_and_freeze()
*/
@@ -264,13 +313,6 @@ typedef struct PruneFreezeResult
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
} PruneFreezeResult;
-/* 'reason' codes for heap_page_prune_and_freeze() */
-typedef enum
-{
- PRUNE_ON_ACCESS, /* on-access pruning */
- PRUNE_VACUUM_SCAN, /* VACUUM 1st heap pass */
- PRUNE_VACUUM_CLEANUP, /* VACUUM 2nd heap pass */
-} PruneReason;
/* ----------------
* function prototypes for heap access method
@@ -367,12 +409,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
-extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
- GlobalVisState *vistest,
- int options,
- struct VacuumCutoffs *cutoffs,
+extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
- PruneReason reason,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 5290b91e83e..b221b3699bf 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2342,6 +2342,7 @@ ProjectionPath
PromptInterruptContext
ProtocolVersion
PrsStorage
+PruneFreezeParams
PruneFreezeResult
PruneReason
PruneState
--
2.43.0
v18-0002-Keep-all_frozen-updated-in-heap_page_prune_and_f.patchtext/x-patch; charset=US-ASCII; name=v18-0002-Keep-all_frozen-updated-in-heap_page_prune_and_f.patchDownload
From 4c3f113fbd62b553949b95cb352347767278e7dc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 14 Oct 2025 14:55:40 -0400
Subject: [PATCH v18 02/12] Keep all_frozen updated in
heap_page_prune_and_freeze
Previously, we relied on all_visible and all_frozen being used together
to ensure that all_frozen was correct, but it is better to keep both
fields updated.
Future changes will separate their usage, so we should not depend on
all_visible for the validity of all_frozen.
---
src/backend/access/heap/pruneheap.c | 22 +++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 9 ++++-----
2 files changed, 15 insertions(+), 16 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 450b2eb6494..daa719fc2a1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -143,10 +143,6 @@ typedef struct
* whether to freeze the page or not. The all_visible and all_frozen
* values returned to the caller are adjusted to include LP_DEAD items at
* the end.
- *
- * all_frozen should only be considered valid if all_visible is also set;
- * we don't bother to clear the all_frozen flag every time we clear the
- * all_visible flag.
*/
bool all_visible;
bool all_frozen;
@@ -361,8 +357,10 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* anymore. The opportunistic freeze heuristic must be improved;
* however, for now, try to approximate the old logic.
*/
- if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+ if (prstate->all_frozen && prstate->nfrozen > 0)
{
+ Assert(prstate->all_visible);
+
/*
* Freezing would make the page all-frozen. Have already emitted
* an FPI or will do so anyway?
@@ -784,6 +782,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
do_hint_prune,
&prstate);
+ Assert(!prstate.all_frozen || prstate.all_visible);
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -853,7 +853,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
if (do_freeze)
{
- if (prstate.all_visible && prstate.all_frozen)
+ if (prstate.all_frozen)
frz_conflict_horizon = prstate.visibility_cutoff_xid;
else
{
@@ -1418,7 +1418,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
if (!HeapTupleHeaderXminCommitted(htup))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1440,7 +1440,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
Assert(prstate->cutoffs);
if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1453,7 +1453,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
case HEAPTUPLE_RECENTLY_DEAD:
prstate->recently_dead_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple will soon become DEAD. Update the hint field so
@@ -1472,7 +1472,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* assumption is a bit shaky, but it is what acquire_sample_rows()
* does, so be consistent.
*/
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* If we wanted to optimize for aborts, we might consider marking
@@ -1490,7 +1490,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* will commit and update the counters after we report.
*/
prstate->live_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple may soon become DEAD. Update the hint field so that
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 7db7c56311b..58de605ca09 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2020,7 +2020,6 @@ lazy_scan_prune(LVRelState *vacrel,
* agreement with heap_page_is_all_visible() using an assertion.
*/
#ifdef USE_ASSERT_CHECKING
- /* Note that all_frozen value does not matter when !all_visible */
if (presult.all_visible)
{
TransactionId debug_cutoff;
@@ -2074,6 +2073,7 @@ lazy_scan_prune(LVRelState *vacrel,
*has_lpdead_items = (presult.lpdead_items > 0);
Assert(!presult.all_visible || !(*has_lpdead_items));
+ Assert(!presult.all_frozen || presult.all_visible);
/*
* Handle setting visibility map bit based on information from the VM (as
@@ -2179,11 +2179,10 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen. Note that all_frozen is only valid if all_visible is
- * true, so we must check both all_visible and all_frozen.
+ * it as all-frozen.
*/
- else if (all_visible_according_to_vm && presult.all_visible &&
- presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ else if (all_visible_according_to_vm && presult.all_frozen &&
+ !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
uint8 old_vmbits;
--
2.43.0
v18-0003-Update-PruneState.all_-visible-frozen-earlier-in.patchtext/x-patch; charset=UTF-8; name=v18-0003-Update-PruneState.all_-visible-frozen-earlier-in.patchDownload
From 181368d080f6a73304c4f248739ca08f85a737c4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 14 Oct 2025 15:21:49 -0400
Subject: [PATCH v18 03/12] Update PruneState.all_[visible|frozen] earlier in
pruning
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items. This allows opportunistic
freezing if the page would otherwise be fully frozen, since those dead
items are later removed in vacuum’s third phase.
To move the VM update into the same WAL record that
prunes and freezes tuples, we must know whether the page will
be marked all-visible/all-frozen before emitting WAL.
The only barrier to updating these flags immediately after deciding
whether to opportunistically freeze is that we previously used
all_frozen to compute the snapshot conflict horizon when freezing
tuples. By determining the cutoff earlier, we can update the flags
immediately after making the freeze decision.
This is required to set the VM in the XLOG_HEAP2_PRUNE_VACUUM_SCAN
record emitted by pruning and freezing.
---
src/backend/access/heap/pruneheap.c | 117 ++++++++++++++--------------
1 file changed, 57 insertions(+), 60 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index daa719fc2a1..ef8861022f1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -138,11 +138,11 @@ typedef struct
* bits. It is only valid if we froze some tuples, and all_frozen is
* true.
*
- * NOTE: all_visible and all_frozen don't include LP_DEAD items. That's
- * convenient for heap_page_prune_and_freeze(), to use them to decide
- * whether to freeze the page or not. The all_visible and all_frozen
- * values returned to the caller are adjusted to include LP_DEAD items at
- * the end.
+ * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
+ * That's convenient for heap_page_prune_and_freeze() to use them to
+ * decide whether to freeze the page or not. The all_visible and
+ * all_frozen values returned to the caller are adjusted to include
+ * LP_DEAD items after we determine whether to opportunistically freeze.
*/
bool all_visible;
bool all_frozen;
@@ -175,7 +175,7 @@ static void page_verify_redirects(Page page);
static bool heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
- PruneState *prstate);
+ PruneState *prstate, TransactionId *frz_conflict_horizon);
/*
@@ -308,7 +308,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* performs several pre-freeze checks.
*
* The values of do_prune, do_hint_prune, and did_tuple_hint_fpi must be
- * determined before calling this function.
+ * determined before calling this function. *frz_conflict_horizon is set to
+ * the snapshot conflict horizon we use for the WAL record should we decide to
+ * freeze tuples.
*
* prstate is both an input and output parameter.
*
@@ -320,7 +322,8 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi,
bool do_prune,
bool do_hint_prune,
- PruneState *prstate)
+ PruneState *prstate,
+ TransactionId *frz_conflict_horizon)
{
bool do_freeze = false;
@@ -390,6 +393,22 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* critical section.
*/
heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+
+ /*
+ * Calculate what the snapshot conflict horizon should be for a record
+ * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
+ * for conflicts when the whole page is eligible to become all-frozen
+ * in the VM once we're done with it. Otherwise, we generate a
+ * conservative cutoff by stepping back from OldestXmin.
+ */
+ if (prstate->all_frozen)
+ *frz_conflict_horizon = prstate->visibility_cutoff_xid;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ *frz_conflict_horizon = prstate->cutoffs->OldestXmin;
+ TransactionIdRetreat(*frz_conflict_horizon);
+ }
}
else if (prstate->nfrozen > 0)
{
@@ -434,10 +453,11 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* considered advantageous for overall system performance to do so now. The
* 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
* arguments are required when freezing. When HEAP_PRUNE_FREEZE option is
- * passed, we also set presult->all_visible and presult->all_frozen on exit,
- * to indicate if the VM bits can be set. They are always set to false when
- * the HEAP_PRUNE_FREEZE option is not passed, because at the moment only
- * callers that also freeze need that information.
+ * passed, we also set presult->all_visible and presult->all_frozen after
+ * determining whether or not to opportunistically freeze, to indicate if the
+ * VM bits can be set. They are always set to false when the
+ * HEAP_PRUNE_FREEZE option is not passed, because at the moment only callers
+ * that also freeze need that information.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
@@ -473,6 +493,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool do_hint_prune;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ TransactionId frz_conflict_horizon = InvalidTransactionId;
/* Copy parameters to prstate */
prstate.vistest = params->vistest;
@@ -542,10 +563,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* are tuples present that are not visible to everyone or if there are
* dead tuples which are not yet removable. However, dead tuples which
* will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not clear
- * all_visible when we see LP_DEAD items. We fix that at the end of the
- * function, when we return the value to the caller, so that the caller
- * doesn't set the VM bit incorrectly.
+ * opportunistically freezing. Because of that, we do not immediately
+ * clear all_visible when we see LP_DEAD items. We fix that after
+ * scanning the line pointers, before we return the value to the caller,
+ * so that the caller doesn't set the VM bit incorrectly.
*/
if (prstate.attempt_freeze)
{
@@ -780,7 +801,24 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
did_tuple_hint_fpi,
do_prune,
do_hint_prune,
- &prstate);
+ &prstate,
+ &frz_conflict_horizon);
+
+ /*
+ * While scanning the line pointers, we did not clear
+ * all_visible/all_frozen when encountering LP_DEAD items because we
+ * wanted the decision whether or not to freeze the page to be unaffected
+ * by the short-term presence of LP_DEAD items. These LP_DEAD items are
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that we finished determining whether or not to freeze the page,
+ * update all_visible and all_frozen so that they reflect the true state
+ * of the page for setting PD_ALL_VISIBLE and VM bits.
+ */
+ if (prstate.lpdead_items > 0)
+ prstate.all_visible = prstate.all_frozen = false;
Assert(!prstate.all_frozen || prstate.all_visible);
@@ -842,27 +880,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* on the standby with xids older than the youngest tuple this
* record will freeze will conflict.
*/
- TransactionId frz_conflict_horizon = InvalidTransactionId;
TransactionId conflict_xid;
- /*
- * We can use the visibility_cutoff_xid as our cutoff for
- * conflicts when the whole page is eligible to become all-frozen
- * in the VM once we're done with it. Otherwise we generate a
- * conservative cutoff by stepping back from OldestXmin.
- */
- if (do_freeze)
- {
- if (prstate.all_frozen)
- frz_conflict_horizon = prstate.visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = prstate.cutoffs->OldestXmin;
- TransactionIdRetreat(frz_conflict_horizon);
- }
- }
-
if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
conflict_xid = frz_conflict_horizon;
else
@@ -888,30 +907,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected by our caller.
- */
- if (prstate.all_visible && prstate.lpdead_items == 0)
- {
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
- }
- else
- {
- presult->all_visible = false;
- presult->all_frozen = false;
- }
-
+ presult->all_visible = prstate.all_visible;
+ presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
/*
--
2.43.0
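To make the conflict-horizon selection in the patch above easier to follow outside the tree, here is a minimal standalone sketch of that logic. It uses plain 32-bit XIDs and an ordinary integer comparison instead of TransactionIdFollows(), so wraparound and the special low XIDs that TransactionIdRetreat() skips are ignored; the names are illustrative, not the actual PostgreSQL symbols.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t SimpleXid;

#define INVALID_XID ((SimpleXid) 0)

/*
 * Pick the snapshot conflict horizon for a prune/freeze record, mirroring
 * the logic the patch moves into heap_page_will_freeze(): when the page
 * will be all-frozen we can use the newest xmin of its live tuples;
 * otherwise we step back one XID from OldestXmin. The record's horizon is
 * then the newer of the freeze horizon and the newest removed xmax.
 */
static SimpleXid
pick_conflict_horizon(bool do_freeze, bool all_frozen,
                      SimpleXid visibility_cutoff_xid,
                      SimpleXid oldest_xmin,
                      SimpleXid latest_xid_removed)
{
    SimpleXid frz_horizon = INVALID_XID;

    if (do_freeze)
        frz_horizon = all_frozen ? visibility_cutoff_xid : oldest_xmin - 1;

    /* plain comparison here; the real code uses TransactionIdFollows() */
    return (frz_horizon > latest_xid_removed) ? frz_horizon : latest_xid_removed;
}

int
main(void)
{
    /* not all-frozen: the freeze horizon comes from OldestXmin - 1 */
    printf("%u\n", (unsigned) pick_conflict_horizon(true, false, 900, 1000, 950)); /* 999 */
    /* all-frozen: the newest removed xmax wins over the visibility cutoff */
    printf("%u\n", (unsigned) pick_conflict_horizon(true, true, 900, 1000, 950));  /* 950 */
    return 0;
}

The point is just the precedence: the freeze horizon only contributes when we actually freeze, and the pruning horizon (the newest removed xmax) can still win.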
v18-0004-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patchtext/x-patch; charset=UTF-8; name=v18-0004-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patchDownload
From 6db4df888158810125c25fa00a05fe31342a9c0f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 8 Oct 2025 15:39:01 -0400
Subject: [PATCH v18 04/12] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
prune/freeze
Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.
Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.
This change applies only to vacuum phase I, not to pruning performed
during normal page access.
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
---
src/backend/access/heap/heapam_xlog.c | 37 ++-
src/backend/access/heap/pruneheap.c | 434 ++++++++++++++++++++------
src/backend/access/heap/vacuumlazy.c | 207 +-----------
src/include/access/heapam.h | 43 ++-
4 files changed, 421 insertions(+), 300 deletions(-)
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 230d9888793..412ac3edf25 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -104,6 +104,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
OffsetNumber *frz_offsets;
char *dataptr = XLogRecGetBlockData(record, 0, &datalen);
bool do_prune;
+ bool set_lsn = false;
+ bool mark_buffer_dirty = false;
heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
&nplans, &plans, &frz_offsets,
@@ -157,17 +159,36 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* There should be no more data */
Assert((char *) frz_offsets == dataptr + datalen);
- if (vmflags & VISIBILITYMAP_VALID_BITS)
- PageSetAllVisible(page);
-
- MarkBufferDirty(buffer);
+ if (do_prune || nplans > 0)
+ mark_buffer_dirty = set_lsn = true;
/*
- * See log_heap_prune_and_freeze() for commentary on when we set the
- * heap page LSN.
+ * The critical integrity requirement here is that we must never end
+ * up with the visibility map bit set and the page-level
+ * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+ * modification would fail to clear the visibility map bit.
+ *
+ * vmflags may be nonzero with PD_ALL_VISIBLE already set (e.g. when
+ * marking an all-visible page all-frozen). If only the VM is updated,
+ * the heap page need not be dirtied.
*/
- if (do_prune || nplans > 0 ||
- ((vmflags & VISIBILITYMAP_VALID_BITS) && XLogHintBitIsNeeded()))
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+ {
+ PageSetAllVisible(page);
+ mark_buffer_dirty = true;
+
+ /*
+ * See log_heap_prune_and_freeze() for commentary on when we set
+ * the heap page LSN.
+ */
+ if (XLogHintBitIsNeeded())
+ set_lsn = true;
+ }
+
+ if (mark_buffer_dirty)
+ MarkBufferDirty(buffer);
+
+ if (set_lsn)
PageSetLSN(page, lsn);
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ef8861022f1..b38b62779ab 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
#include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool attempt_freeze;
+ /* whether or not to attempt updating the VM */
+ bool attempt_update_vm;
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -133,16 +135,17 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon when setting the VM or when
+ * freezing all the tuples on the page. It is only valid when all the live
+ * tuples on the page are all-visible.
*
* NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
* That's convenient for heap_page_prune_and_freeze() to use them to
- * decide whether to freeze the page or not. The all_visible and
- * all_frozen values returned to the caller are adjusted to include
- * LP_DEAD items after we determine whether to opportunistically freeze.
+ * decide whether to opportunistically freeze the page or not. The
+ * all_visible and all_frozen values ultimately used to set the VM are
+ * adjusted to include LP_DEAD items after we determine whether or not to
+ * opportunistically freeze.
*/
bool all_visible;
bool all_frozen;
@@ -173,10 +176,21 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
static void page_verify_redirects(Page page);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid, bool blk_already_av,
+ bool set_blk_all_frozen);
static bool heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
PruneState *prstate, TransactionId *frz_conflict_horizon);
-
+static bool heap_page_will_set_vis(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buf,
+ Buffer vmbuffer,
+ bool blk_known_av,
+ const PruneState *prstate,
+ uint8 *vmflags,
+ bool *do_set_pd_vis);
/*
* Optionally prune and repair fragmentation in the specified page.
@@ -262,6 +276,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
params.reason = PRUNE_ON_ACCESS;
params.vistest = vistest;
params.cutoffs = NULL;
+ params.vmbuffer = InvalidBuffer;
+ params.blk_known_av = false;
/*
* For now, pass mark_unused_now as false regardless of whether or
@@ -434,10 +450,108 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Decide whether to set the visibility map bits for heap_blk, using
+ * information from PruneState and blk_known_av. Some callers may already
+ * have examined this page's VM bits (e.g., VACUUM in the previous
+ * heap_vac_scan_next_block() call) and can pass that along.
+ *
+ * Returns true if one or both VM bits should be set, along with the desired
+ * flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
+ * should be set on the heap page.
+ */
+static bool
+heap_page_will_set_vis(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buf,
+ Buffer vmbuffer,
+ bool blk_known_av,
+ const PruneState *prstate,
+ uint8 *vmflags,
+ bool *do_set_pd_vis)
+{
+ Page heap_page = BufferGetPage(heap_buf);
+ bool do_set_vm = false;
+
+ *do_set_pd_vis = false;
+
+ if (!prstate->attempt_update_vm)
+ {
+ Assert(!prstate->all_visible && !prstate->all_frozen);
+ Assert(*vmflags == 0);
+ return false;
+ }
+
+ if (prstate->all_visible && !PageIsAllVisible(heap_page))
+ *do_set_pd_vis = true;
+
+ if ((prstate->all_visible && !blk_known_av) ||
+ (prstate->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+ {
+ *vmflags = VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate->all_frozen)
+ *vmflags |= VISIBILITYMAP_ALL_FROZEN;
+
+ do_set_vm = true;
+ }
+
+ /*
+ * Now handle two potential corruption cases:
+ *
+ * These do not need to happen in a critical section and are not
+ * WAL-logged.
+ *
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that in vacuum the bit
+ * got cleared after heap_vac_scan_next_block() was called, so we must
+ * recheck with buffer lock before concluding that the VM is corrupt.
+ */
+ else if (blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (prstate->lpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buf);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ return do_set_vm;
+}
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -452,12 +566,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* it's required in order to advance relfrozenxid / relminmxid, or if it's
* considered advantageous for overall system performance to do so now. The
* 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
- * arguments are required when freezing. When HEAP_PRUNE_FREEZE option is
- * passed, we also set presult->all_visible and presult->all_frozen after
- * determining whether or not to opportunistically freeze, to indicate if the
- * VM bits can be set. They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not passed, because at the moment only callers
- * that also freeze need that information.
+ * arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
@@ -482,6 +597,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
MultiXactId *new_relmin_mxid)
{
Buffer buffer = params->buffer;
+ Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
OffsetNumber offnum,
@@ -491,15 +607,22 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool do_freeze;
bool do_prune;
bool do_hint_prune;
+ bool do_set_vm;
+ bool do_set_pd_vis;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
TransactionId frz_conflict_horizon = InvalidTransactionId;
+ TransactionId conflict_xid = InvalidTransactionId;
+ uint8 new_vmbits = 0;
+ uint8 old_vmbits = 0;
/* Copy parameters to prstate */
prstate.vistest = params->vistest;
prstate.mark_unused_now =
(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.attempt_update_vm =
+ (params->options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0;
prstate.cutoffs = params->cutoffs;
/*
@@ -546,50 +669,54 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.deadoffsets = presult->deadoffsets;
/*
- * Caller may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Track whether the page could be marked all-visible and/or all-frozen.
+ * This information is used for opportunistic freezing and for updating
+ * the visibility map (VM) if requested by the caller.
+ *
+ * Currently, only VACUUM performs freezing, but other callers may in the
+ * future. Visibility bookkeeping is required not just for setting the VM
+ * bits, but also for opportunistic freezing: we only consider freezing if
+ * the page would become all-frozen, or if it would be all-frozen except
+ * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+ * we will not set the VM bit even if the page is found to be all-visible.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is passed without HEAP_PAGE_PRUNE_FREEZE,
+ * prstate.all_frozen must be initialized to false, since we will not call
+ * heap_prepare_freeze_tuple() for each tuple.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * Dead tuples that will be removed by the end of vacuum should not
+ * prevent opportunistic freezing. Therefore, we do not clear all_visible
+ * when we encounter LP_DEAD items. Instead, we correct all_visible after
+ * deciding whether to freeze, but before updating the VM, to avoid
+ * setting the VM bit incorrectly.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not immediately
- * clear all_visible when we see LP_DEAD items. We fix that after
- * scanning the line pointers, before we return the value to the caller,
- * so that the caller doesn't set the VM bit incorrectly.
+ * If neither freezing nor VM updates are requested, we skip the extra
+ * bookkeeping. In this case, initializing all_visible to false allows
+ * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
*/
if (prstate.attempt_freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
}
+ else if (prstate.attempt_update_vm)
+ {
+ prstate.all_visible = true;
+ prstate.all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate.all_visible = false;
prstate.all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is no longer maintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon
+ * when updating the VM and/or freezing all the tuples on the page.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -821,6 +948,34 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.all_visible = prstate.all_frozen = false;
Assert(!prstate.all_frozen || prstate.all_visible);
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ /*
+ * Decide whether to set the page-level PD_ALL_VISIBLE bit and the VM bits
+ * based on information from the VM and the all_visible/all_frozen flags.
+ *
+ * While it is valid for PD_ALL_VISIBLE to be set when the corresponding
+ * VM bit is clear, we strongly prefer to keep them in sync.
+ *
+ * Accordingly, we also allow updating only the VM when PD_ALL_VISIBLE has
+ * already been set. Setting only the VM is most common when setting an
+ * already all-visible page all-frozen.
+ */
+ do_set_vm = heap_page_will_set_vis(params->relation,
+ blockno, buffer, vmbuffer, params->blk_known_av,
+ &prstate, &new_vmbits, &do_set_pd_vis);
+
+ /* We should only set the VM if PD_ALL_VISIBLE is set or will be */
+ Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
+
+ conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+ prstate.latest_xid_removed, frz_conflict_horizon,
+ prstate.visibility_cutoff_xid, params->blk_known_av,
+ (do_set_vm && (new_vmbits & VISIBILITYMAP_ALL_FROZEN)));
+
+ /* Lock vmbuffer before entering a critical section */
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -842,14 +997,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
/*
* If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * hint. If we are going to freeze or prune the page or set
+ * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+ *
+ * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+ * for the VM to be set and PD_ALL_VISIBLE to be clear.
*/
- if (!do_freeze && !do_prune)
+ if (!do_freeze && !do_prune && !do_set_pd_vis)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -863,35 +1021,43 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
- MarkBufferDirty(buffer);
+ if (do_set_pd_vis)
+ PageSetAllVisible(page);
- /*
- * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
- */
- if (RelationNeedsWAL(params->relation))
+ if (do_prune || do_freeze || do_set_pd_vis)
+ MarkBufferDirty(buffer);
+
+ if (do_set_vm)
{
- /*
- * The snapshotConflictHorizon for the whole record should be the
- * most conservative of all the horizons calculated for any of the
- * possible modifications. If this record will prune tuples, any
- * transactions on the standby older than the youngest xmax of the
- * most recently removed tuple this record will prune will
- * conflict. If this record will freeze tuples, any transactions
- * on the standby with xids older than the youngest tuple this
- * record will freeze will conflict.
- */
- TransactionId conflict_xid;
+ Assert(PageIsAllVisible(page));
- if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
- conflict_xid = frz_conflict_horizon;
- else
- conflict_xid = prstate.latest_xid_removed;
+ old_vmbits = visibilitymap_set_vmbits(blockno,
+ vmbuffer, new_vmbits,
+ params->relation->rd_locator);
+ if (old_vmbits == new_vmbits)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+ /* Unset so we don't emit WAL since no change occurred */
+ do_set_vm = false;
+ }
+ }
+ /*
+ * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
+ * only updating the VM and it turns out it was already set, we will
+ * have unset do_set_vm earlier. As such, check it again before
+ * emitting the record.
+ */
+ if (RelationNeedsWAL(params->relation) &&
+ (do_prune || do_freeze || do_set_vm))
+ {
log_heap_prune_and_freeze(params->relation, buffer,
- InvalidBuffer, /* vmbuffer */
- 0, /* vmflags */
+ do_set_vm ? vmbuffer : InvalidBuffer,
+ do_set_vm ? new_vmbits : 0,
conflict_xid,
- true, params->reason,
+ true, /* cleanup lock */
+ do_set_pd_vis,
+ params->reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
@@ -901,28 +1067,47 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
END_CRIT_SECTION();
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+ /*
+ * During its second pass over the heap, VACUUM calls
+ * heap_page_would_be_all_visible() to determine whether a page is
+ * all-visible and all-frozen. The logic here is similar. After completing
+ * pruning and freezing, use an assertion to verify that our results
+ * remain consistent with heap_page_would_be_all_visible().
+ */
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
+ {
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
+
+ Assert(prstate.lpdead_items == 0);
+ Assert(prstate.cutoffs);
+
+ if (!heap_page_is_all_visible(params->relation, buffer,
+ prstate.cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc))
+ Assert(false);
+
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == prstate.visibility_cutoff_xid);
+ }
+#endif
+
/* Copy information back for caller */
presult->ndeleted = prstate.ndeleted;
presult->nnewlpdead = prstate.ndead;
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
-
- /*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ presult->new_vmbits = new_vmbits;
+ presult->old_vmbits = old_vmbits;
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
@@ -1413,6 +1598,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
{
TransactionId xmin;
+ Assert(prstate->attempt_update_vm);
+
if (!HeapTupleHeaderXminCommitted(htup))
{
prstate->all_visible = prstate->all_frozen = false;
@@ -2058,6 +2245,64 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
return nplans;
}
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid, bool blk_already_av,
+ bool set_blk_all_frozen)
+{
+
+ /*
+ * The snapshotConflictHorizon for the whole record should be the most
+ * conservative of all the horizons calculated for any of the possible
+ * modifications. If this record will prune tuples, any transactions on
+ * the standby older than the youngest xmax of the most recently removed
+ * tuple this record will prune will conflict. If this record will freeze
+ * tuples, any transactions on the standby with xids older than the
+ * youngest tuple this record will freeze will conflict.
+ */
+ TransactionId conflict_xid = InvalidTransactionId;
+
+ /*
+ * If we are updating the VM, the conflict horizon is almost always the
+ * visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization, we can
+ * use the visibility_cutoff_xid as the conflict horizon if the page will
+ * be all-frozen. This is true even if there are LP_DEAD line pointers
+ * because we ignored those when maintaining the visibility_cutoff_xid.
+ * This will have been calculated earlier as the frz_conflict_horizon when
+ * we determined we would freeze.
+ */
+ if (do_set_vm)
+ conflict_xid = visibility_cutoff_xid;
+ else if (do_freeze)
+ conflict_xid = frz_conflict_horizon;
+
+ /*
+ * If we are removing tuples with a younger xmax than the conflict_xid
+ * calculated so far, we must use that as our horizon.
+ */
+ if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+ conflict_xid = latest_xid_removed;
+
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning or
+ * freezing any tuples and are setting an already all-visible page
+ * all-frozen in the VM. In this case, all of the tuples on the page must
+ * already be visible to all MVCC snapshots on the standby.
+ */
+ if (!do_prune && !do_freeze &&
+ do_set_vm && blk_already_av && set_blk_all_frozen)
+ conflict_xid = InvalidTransactionId;
+
+ return conflict_xid;
+}
+
/*
* Write an XLOG_HEAP2_PRUNE* WAL record
*
@@ -2082,6 +2327,15 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* case, vmbuffer should already have been updated and marked dirty and should
* still be pinned and locked.
*
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
+ * In some cases, such as when heap_page_prune_and_freeze() is setting an
+ * already marked all-visible page all-frozen, PD_ALL_VISIBLE may already be
+ * set. So, it is possible for vmflags to be non-zero and set_pd_all_vis to be
+ * false.
+ *
* Note: This function scribbles on the 'frozen' array.
*
* Note: This is called in a critical section, so careful what you do here.
@@ -2091,6 +2345,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
Buffer vmbuffer, uint8 vmflags,
TransactionId conflict_xid,
bool cleanup_lock,
+ bool set_pd_all_vis,
PruneReason reason,
HeapTupleFreeze *frozen, int nfrozen,
OffsetNumber *redirected, int nredirected,
@@ -2127,7 +2382,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (!do_prune &&
nfrozen == 0 &&
- (!do_set_vm || !XLogHintBitIsNeeded()))
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
regbuf_flags_heap |= REGBUF_NO_IMAGE;
/*
@@ -2245,7 +2500,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* See comment at the top of the function about regbuf_flags_heap for
* details on when we can advance the page LSN.
*/
- if (do_prune || nfrozen > 0 || (do_set_vm && XLogHintBitIsNeeded()))
+ if (do_prune || nfrozen > 0 ||
+ (set_pd_all_vis && XLogHintBitIsNeeded()))
{
Assert(BufferIsDirty(buffer));
PageSetLSN(BufferGetPage(buffer), recptr);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 58de605ca09..985a66bdb2e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,13 +464,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
-#endif
static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
TransactionId OldestXmin,
OffsetNumber *deadoffsets,
@@ -1973,6 +1966,8 @@ lazy_scan_prune(LVRelState *vacrel,
params.reason = PRUNE_VACUUM_SCAN;
params.cutoffs = &vacrel->cutoffs;
params.vistest = vacrel->vistest;
+ params.vmbuffer = vmbuffer;
+ params.blk_known_av = all_visible_according_to_vm;
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
@@ -1989,7 +1984,7 @@ lazy_scan_prune(LVRelState *vacrel,
* tuples. Pruning will have determined whether or not the page is
* all-visible.
*/
- params.options = HEAP_PAGE_PRUNE_FREEZE;
+ params.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VIS;
if (vacrel->nindexes == 0)
params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
@@ -2012,33 +2007,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- if (!heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum))
- Assert(false);
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -2072,168 +2040,26 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
- Assert(!presult.all_frozen || presult.all_visible);
-
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
+ * For the purposes of logging, count whether or not the page was newly
+ * set all-visible and, potentially, all-frozen.
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ vacrel->vm_new_visible_pages++;
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- vacrel->vm_new_frozen_pages++;
+ vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
-
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen.
- */
- else if (all_visible_according_to_vm && presult.all_frozen &&
- !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
}
return presult.ndeleted;
@@ -2955,6 +2781,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
vmflags,
conflict_xid,
false, /* no cleanup lock required */
+ (vmflags & VISIBILITYMAP_VALID_BITS) != 0,
PRUNE_VACUUM_CLEANUP,
NULL, 0, /* frozen */
NULL, 0, /* redirected */
@@ -3642,7 +3469,7 @@ dead_items_cleanup(LVRelState *vacrel)
* that expect no LP_DEAD on the page. Currently assert-only, but there is no
* reason not to use it outside of asserts.
*/
-static bool
+bool
heap_page_is_all_visible(Relation rel, Buffer buf,
TransactionId OldestXmin,
bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 74a5c24002b..cb70f8ec562 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VIS (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,16 @@ typedef struct PruneFreezeParams
Relation relation; /* relation containing buffer to be pruned */
Buffer buffer; /* buffer to be pruned */
+ /*
+ *
+ * vmbuffer is the buffer that must already contain the required
+ * block of the visibility map if we are to update it. blk_known_av is the
+ * visibility status of the heap block as of the last call to
+ * find_next_unskippable_block().
+ */
+ Buffer vmbuffer;
+ bool blk_known_av;
+
/*
* The reason pruning was performed. It is used to set the WAL record
* opcode which is used for debugging and analysis purposes.
@@ -250,8 +261,9 @@ typedef struct PruneFreezeParams
* MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
* pruning.
*
- * FREEZE indicates that we will also freeze tuples, and will return
- * 'all_visible', 'all_frozen' flags to the caller.
+ * FREEZE indicates that we will also freeze tuples.
+ *
+ * UPDATE_VIS indicates that we will set the page's status in the VM.
*/
int options;
@@ -284,19 +296,15 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
- *
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
*
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set and
+ * we have attempted to update the VM.
*/
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
+ uint8 new_vmbits;
+ uint8 old_vmbits;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
@@ -423,6 +431,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
Buffer vmbuffer, uint8 vmflags,
TransactionId conflict_xid,
bool cleanup_lock,
+ bool set_pd_all_vis,
PruneReason reason,
HeapTupleFreeze *frozen, int nfrozen,
OffsetNumber *redirected, int nredirected,
@@ -433,6 +442,14 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
+#ifdef USE_ASSERT_CHECKING
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
+#endif
+
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
Buffer buffer);
--
2.43.0
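The lazy_scan_prune() hunk above replaces the branchy in-place VM updates with bookkeeping driven purely by the VM bits before and after heap_page_prune_and_freeze() handled the page. Here is a small standalone sketch of that accounting, assuming bit values that stand in for VISIBILITYMAP_ALL_VISIBLE / VISIBILITYMAP_ALL_FROZEN; the struct and counter names are illustrative, not the actual vacrel fields.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* stand-ins for the two VM bits */
#define VM_BIT_ALL_VISIBLE 0x01
#define VM_BIT_ALL_FROZEN  0x02

struct vm_counters
{
    long new_visible_pages;        /* pages newly set all-visible */
    long new_visible_frozen_pages; /* newly all-visible and all-frozen */
    long new_frozen_pages;         /* already visible, newly all-frozen */
};

/*
 * Update the logging counters from the VM state before (old_vmbits) and
 * after (new_vmbits) phase I handled the page, as the rewritten
 * lazy_scan_prune() does once heap_page_prune_and_freeze() returns.
 * Returns whether the page was newly marked all-frozen.
 */
static bool
count_vm_changes(uint8_t old_vmbits, uint8_t new_vmbits, struct vm_counters *c)
{
    bool page_newly_frozen = false;

    if ((old_vmbits & VM_BIT_ALL_VISIBLE) == 0 &&
        (new_vmbits & VM_BIT_ALL_VISIBLE) != 0)
    {
        c->new_visible_pages++;
        if (new_vmbits & VM_BIT_ALL_FROZEN)
        {
            c->new_visible_frozen_pages++;
            page_newly_frozen = true;
        }
    }
    else if ((old_vmbits & VM_BIT_ALL_FROZEN) == 0 &&
             (new_vmbits & VM_BIT_ALL_FROZEN) != 0)
    {
        c->new_frozen_pages++;
        page_newly_frozen = true;
    }

    return page_newly_frozen;
}

int
main(void)
{
    struct vm_counters c = {0};

    /* a page set all-visible and all-frozen from scratch */
    count_vm_changes(0, VM_BIT_ALL_VISIBLE | VM_BIT_ALL_FROZEN, &c);
    /* an already all-visible page newly marked all-frozen */
    count_vm_changes(VM_BIT_ALL_VISIBLE, VM_BIT_ALL_VISIBLE | VM_BIT_ALL_FROZEN, &c);

    printf("%ld %ld %ld\n", c.new_visible_pages,
           c.new_visible_frozen_pages, c.new_frozen_pages); /* 1 1 1 */
    return 0;
}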
v18-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patchtext/x-patch; charset=US-ASCII; name=v18-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patchDownload
From 393173db3db25838dd638ad334eb31ed09cb4f1e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v18 05/12] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum
As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
---
src/backend/access/heap/vacuumlazy.c | 36 +++++++++++++++++++++++-----
1 file changed, 30 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 985a66bdb2e..14a8e342e51 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1878,9 +1878,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ /* Lock vmbuffer before entering critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ /* Mark buffer dirty before writing any WAL records */
MarkBufferDirty(buf);
/*
@@ -1897,13 +1900,34 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set_vmbits(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
+
+ /*
+ * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+ * setting the VM.
+ */
+ if (RelationNeedsWAL(vacrel->rel))
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ InvalidTransactionId, /* conflict xid */
+ false, /* cleanup lock */
+ true, /* set_pd_all_vis */
+ PRUNE_VACUUM_SCAN, /* reason */
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+
END_CRIT_SECTION();
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
/* Count the newly all-frozen pages for logging */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
--
2.43.0
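Circling back to the heap_xlog_prune_freeze() changes in the 0004 patch: replay now decides separately whether to dirty the heap buffer and whether to bump its LSN, so that a VM-only change on an already all-visible page touches neither. Here is a condensed standalone sketch of just that decision, with the checksums/wal_log_hints test passed in as a plain flag; the real code consults XLogHintBitIsNeeded() and PageIsAllVisible(), and the names below are illustrative.

#include <stdbool.h>
#include <stdio.h>

struct redo_decision
{
    bool set_pd_all_visible; /* set PD_ALL_VISIBLE on the heap page */
    bool mark_dirty;         /* MarkBufferDirty() the heap buffer */
    bool set_lsn;            /* PageSetLSN() on the heap page */
};

/*
 * Decide what to do with the heap buffer during replay: pruning or freezing
 * always dirties the page and bumps its LSN; setting PD_ALL_VISIBLE dirties
 * it, and bumps the LSN only when hint bits must be WAL-logged; a pure
 * VM-only change (PD_ALL_VISIBLE already set) leaves the heap page alone.
 */
static struct redo_decision
decide_redo(bool did_prune, bool did_freeze, bool vm_bits_set,
            bool page_already_all_visible, bool hint_bits_need_wal)
{
    struct redo_decision d = {false, false, false};

    if (did_prune || did_freeze)
        d.mark_dirty = d.set_lsn = true;

    if (vm_bits_set && !page_already_all_visible)
    {
        d.set_pd_all_visible = true;
        d.mark_dirty = true;
        if (hint_bits_need_wal)
            d.set_lsn = true;
    }

    return d;
}

int
main(void)
{
    /* VM-only change on an already all-visible page: nothing to do */
    struct redo_decision d = decide_redo(false, false, true, true, true);

    printf("%d %d %d\n", d.set_pd_all_visible, d.mark_dirty, d.set_lsn); /* 0 0 0 */
    return 0;
}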
v18-0006-Remove-XLOG_HEAP2_VISIBLE-entirely.patchtext/x-patch; charset=US-ASCII; name=v18-0006-Remove-XLOG_HEAP2_VISIBLE-entirely.patchDownload
From 19c3b3150d386545c309a72fe21bcc7db11dbcb8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v18 06/12] Remove XLOG_HEAP2_VISIBLE entirely
No remaining code emits XLOG_HEAP2_VISIBLE records, so the record type can be removed entirely.
This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
---
src/backend/access/common/bufmask.c | 4 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 155 ++---------------------
src/backend/access/heap/pruneheap.c | 18 ++-
src/backend/access/heap/vacuumlazy.c | 16 +--
src/backend/access/heap/visibilitymap.c | 111 +---------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +-
src/include/access/heapam_xlog.h | 28 +---
src/include/access/visibilitymap.h | 13 +-
src/include/access/visibilitymapdefs.h | 9 --
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 54 insertions(+), 378 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
+ * for more details.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 568696333c2..f881530a2a5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2524,11 +2524,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
else if (all_frozen_set)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- relation->rd_locator);
+ visibilitymap_set(BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ relation->rd_locator);
}
/*
@@ -8797,50 +8797,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
-
/*
* Perform XLogInsert for a heap-update operation. Caller must already
* have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 412ac3edf25..5eafdff6c2e 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -251,7 +251,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+ visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -264,142 +264,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -778,8 +642,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
*
* During recovery, however, no concurrent writers exist. Therefore,
* updating the VM without holding the heap page lock is safe enough. This
- * same approach is taken when replaying xl_heap_visible records (see
- * heap_xlog_visible()).
+ * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+ * heap_xlog_prune_and_freeze()).
*/
if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -791,11 +655,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- rlocator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -1376,9 +1240,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b38b62779ab..d4006803330 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1031,9 +1031,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
{
Assert(PageIsAllVisible(page));
- old_vmbits = visibilitymap_set_vmbits(blockno,
- vmbuffer, new_vmbits,
- params->relation->rd_locator);
+ old_vmbits = visibilitymap_set(blockno,
+ vmbuffer, new_vmbits,
+ params->relation->rd_locator);
if (old_vmbits == new_vmbits)
{
LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
@@ -2308,14 +2308,18 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
*
* This is used for several different page maintenance operations:
*
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
* redirected, some marked dead, and some removed altogether.
*
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'.
*
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ * marked as unused.
*
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
+ * all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
* all.
*
* If replaying the record requires a cleanup lock, pass cleanup_lock = true.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 14a8e342e51..d14f69ccb0a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1900,11 +1900,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
/*
* Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2786,9 +2786,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* set PD_ALL_VISIBLE.
*/
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer, vmflags,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer, vmflags,
+ vacrel->rel->rd_locator);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 2f5e61e2392..a75b5bb6b13 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set bit(s) in a previously pinned page and log
- * visibilitymap_set_vmbits - set bit(s) in a pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,108 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
- flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
- Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- Assert(!BufferIsValid(heapBuf) ||
- BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (XLogRecPtrIsInvalid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
-}
/*
* Set VM (visibility map) flags in the VM block in vmBuf.
@@ -344,9 +241,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* rlocator is used only for debugging messages.
*/
uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ca26d1f0ed1..08461fdf593 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..e9e77bd678b 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -294,7 +293,13 @@ typedef struct xl_heap_prune
#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
#define XLHP_IS_CATALOG_REL (1 << 1)
/*
@@ -443,20 +448,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -500,11 +491,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index c6fa37be968..05ba6786b47 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator);
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b221b3699bf..3ef4c06c85d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4277,7 +4277,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
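(Aside, in case anyone wants to reproduce the WAL measurements: bracketing the
vacuum with pg_current_wal_insert_lsn() and diffing the LSNs gives the bytes
generated, and on recent releases VACUUM (VERBOSE) also prints a "WAL usage"
line with record and byte counts. A minimal psql sketch, assuming a quiescent
cluster and a table named foo as in the earlier example; the variable names
are just placeholders:

-- capture the WAL insert position before and after the vacuum
select pg_current_wal_insert_lsn() as before_lsn \gset
vacuum (verbose, process_toast false) foo;
select pg_current_wal_insert_lsn() as after_lsn \gset
-- bytes of WAL generated by the vacuum
select pg_wal_lsn_diff(:'after_lsn', :'before_lsn') as wal_bytes;

Running pg_waldump --stats=record over the same LSN range should likewise show
the Heap2/VISIBLE records disappear once the patch is applied, with the VM
updates accounted to the PRUNE records instead.)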
v18-0007-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (text/x-patch; charset=UTF-8)
From bd2d6a5ee19706a4e7d51e1df3479234fe28e3fc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v18 07/12] Rename GlobalVisTestIsRemovableXid() to
GlobalVisXidVisibleToAll()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The function is currently only used to check whether a tuple’s xmax is
visible to all transactions (and thus removable). Upcoming changes will
also use it to test whether a tuple’s xmin is visible to all to
decide if a page can be marked all-visible in the visibility map.
The new name, GlobalVisXidVisibleToAll(), better reflects this broader
purpose.
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
---
src/backend/access/heap/heapam_visibility.c | 6 +++---
src/backend/access/heap/pruneheap.c | 16 ++++++++--------
src/backend/access/spgist/spgvacuum.c | 2 +-
src/backend/storage/ipc/procarray.c | 17 ++++++++---------
src/include/utils/snapmgr.h | 4 ++--
5 files changed, 22 insertions(+), 23 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
{
Assert(TransactionIdIsValid(dead_after));
- if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
res = HEAPTUPLE_DEAD;
}
else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
return false;
/* Deleter committed, so tuple is dead if the XID is old enough. */
- return GlobalVisTestIsRemovableXid(vistest,
- HeapTupleHeaderGetRawXmax(tuple));
+ return GlobalVisXidVisibleToAll(vistest,
+ HeapTupleHeaderGetRawXmax(tuple));
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d4006803330..40d0ae6fcde 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -235,7 +235,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
vistest = GlobalVisTestFor(relation);
- if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+ if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
return;
/*
@@ -730,9 +730,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* Determining HTSV only once for each tuple is required for correctness,
* to deal with cases where running HTSV twice could result in different
* results. For example, RECENTLY_DEAD can turn to DEAD if another
- * checked item causes GlobalVisTestIsRemovableFullXid() to update the
- * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
- * transaction aborts.
+ * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+ * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+ * aborts.
*
* It's also good for performance. Most commonly tuples within a page are
* stored at decreasing offsets (while the items are stored at increasing
@@ -1157,11 +1157,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* Determine whether or not the tuple is considered dead when compared
* with the provided GlobalVisState. On-access pruning does not provide
* VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
- * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
- * if the GlobalVisState has been updated since the beginning of vacuuming
+ * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+ * the GlobalVisState has been updated since the beginning of vacuuming
* the relation.
*/
- if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
return HEAPTUPLE_DEAD;
return res;
@@ -1618,7 +1618,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
/*
* For now always use prstate->cutoffs for this test, because
* we only update 'all_visible' when freezing is requested. We
- * could use GlobalVisTestIsRemovableXid instead, if a
+ * could use GlobalVisXidVisibleToAll() instead, if a
* non-freezing caller wanted to set the VM bit.
*/
Assert(prstate->cutoffs);
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 8f8a1ad7796..496cca69410 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
*/
if (dt->tupstate == SPGIST_REDIRECT &&
(!TransactionIdIsValid(dt->xid) ||
- GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+ GlobalVisXidVisibleToAll(vistest, dt->xid)))
{
dt->tupstate = SPGIST_PLACEHOLDER;
Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..235c3b584f6 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
* See comment for GlobalVisState for details.
*/
bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
- FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
{
/*
* If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4216,14 +4215,14 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
}
/*
- * Wrapper around GlobalVisTestIsRemovableFullXid() for 32bit xids.
+ * Wrapper around GlobalVisFullXidVisibleToAll() for 32bit xids.
*
* It is crucial that this only gets called for xids from a source that
* protects against xid wraparounds (e.g. from a table and thus protected by
* relfrozenxid).
*/
bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
{
FullTransactionId fxid;
@@ -4237,12 +4236,12 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
*/
fxid = FullXidRelativeTo(state->definitely_needed, xid);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableFullXid(), see their comments.
+ * GlobalVisFullXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableXid(state, xid);
+ return GlobalVisXidVisibleToAll(state, xid);
}
/*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
*/
typedef struct GlobalVisState GlobalVisState;
extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
--
2.43.0
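(To make the "visible to all" wording concrete: GlobalVisXidVisibleToAll()
asks whether an xid precedes every snapshot that could still care about it. A
rough way to eyeball that horizon from SQL -- only an approximation of what
GlobalVisState tracks, since it ignores replication slots, prepared
transactions, and standbys:

-- age of the oldest snapshot xmin held by any running backend; an xmin whose
-- age exceeds this is, roughly speaking, visible to all current snapshots
select max(age(backend_xmin)) as oldest_backend_xmin_age
  from pg_stat_activity
 where backend_xmin is not null;
)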
v18-0008-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (text/x-patch; charset=UTF-8)
From a2c8648351211dec01a107c80325a64a618ecafe Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v18 08/12] Use GlobalVisState in vacuum to determine page
level visibility
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.
Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.
This also benefits vacuum directly: in some cases, GlobalVisState may
advance during a vacuum, allowing more pages to be considered
all-visible. And, in the future, we could easily add a heuristic to
update GlobalVisState more frequently during vacuums of large tables. In
the rare case that the GlobalVisState moves backward, vacuum falls back
to OldestXmin to ensure we don’t attempt to freeze a dead tuple that
wasn’t yet prunable according to the GlobalVisState.
Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.
This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.
---
src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++++++
src/backend/access/heap/pruneheap.c | 37 ++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 17 +++++-----
src/include/access/heapam.h | 7 ++--
4 files changed, 57 insertions(+), 32 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+ Buffer buffer)
+{
+ TransactionId dead_after = InvalidTransactionId;
+ HTSV_Result res;
+
+ res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+ if (res == HEAPTUPLE_RECENTLY_DEAD)
+ {
+ Assert(TransactionIdIsValid(dead_after));
+
+ if (GlobalVisXidVisibleToAll(vistest, dead_after))
+ res = HEAPTUPLE_DEAD;
+ }
+ else
+ Assert(!TransactionIdIsValid(dead_after));
+
+ return res;
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 40d0ae6fcde..6fc737eed69 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -712,11 +712,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon
- * when updating the VM and/or freezing all the tuples on the page.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState. This field is only kept up-to-date if the page is
+ * all-visible. As soon as a tuple is encountered that is not visible to
+ * all, this field is unmaintained. As long as it is maintained, it can be
+ * used to calculate the snapshot conflict horizon when updating the VM
+ * and/or freezing all the tuples on the page.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -912,6 +913,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them is not visible to everyone, the page cannot be
+ * all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update the hint
@@ -1084,10 +1095,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool debug_all_frozen;
Assert(prstate.lpdead_items == 0);
- Assert(prstate.cutoffs);
if (!heap_page_is_all_visible(params->relation, buffer,
- prstate.cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc))
Assert(false);
@@ -1615,19 +1625,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' when freezing is requested. We
- * could use GlobalVisXidVisibleToAll() instead, if a
- * non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d14f69ccb0a..92ad096d935 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -465,7 +465,7 @@ static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -2740,7 +2740,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* done outside the critical section.
*/
if (heap_page_would_be_all_visible(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3495,14 +3495,13 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
- return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
+ return heap_page_would_be_all_visible(rel, buf, vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -3523,7 +3522,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
*
* Output parameters:
*
@@ -3542,7 +3541,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
static bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -3616,7 +3615,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
/* Visibility checks may do IO or allocate memory */
Assert(CritSectionCount == 0);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+ switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest, buf))
{
case HEAPTUPLE_LIVE:
{
@@ -3635,7 +3634,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin, OldestXmin))
+ if (!GlobalVisXidVisibleToAll(vistest, xmin))
{
all_visible = false;
*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index cb70f8ec562..00213fad852 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -276,8 +276,7 @@ typedef struct PruneFreezeParams
/*
* cutoffs contains the freeze cutoffs, established by VACUUM at the
* beginning of vacuuming the relation. Required if HEAP_PRUNE_FREEZE
- * option is set. cutoffs->OldestXmin is also used to determine if dead
- * tuples are HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * option is set.
*/
struct VacuumCutoffs *cutoffs;
} PruneFreezeParams;
@@ -444,7 +443,7 @@ extern void heap_vacuum_rel(Relation rel,
#ifdef USE_ASSERT_CHECKING
extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum);
@@ -457,6 +456,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+ GlobalVisState *vistest, Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
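(For checking the end result, the pg_visibility extension exposes the VM bits
directly, which is handy when comparing how many pages a vacuum manages to
mark all-visible/all-frozen with and without the patch. A minimal sketch,
again assuming the foo example table:

create extension if not exists pg_visibility;
-- per-page all-visible / all-frozen bits
select * from pg_visibility_map('foo') limit 10;
-- counts for the whole relation
select * from pg_visibility_map_summary('foo');
)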
v18-0009-Unset-all_visible-sooner-if-not-freezing.patch (text/x-patch; charset=UTF-8)
From 51ea8c8266da0947c46951279d13fc8834f0ca45 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v18 09/12] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.
However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.
---
src/backend/access/heap/pruneheap.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6fc737eed69..2979cb74651 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1483,8 +1483,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible until later during pruning.
- * Removable dead tuples shouldn't preclude freezing the page.
+ * Removable dead tuples shouldn't preclude freezing the page. If we won't
+ * attempt freezing, just unset all-visible now, though.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1739,8 +1742,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible until later, at the end of
* heap_page_prune_and_freeze(). This will allow us to attempt to freeze
* the page after pruning. As long as we unset it before updating the
- * visibility map, this will be correct.
+ * visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all-visible now.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
--
2.43.0
v18-0010-Allow-on-access-pruning-to-set-pages-all-visible.patch (text/x-patch; charset=UTF-8)
From fc3618d0940f6698a009ec2ddc7886d975374cc6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v18 10/12] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.
Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
---
src/backend/access/heap/heapam.c | 15 +++-
src/backend/access/heap/heapam_handler.c | 15 +++-
src/backend/access/heap/pruneheap.c | 73 +++++++++++++++----
src/backend/access/index/indexam.c | 46 ++++++++++++
src/backend/access/table/tableam.c | 39 +++++++++-
src/backend/executor/execMain.c | 4 +
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeBitmapHeapscan.c | 7 +-
src/backend/executor/nodeIndexscan.c | 18 +++--
src/backend/executor/nodeSeqscan.c | 24 ++++--
src/include/access/genam.h | 11 +++
src/include/access/heapam.h | 24 +++++-
src/include/access/relscan.h | 6 ++
src/include/access/tableam.h | 30 +++++++-
src/include/nodes/execnodes.h | 6 ++
.../t/035_standby_logical_decoding.pl | 3 +-
16 files changed, 284 insertions(+), 39 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f881530a2a5..d8594b9aac1 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -555,6 +555,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -569,7 +570,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1246,6 +1249,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1284,6 +1288,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1316,6 +1326,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..f05b9e4968d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
return &hscan->xs_base;
}
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 2979cb74651..6e863ffd85e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -188,7 +188,9 @@ static bool heap_page_will_set_vis(Relation relation,
Buffer heap_buf,
Buffer vmbuffer,
bool blk_known_av,
- const PruneState *prstate,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ PruneState *prstate,
uint8 *vmflags,
bool *do_set_pd_vis);
@@ -203,9 +205,13 @@ static bool heap_page_will_set_vis(Relation relation,
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -271,12 +277,21 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
PruneFreezeParams params;
PruneFreezeResult presult;
+ params.options = 0;
+ params.vmbuffer = InvalidBuffer;
+
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ params.options = HEAP_PAGE_PRUNE_UPDATE_VIS;
+ params.vmbuffer = *vmbuffer;
+ }
+
params.relation = relation;
params.buffer = buffer;
params.reason = PRUNE_ON_ACCESS;
params.vistest = vistest;
params.cutoffs = NULL;
- params.vmbuffer = InvalidBuffer;
params.blk_known_av = false;
/*
@@ -456,6 +471,9 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* have examined this page's VM bits (e.g., VACUUM in the previous
* heap_vac_scan_next_block() call) and can pass that along.
*
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
* Returns true if one or both VM bits should be set, along with the desired
* flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
* should be set on the heap page.
@@ -466,7 +484,9 @@ heap_page_will_set_vis(Relation relation,
Buffer heap_buf,
Buffer vmbuffer,
bool blk_known_av,
- const PruneState *prstate,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ PruneState *prstate,
uint8 *vmflags,
bool *do_set_pd_vis)
{
@@ -482,6 +502,23 @@ heap_page_will_set_vis(Relation relation,
return false;
}
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(heap_buf) || XLogCheckBufferNeedsBackup(heap_buf)))
+ {
+ prstate->all_visible = prstate->all_frozen = false;
+ return false;
+ }
+
if (prstate->all_visible && !PageIsAllVisible(heap_page))
*do_set_pd_vis = true;
@@ -505,6 +542,11 @@ heap_page_will_set_vis(Relation relation,
* page-level bit is clear. However, it's possible that in vacuum the bit
* got cleared after heap_vac_scan_next_block() was called, so we must
* recheck with buffer lock before concluding that the VM is corrupt.
+ *
+ * This will never trigger for on-access pruning because it couldn't have
+ * done a previous visibility map lookup and thus will always pass
+ * blk_known_av as false. A future vacuum will have to take care of fixing
+ * the corruption.
*/
else if (blk_known_av && !PageIsAllVisible(heap_page) &&
visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
@@ -913,6 +955,14 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /*
+ * Even if we don't prune anything, if we found a new value for the
+ * pd_prune_xid field or the page was marked full, we will update the hint
+ * bit.
+ */
+ do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ PageIsFull(page);
+
/*
* After processing all the live tuples on the page, if the newest xmin
* amongst them is not visible to everyone, the page cannot be
@@ -923,14 +973,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
prstate.all_visible = prstate.all_frozen = false;
- /*
- * Even if we don't prune anything, if we found a new value for the
- * pd_prune_xid field or the page was marked full, we will update the hint
- * bit.
- */
- do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
- PageIsFull(page);
-
/*
* Decide if we want to go ahead with freezing according to the freeze
* plans we prepared, or not.
@@ -974,6 +1016,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
do_set_vm = heap_page_will_set_vis(params->relation,
blockno, buffer, vmbuffer, params->blk_known_av,
+ params->reason, do_prune, do_freeze,
&prstate, &new_vmbits, &do_set_pd_vis);
/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
@@ -2250,7 +2293,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
/*
* Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
- * record.
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
*/
static TransactionId
get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
@@ -2319,8 +2362,8 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
* - Reaping: During vacuum phase III, items that are already LP_DEAD are
* marked as unused.
*
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- * all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III and on-access, the heap page
+ * may be marked all-visible and all-frozen.
*
* These changes all happen together, so we use a single WAL record for them
* all.
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..8d582a8eafd 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -289,6 +289,32 @@ index_beginscan(Relation heapRelation,
return scan;
}
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan(heapRelation,
+ indexRelation,
+ snapshot,
+ instrument,
+ nkeys, norderbys);
+
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+ return scan;
+}
+
/*
* index_beginscan_bitmap - start a scan of an index with amgetbitmap
*
@@ -620,6 +646,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
return scan;
}
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan_parallel(heaprel, indexrel,
+ instrument,
+ nkeys, norderbys,
+ pscan);
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+ return scan;
+}
+
/* ----------------
* index_getnext_tid - get the next TID from a scan
*
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 5e41404937e..3e3a0f72a71 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
char *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
bool synchronize_seqscans = true;
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags);
+
/* ----------------------------------------------------------------------------
* Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
}
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
- SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
pscan, flags);
}
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+ bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
/* ----------------------------------------------------------------------------
* Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 831c55ce787..15be318fd41 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation is modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ modifies_rel);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..c2ffbd3b08e 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+
+ bool modifies_base_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
*/
- scandesc = index_beginscan(node->ss.ss_currentRelation,
- node->iss_RelationDesc,
- estate->es_snapshot,
- &node->iss_Instrument,
- node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+ node->iss_RelationDesc,
+ estate->es_snapshot,
+ &node->iss_Instrument,
+ node->iss_NumScanKeys,
+ node->iss_NumOrderByKeys,
+ modifies_base_rel);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..fd69275c181 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
- scandesc = table_beginscan(node->ss.ss_currentRelation,
- estate->es_snapshot,
- 0, NULL);
+ scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+ estate->es_snapshot,
+ 0, NULL, modifies_rel);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -366,6 +371,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
ParallelContext *pcxt)
{
EState *estate = node->ss.ps.state;
+ bool modifies_rel;
ParallelTableScanDesc pscan;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -373,8 +379,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+ modifies_rel);
}
/* ----------------------------------------------------------------
@@ -404,8 +413,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+ pscan,
+ modifies_rel);
}
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..aa2112c8e04 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -178,6 +178,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_heap_rel);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
@@ -204,6 +209,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys,
ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_rel);
extern ItemPointer index_getnext_tid(IndexScanDesc scan,
ScanDirection direction);
extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 00213fad852..342560e1034 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read in the current heap page's
+ * corresponding VM block to this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read in the current heap page's corresponding VM block to
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
@@ -415,7 +432,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
TM_IndexDeleteOp *delstate);
/* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
OffsetNumber *off_loc,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
typedef struct IndexFetchTableData
{
Relation rel;
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchTableData;
struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e16bf025692..f250d4e7aec 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* whether or not scan should attempt to set the VM */
+ SO_ALLOW_VM_SET = 1 << 10,
} ScanOptions;
/*
@@ -882,6 +884,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
}
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
/*
* Like table_beginscan(), but for scanning catalog. It'll automatically use a
* snapshot appropriate for scanning catalog relations.
@@ -919,10 +940,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, bool modifies_rel)
{
uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
}
@@ -1130,6 +1154,10 @@ extern void table_parallelscan_initialize(Relation rel,
extern TableScanDesc table_beginscan_parallel(Relation relation,
ParallelTableScanDesc pscan);
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+ ParallelTableScanDesc pscan,
+ bool modifies_rel);
+
/*
* Restart a parallel scan. Call this in the leader process. Caller is
* responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a36653c37f9..9c54fa06e4a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query either through
+ * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index c9c182892cf..f5c0c65b260 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -745,7 +746,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
--
2.43.0
v18-0011-Set-pd_prune_xid-on-insert.patchtext/x-patch; charset=UTF-8; name=v18-0011-Set-pd_prune_xid-on-insert.patchDownload
From ac4338510fe32446375801ffd78b38367a87a56b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v18 11/12] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.
This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.
Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.
---
src/backend/access/heap/heapam.c | 25 +++++++++++++------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++-
.../isolation/expected/index-killtuples.out | 6 ++---
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d8594b9aac1..a3e2c4c20cd 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2104,6 +2104,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2163,15 +2164,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2181,7 +2186,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2545,8 +2549,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 5eafdff6c2e..21972347dec 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -463,6 +463,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -612,9 +618,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
--
2.43.0
v18-0012-Split-heap_page_prune_and_freeze-into-helpers.patchtext/x-patch; charset=US-ASCII; name=v18-0012-Split-heap_page_prune_and_freeze-into-helpers.patchDownload
From 163e09cb81eeb1af31cd9b3a648896845587ce3a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 8 Oct 2025 18:45:45 -0400
Subject: [PATCH v18 12/12] Split heap_page_prune_and_freeze into helpers
---
src/backend/access/heap/pruneheap.c | 316 +++++++++++++++-------------
1 file changed, 170 insertions(+), 146 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6e863ffd85e..d21a66f6a75 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -590,82 +590,20 @@ heap_page_will_set_vis(Relation relation,
return do_set_vm;
}
-/*
- * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page. If the page's visibility status has changed, update it in
- * the VM.
- *
- * Caller must have pin and buffer cleanup lock on the page. Note that we
- * don't update the FSM information for page on caller's behalf. Caller might
- * also need to account for a reduction in the length of the line pointer
- * array following array truncation by us.
- *
- * params contains the input parameters used to control freezing and pruning
- * behavior. See the definition of PruneFreezeParams for more on what each
- * parameter does.
- *
- * If the HEAP_PRUNE_FREEZE option is set in params, we will freeze tuples if
- * it's required in order to advance relfrozenxid / relminmxid, or if it's
- * considered advantageous for overall system performance to do so now. The
- * 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
- * arguments are required when freezing.
- *
- * If HEAP_PAGE_PRUNE_UPDATE_VIS is set in params and the visibility status of
- * the page has changed, we will update the VM at the same time as pruning and
- * freezing the heap page. We will also update presult->old_vmbits and
- * presult->new_vmbits with the state of the VM before and after updating it
- * for the caller to use in bookkeeping.
- *
- * presult contains output parameters needed by callers, such as the number of
- * tuples removed and the offsets of dead items on the page after pruning.
- * heap_page_prune_and_freeze() is responsible for initializing it. Required
- * by all callers.
- *
- * off_loc is the offset location required by the caller to use in error
- * callback.
- *
- * new_relfrozen_xid and new_relmin_mxid must provided by the caller if the
- * HEAP_PRUNE_FREEZE option is set in params. On entry, they contain the
- * oldest XID and multi-XID seen on the relation so far. They will be updated
- * with oldest values present on the page after pruning. After processing the
- * whole relation, VACUUM can use these values as the new
- * relfrozenxid/relminmxid for the relation.
- */
-void
-heap_page_prune_and_freeze(PruneFreezeParams *params,
- PruneFreezeResult *presult,
- OffsetNumber *off_loc,
- TransactionId *new_relfrozen_xid,
- MultiXactId *new_relmin_mxid)
+static void
+prune_freeze_setup(PruneFreezeParams *params, PruneState *prstate,
+ TransactionId *new_relfrozen_xid,
+ MultiXactId *new_relmin_mxid,
+ PruneFreezeResult *presult)
{
- Buffer buffer = params->buffer;
- Buffer vmbuffer = params->vmbuffer;
- Page page = BufferGetPage(buffer);
- BlockNumber blockno = BufferGetBlockNumber(buffer);
- OffsetNumber offnum,
- maxoff;
- PruneState prstate;
- HeapTupleData tup;
- bool do_freeze;
- bool do_prune;
- bool do_hint_prune;
- bool do_set_vm;
- bool do_set_pd_vis;
- bool did_tuple_hint_fpi;
- int64 fpi_before = pgWalUsage.wal_fpi;
- TransactionId frz_conflict_horizon = InvalidTransactionId;
- TransactionId conflict_xid = InvalidTransactionId;
- uint8 new_vmbits = 0;
- uint8 old_vmbits = 0;
-
/* Copy parameters to prstate */
- prstate.vistest = params->vistest;
- prstate.mark_unused_now =
+ prstate->vistest = params->vistest;
+ prstate->mark_unused_now =
(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
- prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
- prstate.attempt_update_vm =
+ prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate->attempt_update_vm =
(params->options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0;
- prstate.cutoffs = params->cutoffs;
+ prstate->cutoffs = params->cutoffs;
/*
* Our strategy is to scan the page and make lists of items to change,
@@ -678,37 +616,37 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* prunable, we will save the lowest relevant XID in new_prune_xid. Also
* initialize the rest of our working state.
*/
- prstate.new_prune_xid = InvalidTransactionId;
- prstate.latest_xid_removed = InvalidTransactionId;
- prstate.nredirected = prstate.ndead = prstate.nunused = prstate.nfrozen = 0;
- prstate.nroot_items = 0;
- prstate.nheaponly_items = 0;
+ prstate->new_prune_xid = InvalidTransactionId;
+ prstate->latest_xid_removed = InvalidTransactionId;
+ prstate->nredirected = prstate->ndead = prstate->nunused = prstate->nfrozen = 0;
+ prstate->nroot_items = 0;
+ prstate->nheaponly_items = 0;
/* initialize page freezing working state */
- prstate.pagefrz.freeze_required = false;
- if (prstate.attempt_freeze)
+ prstate->pagefrz.freeze_required = false;
+ if (prstate->attempt_freeze)
{
Assert(new_relfrozen_xid && new_relmin_mxid);
- prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
- prstate.pagefrz.NoFreezePageRelfrozenXid = *new_relfrozen_xid;
- prstate.pagefrz.FreezePageRelminMxid = *new_relmin_mxid;
- prstate.pagefrz.NoFreezePageRelminMxid = *new_relmin_mxid;
+ prstate->pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
+ prstate->pagefrz.NoFreezePageRelfrozenXid = *new_relfrozen_xid;
+ prstate->pagefrz.FreezePageRelminMxid = *new_relmin_mxid;
+ prstate->pagefrz.NoFreezePageRelminMxid = *new_relmin_mxid;
}
else
{
Assert(new_relfrozen_xid == NULL && new_relmin_mxid == NULL);
- prstate.pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
- prstate.pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
- prstate.pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
- prstate.pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
+ prstate->pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
+ prstate->pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
+ prstate->pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
+ prstate->pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
}
- prstate.ndeleted = 0;
- prstate.live_tuples = 0;
- prstate.recently_dead_tuples = 0;
- prstate.hastup = false;
- prstate.lpdead_items = 0;
- prstate.deadoffsets = presult->deadoffsets;
+ prstate->ndeleted = 0;
+ prstate->live_tuples = 0;
+ prstate->recently_dead_tuples = 0;
+ prstate->hastup = false;
+ prstate->lpdead_items = 0;
+ prstate->deadoffsets = presult->deadoffsets;
/*
* Track whether the page could be marked all-visible and/or all-frozen.
@@ -736,20 +674,20 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* bookkeeping. In this case, initializing all_visible to false allows
* heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
*/
- if (prstate.attempt_freeze)
+ if (prstate->attempt_freeze)
{
- prstate.all_visible = true;
- prstate.all_frozen = true;
+ prstate->all_visible = true;
+ prstate->all_frozen = true;
}
- else if (prstate.attempt_update_vm)
+ else if (prstate->attempt_update_vm)
{
- prstate.all_visible = true;
- prstate.all_frozen = false;
+ prstate->all_visible = true;
+ prstate->all_frozen = false;
}
else
{
- prstate.all_visible = false;
- prstate.all_frozen = false;
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
}
/*
@@ -761,10 +699,14 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* used to calculate the snapshot conflict horizon when updating the VM
* and/or freezing all the tuples on the page.
*/
- prstate.visibility_cutoff_xid = InvalidTransactionId;
+ prstate->visibility_cutoff_xid = InvalidTransactionId;
+}
- maxoff = PageGetMaxOffsetNumber(page);
- tup.t_tableOid = RelationGetRelid(params->relation);
+static void
+prune_freeze_plan(PruneState *prstate, BlockNumber blockno, Buffer buffer, Page page,
+ OffsetNumber maxoff, OffsetNumber *off_loc, HeapTuple tup)
+{
+ OffsetNumber offnum;
/*
* Determine HTSV for all tuples, and queue them up for processing as HOT
@@ -799,13 +741,13 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
*off_loc = offnum;
- prstate.processed[offnum] = false;
- prstate.htsv[offnum] = -1;
+ prstate->processed[offnum] = false;
+ prstate->htsv[offnum] = -1;
/* Nothing to do if slot doesn't contain a tuple */
if (!ItemIdIsUsed(itemid))
{
- heap_prune_record_unchanged_lp_unused(page, &prstate, offnum);
+ heap_prune_record_unchanged_lp_unused(page, prstate, offnum);
continue;
}
@@ -815,17 +757,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* If the caller set mark_unused_now true, we can set dead line
* pointers LP_UNUSED now.
*/
- if (unlikely(prstate.mark_unused_now))
- heap_prune_record_unused(&prstate, offnum, false);
+ if (unlikely(prstate->mark_unused_now))
+ heap_prune_record_unused(prstate, offnum, false);
else
- heap_prune_record_unchanged_lp_dead(page, &prstate, offnum);
+ heap_prune_record_unchanged_lp_dead(page, prstate, offnum);
continue;
}
if (ItemIdIsRedirected(itemid))
{
/* This is the start of a HOT chain */
- prstate.root_items[prstate.nroot_items++] = offnum;
+ prstate->root_items[prstate->nroot_items++] = offnum;
continue;
}
@@ -835,25 +777,19 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* Get the tuple's visibility status and queue it up for processing.
*/
htup = (HeapTupleHeader) PageGetItem(page, itemid);
- tup.t_data = htup;
- tup.t_len = ItemIdGetLength(itemid);
- ItemPointerSet(&tup.t_self, blockno, offnum);
+ tup->t_data = htup;
+ tup->t_len = ItemIdGetLength(itemid);
+ ItemPointerSet(&tup->t_self, blockno, offnum);
- prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
- buffer);
+ prstate->htsv[offnum] = heap_prune_satisfies_vacuum(prstate, tup,
+ buffer);
if (!HeapTupleHeaderIsHeapOnly(htup))
- prstate.root_items[prstate.nroot_items++] = offnum;
+ prstate->root_items[prstate->nroot_items++] = offnum;
else
- prstate.heaponly_items[prstate.nheaponly_items++] = offnum;
+ prstate->heaponly_items[prstate->nheaponly_items++] = offnum;
}
- /*
- * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
- * an FPI to be emitted.
- */
- did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
-
/*
* Process HOT chains.
*
@@ -865,30 +801,30 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* the page instead of using the root_items array, also did it in
* ascending offset number order.)
*/
- for (int i = prstate.nroot_items - 1; i >= 0; i--)
+ for (int i = prstate->nroot_items - 1; i >= 0; i--)
{
- offnum = prstate.root_items[i];
+ offnum = prstate->root_items[i];
/* Ignore items already processed as part of an earlier chain */
- if (prstate.processed[offnum])
+ if (prstate->processed[offnum])
continue;
/* see preceding loop */
*off_loc = offnum;
/* Process this item or chain of items */
- heap_prune_chain(page, blockno, maxoff, offnum, &prstate);
+ heap_prune_chain(page, blockno, maxoff, offnum, prstate);
}
/*
* Process any heap-only tuples that were not already processed as part of
* a HOT chain.
*/
- for (int i = prstate.nheaponly_items - 1; i >= 0; i--)
+ for (int i = prstate->nheaponly_items - 1; i >= 0; i--)
{
- offnum = prstate.heaponly_items[i];
+ offnum = prstate->heaponly_items[i];
- if (prstate.processed[offnum])
+ if (prstate->processed[offnum])
continue;
/* see preceding loop */
@@ -907,7 +843,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* return true for an XMIN_INVALID tuple, so this code will work even
* when there were sequential updates within the aborted transaction.)
*/
- if (prstate.htsv[offnum] == HEAPTUPLE_DEAD)
+ if (prstate->htsv[offnum] == HEAPTUPLE_DEAD)
{
ItemId itemid = PageGetItemId(page, offnum);
HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, itemid);
@@ -915,8 +851,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
if (likely(!HeapTupleHeaderIsHotUpdated(htup)))
{
HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate.latest_xid_removed);
- heap_prune_record_unused(&prstate, offnum, true);
+ &prstate->latest_xid_removed);
+ heap_prune_record_unused(prstate, offnum, true);
}
else
{
@@ -933,7 +869,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
}
}
else
- heap_prune_record_unchanged_lp_normal(page, &prstate, offnum);
+ heap_prune_record_unchanged_lp_normal(page, prstate, offnum);
}
/* We should now have processed every tuple exactly once */
@@ -944,12 +880,110 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
{
*off_loc = offnum;
- Assert(prstate.processed[offnum]);
+ Assert(prstate->processed[offnum]);
}
#endif
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them is not visible to everyone, the page cannot be
+ * all-visible.
+ */
+ if (prstate->all_visible &&
+ TransactionIdIsNormal(prstate->visibility_cutoff_xid) &&
+ !GlobalVisXidVisibleToAll(prstate->vistest, prstate->visibility_cutoff_xid))
+ prstate->all_visible = prstate->all_frozen = false;
+
/* Clear the offset information once we have processed the given page. */
*off_loc = InvalidOffsetNumber;
+}
+
+/*
+ * Prune and repair fragmentation and potentially freeze tuples on the
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
+ *
+ * Caller must have pin and buffer cleanup lock on the page. Note that we
+ * don't update the FSM information for page on caller's behalf. Caller might
+ * also need to account for a reduction in the length of the line pointer
+ * array following array truncation by us.
+ *
+ * params contains the input parameters used to control freezing and pruning
+ * behavior. See the definition of PruneFreezeParams for more on what each
+ * parameter does.
+ *
+ * If the HEAP_PRUNE_FREEZE option is set in params, we will freeze tuples if
+ * it's required in order to advance relfrozenxid / relminmxid, or if it's
+ * considered advantageous for overall system performance to do so now. The
+ * 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
+ * arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
+ *
+ * presult contains output parameters needed by callers, such as the number of
+ * tuples removed and the offsets of dead items on the page after pruning.
+ * heap_page_prune_and_freeze() is responsible for initializing it. Required
+ * by all callers.
+ *
+ * off_loc is the offset location required by the caller to use in error
+ * callback.
+ *
+ * new_relfrozen_xid and new_relmin_mxid must provided by the caller if the
+ * HEAP_PRUNE_FREEZE option is set in params. On entry, they contain the
+ * oldest XID and multi-XID seen on the relation so far. They will be updated
+ * with oldest values present on the page after pruning. After processing the
+ * whole relation, VACUUM can use these values as the new
+ * relfrozenxid/relminmxid for the relation.
+ */
+void
+heap_page_prune_and_freeze(PruneFreezeParams *params,
+ PruneFreezeResult *presult,
+ OffsetNumber *off_loc,
+ TransactionId *new_relfrozen_xid,
+ MultiXactId *new_relmin_mxid)
+{
+ Buffer buffer = params->buffer;
+ Buffer vmbuffer = params->vmbuffer;
+ Page page = BufferGetPage(buffer);
+ BlockNumber blockno = BufferGetBlockNumber(buffer);
+ OffsetNumber maxoff;
+ PruneState prstate;
+ HeapTupleData tup;
+ bool do_freeze;
+ bool do_prune;
+ bool do_hint_prune;
+ bool do_set_vm;
+ bool do_set_pd_vis;
+ bool did_tuple_hint_fpi;
+ int64 fpi_before = pgWalUsage.wal_fpi;
+ TransactionId frz_conflict_horizon = InvalidTransactionId;
+ TransactionId conflict_xid = InvalidTransactionId;
+ uint8 new_vmbits = 0;
+ uint8 old_vmbits = 0;
+
+ maxoff = PageGetMaxOffsetNumber(page);
+ tup.t_tableOid = RelationGetRelid(params->relation);
+
+ /* Initialize needed state in prstate */
+ prune_freeze_setup(params, &prstate, new_relfrozen_xid, new_relmin_mxid, presult);
+
+ /*
+ * Examine all line pointers and tuple visibility information to determine
+ * which line pointers should change state and which tuples may be frozen.
+ * Prepare queue of state changes to later be executed in a critical
+ * section.
+ */
+ prune_freeze_plan(&prstate, blockno, buffer, page, maxoff, off_loc, &tup);
+
+ /*
+ * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
+ * an FPI to be emitted.
+ */
+ did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
do_prune = prstate.nredirected > 0 ||
prstate.ndead > 0 ||
@@ -963,16 +997,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
PageIsFull(page);
- /*
- * After processing all the live tuples on the page, if the newest xmin
- * amongst them is not visible to everyone, the page cannot be
- * all-visible.
- */
- if (prstate.all_visible &&
- TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
- !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
- prstate.all_visible = prstate.all_frozen = false;
-
/*
* Decide if we want to go ahead with freezing according to the freeze
* plans we prepared, or not.
--
2.43.0
On Wed, 15 Oct 2025 at 04:27, Melanie Plageman
<melanieplageman@gmail.com> wrote:
Thanks so much for the review! I've addressed all your feedback except
what is commented on inline below.
I've gone ahead and committed the preliminary patches that you thought
were ready to commit. Attached v18 is what remains.
0001 - 0003: refactoring
0004 - 0006: finish eliminating XLOG_HEAP2_VISIBLE
0007 - 0009: refactoring
0010: Set VM on-access
0011: Set prune xid on insert
0012: Some refactoring for discussion

For 0001, I got feedback heap_page_prune_and_freeze() has too many
arguments, so I tried to address that. I'm interested to know if folks
like this more.

0011 still needs a bit of investigation to understand fully if
anything else in the index-killtuples test needs to be changed to make
sure we have the same coverage.

0012 is sort of WIP. I got feedback heap_page_prune_and_freeze() was
too long and should be split up into helpers. I want to know if this
split makes sense. I can pull it down the patch stack if so.

Only 0001 and 0012 are optional amongst the refactoring patches. The
others are required to make on-access VM-setting possible or viable.

On Thu, Oct 9, 2025 at 2:18 PM Andres Freund <andres@anarazel.de> wrote:
@@ -71,12 +84,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 	}

-	action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
-										   (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
-										   &buffer);
-	if (action == BLK_NEEDS_REDO)
+	if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+									  (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+									  &buffer) == BLK_NEEDS_REDO)
 	{
 		Page		page = BufferGetPage(buffer);
 		OffsetNumber *redirected;

Why move it around this way?
Because there will be an action for the visibility map
XLogReadBufferForRedoExtended(). I could have renamed it heap_action,
but it is being used only in one place, so I preferred to just cut it
to avoid any confusion.

	Advance the page LSN
+	 * only if the record could include an FPI, since recovery skips
+	 * records <= the stamped LSN. Otherwise it might skip an earlier FPI
+	 * needed to repair a torn page.
+	 */

This is confusing, should probably just reference the stuff we did in the
!recovery case.

I fixed this and addressed all your feedback related to this before committing.

+	if (do_prune || nplans > 0 ||
+		((vmflags & VISIBILITYMAP_VALID_BITS) && XLogHintBitIsNeeded()))
+		PageSetLSN(page, lsn);
+
	/*
	 * Note: we don't worry about updating the page's prunability hints.
	 * At worst this will cause an extra prune cycle to occur soon.
	 */

Not your fault, but that seems odd? Why aren't we just doing the right thing?
The comment dates back to 6f10eb2. I imagine no one ever bothered to
fuss with extracting the XID. You could change
heap_page_prune_execute() to return the right value -- though that's a
bit ugly since it is used in normal operation as well as recovery.I wonder if the VM specific redo portion should be in a common helper? Might
not be enough code to worry though...

I think it might be more code as a helper at this point.
@@ -2860,6 +2867,29 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
							 InvalidOffsetNumber);

+	/*
+	 * Before marking dead items unused, check whether the page will become
+	 * all-visible once that change is applied.

So the function is named _would_ but here you say will :)
I thought about it more and still feel that this function name should
contain "would". From vacuum's perspective it is "will" -- because it
knows it will remove those dead items, but from the function's
perspective it is hypothetical. I changed the comment though.

+	if (heap_page_would_be_all_visible(vacrel, buffer,
+									   deadoffsets, num_offsets,
+									   &all_frozen, &visibility_cutoff_xid))
+	{
+		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+		if (all_frozen)
+		{
+			vmflags |= VISIBILITYMAP_ALL_FROZEN;
+			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+		}
+
+		/* Take the lock on the vmbuffer before entering a critical section */
+		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);

It sure would be nice if we had documented the lock order between the heap
page and the corresponding VM page anywhere. This is just doing what we did
before, so it's not this patch's fault, but I did get worried about it for a
moment.

Well, the comment above the visibilitymap_set* functions says what
expectations they have for the heap page being locked.

+static bool
+heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+							   OffsetNumber *deadoffsets,
+							   int ndeadoffsets,
+							   bool *all_frozen,
+							   TransactionId *visibility_cutoff_xid)
+{
	Page		page = BufferGetPage(buf);

Hm, what about an assert checking that matched_dead_count == ndeadoffsets at
the end?

I was going to put an Assert(ndeadoffsets <= matched_dead_count), but
then I started wondering if there is a way we could end up with fewer
dead items than we collected during phase I.

I had thought about if we dropped an index and then did on-access
pruning -- but we don't allow setting LP_DEAD items LP_UNUSED in
on-access pruning. So, maybe this is safe... I can do a follow-on
commit to add the assert. But I'm just not 100% sure I've thought of
all the cases where we might end up with fewer dead items.

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum directly: GlobalVisState may advance
during a vacuum, allowing more pages to become considered all-visible.
In the rare case that it moves backward, VACUUM falls back to OldestXmin
to ensure we don't attempt to freeze a dead tuple that wasn't yet
prunable according to the GlobalVisState.

It could, but it currently won't advance in vacuum, right?

I thought it was possible for it to advance when calling
heap_prune_satisfies_vacuum() ->
GlobalVisTestIsRemovableXid() -> ... GlobalVisUpdate(). This case isn't
going to be common, but some things can cause us to update it.

We have talked about explicitly updating GlobalVisState more often
during vacuums of large tables. But I was under the impression that it
was at least possible for it to advance during vacuum now.

- Melanie
Hi!
First of all, I rechecked the v18 patches; they still produce the WAL
reduction. In the no-index vacuum case my result is a 39% reduction in
WAL bytes, much like in your first message.
Here are my comments on the code. I may be very nitpicky about minor
details, sorry for that.
In 0003:
The get_conflict_xid function logic is a bit strange to me: it assigns
conflict_xid to some value, but at the very end we have

+	/*
+	 * We can omit the snapshot conflict horizon if we are not pruning or
+	 * freezing any tuples and are setting an already all-visible page
+	 * all-frozen in the VM. In this case, all of the tuples on the page must
+	 * already be visible to all MVCC snapshots on the standby.
+	 */
+	if (!do_prune && !do_freeze &&
+		do_set_vm && blk_already_av && set_blk_all_frozen)
+		conflict_xid = InvalidTransactionId;
I feel like we should move this check to the beginning of the
function and just return InvalidTransactionId in that case.
In 0004:

+	if (old_vmbits == new_vmbits)
+	{
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+		/* Unset so we don't emit WAL since no change occurred */
+		do_set_vm = false;
+	}

and then

	END_CRIT_SECTION();

+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
So, in the heap_page_prune_and_freeze function we release the buffer lock
both inside and outside the critical section. As I understand it, this is
actually safe. I also looked at the XLOG coding practices of other
access methods (GiST, GIN, ...), and I can see that some of them
release buffers before leaving critical sections and some of them after.
But I still suggest staying in sync with the 'Write-Ahead Log Coding'
section of src/backend/access/transam/README, which says:
6. END_CRIT_SECTION()
7. Unlock and unpin the buffer(s).
Let's be consistent with this, at least in the context of this single function.
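For reference, a condensed, illustrative sketch of the full ordering that
README section prescribes. The buffer and record variables and the heap
opcode are placeholders, and the registration calls vary with the record
being assembled; this is not taken from the patches:

	/* 1. Pin and exclusive-lock the shared buffer(s) to be modified. */
	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);

	/* 2. Begin the critical section. */
	START_CRIT_SECTION();

	/* 3. Apply the required changes to the page. */
	/* ... modify the page here ... */

	/* 4. Mark the shared buffer(s) dirty. */
	MarkBufferDirty(buffer);

	/* 5. Build and insert the WAL record, then stamp the page LSN. */
	XLogBeginInsert();
	XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
	/* XLogRegisterData(...) as needed for the record being built */
	recptr = XLogInsert(RM_HEAP2_ID, info);
	PageSetLSN(BufferGetPage(buffer), recptr);

	/* 6. End the critical section. */
	END_CRIT_SECTION();

	/* 7. Unlock and unpin the buffer(s). */
	UnlockReleaseBuffer(buffer);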
In 0010:
I'm not entirely convinced that adding SO_ALLOW_VM_SET to the TAM
ScanOptions is the right thing to do. VM bits are something that makes
sense for the heap AM but not for an arbitrary TAM. So, don't we break
some layer of abstraction here? Would it be better for the heap AM to set
some flags in heap_beginscan?
Overall 0001-0003 are mostly fine for me, 0004-0006 are the right
thing to do IMHO, but maybe they need some more review from hackers.
I did not review the other patches in great detail; I will return to them
later.
--
Best regards,
Kirill Reshke
Thanks for the review!
On Wed, Oct 29, 2025 at 7:03 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
The get_conflict_xid function logic is a bit strange to me: it assigns
conflict_xid to some value, but at the very end we have

+	/*
+	 * We can omit the snapshot conflict horizon if we are not pruning or
+	 * freezing any tuples and are setting an already all-visible page
+	 * all-frozen in the VM. In this case, all of the tuples on the page must
+	 * already be visible to all MVCC snapshots on the standby.
+	 */
+	if (!do_prune && !do_freeze &&
+		do_set_vm && blk_already_av && set_blk_all_frozen)
+		conflict_xid = InvalidTransactionId;

I feel like we should move this check to the beginning of the
function and just return InvalidTransactionId in that case.
You're right. I've changed it as you suggest in attached v19.
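For readers following along, a minimal sketch of the early-return
restructuring being suggested. The parameter list and the fall-through
logic are assumptions for illustration only, not the exact function from
the v19 patches:

static TransactionId
get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
				 bool blk_already_av, bool set_blk_all_frozen,
				 TransactionId frz_conflict_horizon,
				 TransactionId latest_xid_removed)
{
	/*
	 * No snapshot conflict horizon is needed when we are neither pruning
	 * nor freezing and are only upgrading an already all-visible page to
	 * all-frozen in the VM: every tuple on the page is already visible to
	 * all MVCC snapshots on the standby.
	 */
	if (!do_prune && !do_freeze &&
		do_set_vm && blk_already_av && set_blk_all_frozen)
		return InvalidTransactionId;

	/*
	 * Otherwise compute the horizon as before, e.g. preferring the freeze
	 * conflict horizon when tuples are frozen and falling back to the
	 * newest XID removed by pruning.
	 */
	if (do_freeze && TransactionIdIsValid(frz_conflict_horizon))
		return frz_conflict_horizon;

	return latest_xid_removed;
}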
+	if (old_vmbits == new_vmbits)
+	{
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+		/* Unset so we don't emit WAL since no change occurred */
+		do_set_vm = false;
+	}

and then

	END_CRIT_SECTION();

+	if (do_set_vm)
+		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+

So, in the heap_page_prune_and_freeze function we release the buffer lock
both inside and outside the critical section. As I understand it, this is
actually safe. I also looked at the XLOG coding practices of other
access methods (GiST, GIN, ...), and I can see that some of them
release buffers before leaving critical sections and some of them after.
But I still suggest staying in sync with the 'Write-Ahead Log Coding'
section of src/backend/access/transam/README, which says:

6. END_CRIT_SECTION()
7. Unlock and unpin the buffer(s).

Let's be consistent with this, at least in the context of this single function.
I see what you are saying. However, I don't see a good way to
determine whether or not we need to unlock the VM without introducing
another local variable in the outermost scope -- like "unlock_vm".
This function already has a lot of local variables, so I'm loath to do
that. And we want do_set_vm to reflect whether or not we actually set
it in case it gets used in the future.
This function doesn't lock or unlock the heap buffer so it doesn't
seem as urgent to me to follow the letter of the law in this case.
The attached patch doesn't have this change, but this is what it would look like:
/* Lock vmbuffer before entering a critical section */
+ unlock_vm = do_set_vm;
if (do_set_vm)
LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
@@ -1112,12 +1114,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
old_vmbits = visibilitymap_set(blockno,
vmbuffer, new_vmbits,
params->relation->rd_locator);
- if (old_vmbits == new_vmbits)
- {
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
- /* Unset so we don't emit WAL since no change occurred */
- do_set_vm = false;
- }
+
+ /* Unset so we don't emit WAL since no change occurred */
+ do_set_vm = old_vmbits != new_vmbits;
}
/*
@@ -1145,7 +1144,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
END_CRIT_SECTION();
- if (do_set_vm)
+ if (unlock_vm)
LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
In 0010:
I'm not entirely convinced that adding SO_ALLOW_VM_SET to the TAM
ScanOptions is the right thing to do. VM bits are something that makes
sense for the heap AM but not for an arbitrary TAM. So, don't we break
some layer of abstraction here? Would it be better for the heap AM to set
some flags in heap_beginscan?
I don't see another good way of doing it.
The information about whether or not the relation is modified in the
query is gathered during planning and saved in the plan. We need to
get that information to the scan descriptor, which is all we have when
we call heap_page_prune_opt() during the scan. The scan descriptor is
created by the table AM implementations of scan_begin(). The table AM
callbacks don't pass down the plan -- which makes sense; the scan
shouldn't know about the plan. They do pass down flags, so I thought
it made the most sense to add a flag. Note that I was able to avoid
modifying the actual table and index AM callbacks (scan_begin() and
ambeginscan()). I only made new wrappers that took "modifies_rel".
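To make that flow concrete, here is one hypothetical way the executor could
derive such a bitmapset from the plan before any scans start. This is only
an illustration of the idea, not necessarily how the posted patches populate
es_modified_relids:

	/*
	 * Sketch: fold the RT indexes of result relations and FOR UPDATE/SHARE
	 * targets into one bitmapset that scan nodes can later consult.
	 */
	ListCell   *lc;

	estate->es_modified_relids = NULL;

	foreach(lc, plannedstmt->resultRelations)
		estate->es_modified_relids =
			bms_add_member(estate->es_modified_relids, lfirst_int(lc));

	foreach(lc, plannedstmt->rowMarks)
	{
		PlanRowMark *rc = lfirst_node(PlanRowMark, lc);

		if (RowMarkRequiresRowShareLock(rc->markType))
			estate->es_modified_relids =
				bms_add_member(estate->es_modified_relids, rc->rti);
	}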
Now, it is true that referring to the VM is somewhat of a layering
violation. Though, other table AMs may use the information about whether
the query modifies the relation -- which is really what this flag
represents. The ScanOptions are usually either a type or a call to
action, which is why I felt a bit uncomfortable calling it something
like SO_MODIFIES_REL -- that is less of an option and more a piece of
information. And it makes it sound like the scan modifies the rel,
which is not the case. I wonder if there is another solution. Or maybe
we call it SO_QUERY_MODIFIES_REL?
- Melanie
Attachments:
v19-0001-Refactor-heap_page_prune_and_freeze-parameters-i.patchtext/x-patch; charset=UTF-8; name=v19-0001-Refactor-heap_page_prune_and_freeze-parameters-i.patchDownload
From 338f6e31bf029527a3898fee1fbe587e24de9f5f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 14 Oct 2025 11:10:25 -0400
Subject: [PATCH v19 01/12] Refactor heap_page_prune_and_freeze() parameters
into a struct
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
heap_page_prune_and_freeze() had accumulated an unwieldy number of input
parameters, and upcoming work to handle VM updates in this function will
add even more.
Introduce a new PruneFreezeParams struct to group the function’s input
parameters, improving readability and maintainability.
Discussion: https://postgr.es/m/yn4zp35kkdsjx6wf47zcfmxgexxt4h2og47pvnw2x5ifyrs3qc%407uw6jyyxuyf7
---
src/backend/access/heap/pruneheap.c | 86 +++++++++++++---------------
src/backend/access/heap/vacuumlazy.c | 16 ++++--
src/include/access/heapam.h | 62 ++++++++++++++++----
src/tools/pgindent/typedefs.list | 1 +
4 files changed, 101 insertions(+), 64 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 231bea679c6..450b2eb6494 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -258,15 +258,23 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
{
OffsetNumber dummy_off_loc;
+ PruneFreezeParams params;
PruneFreezeResult presult;
+ params.relation = relation;
+ params.buffer = buffer;
+ params.reason = PRUNE_ON_ACCESS;
+ params.vistest = vistest;
+ params.cutoffs = NULL;
+
/*
* For now, pass mark_unused_now as false regardless of whether or
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, vistest, 0,
- NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+ params.options = 0;
+
+ heap_page_prune_and_freeze(¶ms, &presult, &dummy_off_loc, NULL, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -419,60 +427,43 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* also need to account for a reduction in the length of the line pointer
* array following array truncation by us.
*
- * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
- * required in order to advance relfrozenxid / relminmxid, or if it's
- * considered advantageous for overall system performance to do so now. The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing. When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set. They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
- *
- * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
+ * params contains the input parameters used to control freezing and pruning
+ * behavior. See the definition of PruneFreezeParams for more on what each
+ * parameter does.
*
- * options:
- * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- * pruning.
- *
- * FREEZE indicates that we will also freeze tuples, and will return
- * 'all_visible', 'all_frozen' flags to the caller.
- *
- * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
- * of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
- * cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * If the HEAP_PRUNE_FREEZE option is set in params, we will freeze tuples if
+ * it's required in order to advance relfrozenxid / relminmxid, or if it's
+ * considered advantageous for overall system performance to do so now. The
+ * 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
+ * arguments are required when freezing. When HEAP_PRUNE_FREEZE option is
+ * passed, we also set presult->all_visible and presult->all_frozen on exit,
+ * to indicate if the VM bits can be set. They are always set to false when
+ * the HEAP_PRUNE_FREEZE option is not passed, because at the moment only
+ * callers that also freeze need that information.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
* heap_page_prune_and_freeze() is responsible for initializing it. Required
* by all callers.
*
- * reason indicates why the pruning is performed. It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
* off_loc is the offset location required by the caller to use in error
* callback.
*
* new_relfrozen_xid and new_relmin_mxid must provided by the caller if the
- * HEAP_PRUNE_FREEZE option is set. On entry, they contain the oldest XID and
- * multi-XID seen on the relation so far. They will be updated with oldest
- * values present on the page after pruning. After processing the whole
- * relation, VACUUM can use these values as the new relfrozenxid/relminmxid
- * for the relation.
+ * HEAP_PRUNE_FREEZE option is set in params. On entry, they contain the
+ * oldest XID and multi-XID seen on the relation so far. They will be updated
+ * with oldest values present on the page after pruning. After processing the
+ * whole relation, VACUUM can use these values as the new
+ * relfrozenxid/relminmxid for the relation.
*/
void
-heap_page_prune_and_freeze(Relation relation, Buffer buffer,
- GlobalVisState *vistest,
- int options,
- struct VacuumCutoffs *cutoffs,
+heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
- PruneReason reason,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid)
{
+ Buffer buffer = params->buffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
OffsetNumber offnum,
@@ -486,10 +477,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
int64 fpi_before = pgWalUsage.wal_fpi;
/* Copy parameters to prstate */
- prstate.vistest = vistest;
- prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
- prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
- prstate.cutoffs = cutoffs;
+ prstate.vistest = params->vistest;
+ prstate.mark_unused_now =
+ (params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
+ prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.cutoffs = params->cutoffs;
/*
* Our strategy is to scan the page and make lists of items to change,
@@ -583,7 +575,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.visibility_cutoff_xid = InvalidTransactionId;
maxoff = PageGetMaxOffsetNumber(page);
- tup.t_tableOid = RelationGetRelid(relation);
+ tup.t_tableOid = RelationGetRelid(params->relation);
/*
* Determine HTSV for all tuples, and queue them up for processing as HOT
@@ -786,7 +778,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* Decide if we want to go ahead with freezing according to the freeze
* plans we prepared, or not.
*/
- do_freeze = heap_page_will_freeze(relation, buffer,
+ do_freeze = heap_page_will_freeze(params->relation, buffer,
did_tuple_hint_fpi,
do_prune,
do_hint_prune,
@@ -838,7 +830,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
*/
- if (RelationNeedsWAL(relation))
+ if (RelationNeedsWAL(params->relation))
{
/*
* The snapshotConflictHorizon for the whole record should be the
@@ -876,11 +868,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
else
conflict_xid = prstate.latest_xid_removed;
- log_heap_prune_and_freeze(relation, buffer,
+ log_heap_prune_and_freeze(params->relation, buffer,
InvalidBuffer, /* vmbuffer */
0, /* vmflags */
conflict_xid,
- true, reason,
+ true, params->reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 61fe623cc60..e55be07cae4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1965,10 +1965,16 @@ lazy_scan_prune(LVRelState *vacrel,
{
Relation rel = vacrel->rel;
PruneFreezeResult presult;
- int prune_options = 0;
+ PruneFreezeParams params;
Assert(BufferGetBlockNumber(buf) == blkno);
+ params.relation = rel;
+ params.buffer = buf;
+ params.reason = PRUNE_VACUUM_SCAN;
+ params.cutoffs = &vacrel->cutoffs;
+ params.vistest = vacrel->vistest;
+
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
*
@@ -1984,12 +1990,12 @@ lazy_scan_prune(LVRelState *vacrel,
* tuples. Pruning will have determined whether or not the page is
* all-visible.
*/
- prune_options = HEAP_PAGE_PRUNE_FREEZE;
+ params.options = HEAP_PAGE_PRUNE_FREEZE;
if (vacrel->nindexes == 0)
- prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
+ params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
- &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+ heap_page_prune_and_freeze(¶ms,
+ &presult,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 909db73b7bb..b0b6d3552a6 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -221,6 +221,55 @@ typedef struct HeapPageFreeze
} HeapPageFreeze;
+
+/* 'reason' codes for heap_page_prune_and_freeze() */
+typedef enum
+{
+ PRUNE_ON_ACCESS, /* on-access pruning */
+ PRUNE_VACUUM_SCAN, /* VACUUM 1st heap pass */
+ PRUNE_VACUUM_CLEANUP, /* VACUUM 2nd heap pass */
+} PruneReason;
+
+/*
+ * Input parameters to heap_page_prune_and_freeze()
+ */
+typedef struct PruneFreezeParams
+{
+ Relation relation; /* relation containing buffer to be pruned */
+ Buffer buffer; /* buffer to be pruned */
+
+ /*
+ * The reason pruning was performed. It is used to set the WAL record
+ * opcode which is used for debugging and analysis purposes.
+ */
+ PruneReason reason;
+
+ /*
+ * Contains flag bits:
+ *
+ * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+ * pruning.
+ *
+ * FREEZE indicates that we will also freeze tuples, and will return
+ * 'all_visible', 'all_frozen' flags to the caller.
+ */
+ int options;
+
+ /*
+ * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
+ * (see heap_prune_satisfies_vacuum).
+ */
+ GlobalVisState *vistest;
+
+ /*
+ * cutoffs contains the freeze cutoffs, established by VACUUM at the
+ * beginning of vacuuming the relation. Required if HEAP_PRUNE_FREEZE
+ * option is set. cutoffs->OldestXmin is also used to determine if dead
+ * tuples are HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ */
+ struct VacuumCutoffs *cutoffs;
+} PruneFreezeParams;
+
/*
* Per-page state returned by heap_page_prune_and_freeze()
*/
@@ -264,13 +313,6 @@ typedef struct PruneFreezeResult
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
} PruneFreezeResult;
-/* 'reason' codes for heap_page_prune_and_freeze() */
-typedef enum
-{
- PRUNE_ON_ACCESS, /* on-access pruning */
- PRUNE_VACUUM_SCAN, /* VACUUM 1st heap pass */
- PRUNE_VACUUM_CLEANUP, /* VACUUM 2nd heap pass */
-} PruneReason;
/* ----------------
* function prototypes for heap access method
@@ -367,12 +409,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
-extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
- GlobalVisState *vistest,
- int options,
- struct VacuumCutoffs *cutoffs,
+extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
- PruneReason reason,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 018b5919cf6..a384171de0d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2343,6 +2343,7 @@ ProjectionPath
PromptInterruptContext
ProtocolVersion
PrsStorage
+PruneFreezeParams
PruneFreezeResult
PruneReason
PruneState
--
2.43.0
v19-0002-Keep-all_frozen-updated-in-heap_page_prune_and_f.patchtext/x-patch; charset=US-ASCII; name=v19-0002-Keep-all_frozen-updated-in-heap_page_prune_and_f.patchDownload
From f179a5493c9671f9c8eca9231292d3a48bf7153c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 14 Oct 2025 14:55:40 -0400
Subject: [PATCH v19 02/12] Keep all_frozen updated in
heap_page_prune_and_freeze
Previously, we relied on all_visible and all_frozen being used together
to ensure that all_frozen was correct, but it is better to keep both
fields updated.
Future changes will separate their usage, so we should not depend on
all_visible for the validity of all_frozen.
---
src/backend/access/heap/pruneheap.c | 22 +++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 9 ++++-----
2 files changed, 15 insertions(+), 16 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 450b2eb6494..daa719fc2a1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -143,10 +143,6 @@ typedef struct
* whether to freeze the page or not. The all_visible and all_frozen
* values returned to the caller are adjusted to include LP_DEAD items at
* the end.
- *
- * all_frozen should only be considered valid if all_visible is also set;
- * we don't bother to clear the all_frozen flag every time we clear the
- * all_visible flag.
*/
bool all_visible;
bool all_frozen;
@@ -361,8 +357,10 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* anymore. The opportunistic freeze heuristic must be improved;
* however, for now, try to approximate the old logic.
*/
- if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+ if (prstate->all_frozen && prstate->nfrozen > 0)
{
+ Assert(prstate->all_visible);
+
/*
* Freezing would make the page all-frozen. Have already emitted
* an FPI or will do so anyway?
@@ -784,6 +782,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
do_hint_prune,
&prstate);
+ Assert(!prstate.all_frozen || prstate.all_visible);
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -853,7 +853,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
if (do_freeze)
{
- if (prstate.all_visible && prstate.all_frozen)
+ if (prstate.all_frozen)
frz_conflict_horizon = prstate.visibility_cutoff_xid;
else
{
@@ -1418,7 +1418,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
if (!HeapTupleHeaderXminCommitted(htup))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1440,7 +1440,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
Assert(prstate->cutoffs);
if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1453,7 +1453,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
case HEAPTUPLE_RECENTLY_DEAD:
prstate->recently_dead_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple will soon become DEAD. Update the hint field so
@@ -1472,7 +1472,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* assumption is a bit shaky, but it is what acquire_sample_rows()
* does, so be consistent.
*/
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* If we wanted to optimize for aborts, we might consider marking
@@ -1490,7 +1490,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* will commit and update the counters after we report.
*/
prstate->live_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple may soon become DEAD. Update the hint field so that
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e55be07cae4..670a7424b15 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2021,7 +2021,6 @@ lazy_scan_prune(LVRelState *vacrel,
* agreement with heap_page_is_all_visible() using an assertion.
*/
#ifdef USE_ASSERT_CHECKING
- /* Note that all_frozen value does not matter when !all_visible */
if (presult.all_visible)
{
TransactionId debug_cutoff;
@@ -2075,6 +2074,7 @@ lazy_scan_prune(LVRelState *vacrel,
*has_lpdead_items = (presult.lpdead_items > 0);
Assert(!presult.all_visible || !(*has_lpdead_items));
+ Assert(!presult.all_frozen || presult.all_visible);
/*
* Handle setting visibility map bit based on information from the VM (as
@@ -2180,11 +2180,10 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen. Note that all_frozen is only valid if all_visible is
- * true, so we must check both all_visible and all_frozen.
+ * it as all-frozen.
*/
- else if (all_visible_according_to_vm && presult.all_visible &&
- presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ else if (all_visible_according_to_vm && presult.all_frozen &&
+ !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
uint8 old_vmbits;
--
2.43.0
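
The invariant 0002 establishes (all_frozen implies all_visible) is small but load-bearing for the later patches, since they start reading all_frozen on its own. Here is a tiny standalone model of it, using plain C in place of PruneState, just to make the rule explicit; it is illustrative only.

#include <assert.h>
#include <stdbool.h>

typedef struct
{
	bool		all_visible;
	bool		all_frozen;
} vis_flags;

/* A tuple that is not visible to everyone disqualifies both flags. */
static void
clear_visibility(vis_flags *v)
{
	v->all_visible = v->all_frozen = false;
}

int
main(void)
{
	vis_flags	v = {true, true};

	clear_visibility(&v);
	/* mirrors Assert(!prstate.all_frozen || prstate.all_visible) */
	assert(!v.all_frozen || v.all_visible);
	return 0;
}
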
Attachment: v19-0003-Update-PruneState.all_-visible-frozen-earlier-in.patch (text/x-patch)
From c27ab39e739acf7e96d1f7e81df91fc2b2b7fe43 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 14 Oct 2025 15:21:49 -0400
Subject: [PATCH v19 03/12] Update PruneState.all_[visible|frozen] earlier in
pruning
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items. This allows opportunistic
freezing if the page would otherwise be fully frozen, since those dead
items are later removed in vacuum’s third phase.
To move the VM update into the same WAL record that
prunes and freezes tuples, we must know whether the page will
be marked all-visible/all-frozen before emitting WAL.
The only barrier to updating these flags immediately after deciding
whether to opportunistically freeze is that we previously used
all_frozen to compute the snapshot conflict horizon when freezing
tuples. By determining the cutoff earlier, we can update the flags
immediately after making the freeze decision.
This is required to set the VM in the XLOG_HEAP2_PRUNE_VACUUM_SCAN
record emitted by pruning and freezing.
---
src/backend/access/heap/pruneheap.c | 117 ++++++++++++++--------------
1 file changed, 57 insertions(+), 60 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index daa719fc2a1..ef8861022f1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -138,11 +138,11 @@ typedef struct
* bits. It is only valid if we froze some tuples, and all_frozen is
* true.
*
- * NOTE: all_visible and all_frozen don't include LP_DEAD items. That's
- * convenient for heap_page_prune_and_freeze(), to use them to decide
- * whether to freeze the page or not. The all_visible and all_frozen
- * values returned to the caller are adjusted to include LP_DEAD items at
- * the end.
+ * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
+ * That's convenient for heap_page_prune_and_freeze() to use them to
+ * decide whether to freeze the page or not. The all_visible and
+ * all_frozen values returned to the caller are adjusted to include
+ * LP_DEAD items after we determine whether to opportunistically freeze.
*/
bool all_visible;
bool all_frozen;
@@ -175,7 +175,7 @@ static void page_verify_redirects(Page page);
static bool heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
- PruneState *prstate);
+ PruneState *prstate, TransactionId *frz_conflict_horizon);
/*
@@ -308,7 +308,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* performs several pre-freeze checks.
*
* The values of do_prune, do_hint_prune, and did_tuple_hint_fpi must be
- * determined before calling this function.
+ * determined before calling this function. *frz_conflict_horizon is set to
+ * the snapshot conflict horizon used for the WAL record, should we decide to
+ * freeze tuples.
*
* prstate is both an input and output parameter.
*
@@ -320,7 +322,8 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi,
bool do_prune,
bool do_hint_prune,
- PruneState *prstate)
+ PruneState *prstate,
+ TransactionId *frz_conflict_horizon)
{
bool do_freeze = false;
@@ -390,6 +393,22 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* critical section.
*/
heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+
+ /*
+ * Calculate what the snapshot conflict horizon should be for a record
+ * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
+ * for conflicts when the whole page is eligible to become all-frozen
+ * in the VM once we're done with it. Otherwise, we generate a
+ * conservative cutoff by stepping back from OldestXmin.
+ */
+ if (prstate->all_frozen)
+ *frz_conflict_horizon = prstate->visibility_cutoff_xid;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ *frz_conflict_horizon = prstate->cutoffs->OldestXmin;
+ TransactionIdRetreat(*frz_conflict_horizon);
+ }
}
else if (prstate->nfrozen > 0)
{
@@ -434,10 +453,11 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* considered advantageous for overall system performance to do so now. The
* 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
* arguments are required when freezing. When HEAP_PRUNE_FREEZE option is
- * passed, we also set presult->all_visible and presult->all_frozen on exit,
- * to indicate if the VM bits can be set. They are always set to false when
- * the HEAP_PRUNE_FREEZE option is not passed, because at the moment only
- * callers that also freeze need that information.
+ * passed, we also set presult->all_visible and presult->all_frozen after
+ * determining whether or not to opportunistically freeze, to indicate if the
+ * VM bits can be set. They are always set to false when the
+ * HEAP_PRUNE_FREEZE option is not passed, because at the moment only callers
+ * that also freeze need that information.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
@@ -473,6 +493,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool do_hint_prune;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ TransactionId frz_conflict_horizon = InvalidTransactionId;
/* Copy parameters to prstate */
prstate.vistest = params->vistest;
@@ -542,10 +563,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* are tuples present that are not visible to everyone or if there are
* dead tuples which are not yet removable. However, dead tuples which
* will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not clear
- * all_visible when we see LP_DEAD items. We fix that at the end of the
- * function, when we return the value to the caller, so that the caller
- * doesn't set the VM bit incorrectly.
+ * opportunistically freezing. Because of that, we do not immediately
+ * clear all_visible when we see LP_DEAD items. We fix that after
+ * scanning the line pointers, before we return the value to the caller,
+ * so that the caller doesn't set the VM bit incorrectly.
*/
if (prstate.attempt_freeze)
{
@@ -780,7 +801,24 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
did_tuple_hint_fpi,
do_prune,
do_hint_prune,
- &prstate);
+ &prstate,
+ &frz_conflict_horizon);
+
+ /*
+ * While scanning the line pointers, we did not clear
+ * all_visible/all_frozen when encountering LP_DEAD items because we
+ * wanted the decision whether or not to freeze the page to be unaffected
+ * by the short-term presence of LP_DEAD items. These LP_DEAD items are
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that we finished determining whether or not to freeze the page,
+ * update all_visible and all_frozen so that they reflect the true state
+ * of the page for setting PD_ALL_VISIBLE and VM bits.
+ */
+ if (prstate.lpdead_items > 0)
+ prstate.all_visible = prstate.all_frozen = false;
Assert(!prstate.all_frozen || prstate.all_visible);
@@ -842,27 +880,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* on the standby with xids older than the youngest tuple this
* record will freeze will conflict.
*/
- TransactionId frz_conflict_horizon = InvalidTransactionId;
TransactionId conflict_xid;
- /*
- * We can use the visibility_cutoff_xid as our cutoff for
- * conflicts when the whole page is eligible to become all-frozen
- * in the VM once we're done with it. Otherwise we generate a
- * conservative cutoff by stepping back from OldestXmin.
- */
- if (do_freeze)
- {
- if (prstate.all_frozen)
- frz_conflict_horizon = prstate.visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = prstate.cutoffs->OldestXmin;
- TransactionIdRetreat(frz_conflict_horizon);
- }
- }
-
if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
conflict_xid = frz_conflict_horizon;
else
@@ -888,30 +907,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected by our caller.
- */
- if (prstate.all_visible && prstate.lpdead_items == 0)
- {
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
- }
- else
- {
- presult->all_visible = false;
- presult->all_frozen = false;
- }
-
+ presult->all_visible = prstate.all_visible;
+ presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
/*
--
2.43.0
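
To make the new placement of the horizon computation easier to follow, here is a minimal standalone model of the choice that now happens inside heap_page_will_freeze(). It is illustrative only: XIDs are plain integers here, whereas the real code uses TransactionIdRetreat() to handle wraparound.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t xid_t;

/*
 * If the page will be entirely frozen, the newest xmin of its live tuples
 * (visibility_cutoff_xid) is a precise horizon. Otherwise, fall back to a
 * conservative horizon just before OldestXmin.
 */
static xid_t
freeze_conflict_horizon(bool all_frozen, xid_t visibility_cutoff_xid,
						xid_t oldest_xmin)
{
	if (all_frozen)
		return visibility_cutoff_xid;
	return oldest_xmin - 1;
}

int
main(void)
{
	printf("%u\n", freeze_conflict_horizon(true, 1000, 2000));	/* 1000 */
	printf("%u\n", freeze_conflict_horizon(false, 1000, 2000));	/* 1999 */
	return 0;
}
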
Attachment: v19-0004-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (text/x-patch)
From 02c1c23779aa22e05dde52bb02701398f5261654 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 8 Oct 2025 15:39:01 -0400
Subject: [PATCH v19 04/12] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
prune/freeze
Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.
Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.
This change applies only to vacuum phase I, not to pruning performed
during normal page access.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam_xlog.c | 37 ++-
src/backend/access/heap/pruneheap.c | 435 ++++++++++++++++++++------
src/backend/access/heap/vacuumlazy.c | 207 +-----------
src/include/access/heapam.h | 43 ++-
4 files changed, 422 insertions(+), 300 deletions(-)
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 11cb3f74da5..2af724451c3 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -104,6 +104,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
OffsetNumber *frz_offsets;
char *dataptr = XLogRecGetBlockData(record, 0, &datalen);
bool do_prune;
+ bool set_lsn = false;
+ bool mark_buffer_dirty = false;
heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
&nplans, &plans, &frz_offsets,
@@ -157,17 +159,36 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* There should be no more data */
Assert((char *) frz_offsets == dataptr + datalen);
- if (vmflags & VISIBILITYMAP_VALID_BITS)
- PageSetAllVisible(page);
-
- MarkBufferDirty(buffer);
+ if (do_prune || nplans > 0)
+ mark_buffer_dirty = set_lsn = true;
/*
- * See log_heap_prune_and_freeze() for commentary on when we set the
- * heap page LSN.
+ * The critical integrity requirement here is that we must never end
+ * up with the visibility map bit set and the page-level
+ * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+ * modification would fail to clear the visibility map bit.
+ *
+ * vmflags may be nonzero with PD_ALL_VISIBLE already set (e.g. when
+ * marking an all-visible page all-frozen). If only the VM is updated,
+ * the heap page need not be dirtied.
*/
- if (do_prune || nplans > 0 ||
- ((vmflags & VISIBILITYMAP_VALID_BITS) && XLogHintBitIsNeeded()))
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+ {
+ PageSetAllVisible(page);
+ mark_buffer_dirty = true;
+
+ /*
+ * See log_heap_prune_and_freeze() for commentary on when we set
+ * the heap page LSN.
+ */
+ if (XLogHintBitIsNeeded())
+ set_lsn = true;
+ }
+
+ if (mark_buffer_dirty)
+ MarkBufferDirty(buffer);
+
+ if (set_lsn)
PageSetLSN(page, lsn);
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ef8861022f1..8dafbd344d8 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
#include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool attempt_freeze;
+ /* whether or not to attempt updating the VM */
+ bool attempt_update_vm;
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -133,16 +135,17 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon when setting the VM or when
+ * freezing all the tuples on the page. It is only valid when all the live
+ * tuples on the page are all-visible.
*
* NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
* That's convenient for heap_page_prune_and_freeze() to use them to
- * decide whether to freeze the page or not. The all_visible and
- * all_frozen values returned to the caller are adjusted to include
- * LP_DEAD items after we determine whether to opportunistically freeze.
+ * decide whether to opportunistically freeze the page or not. The
+ * all_visible and all_frozen values ultimately used to set the VM are
+ * adjusted to include LP_DEAD items after we determine whether or not to
+ * opportunistically freeze.
*/
bool all_visible;
bool all_frozen;
@@ -173,10 +176,21 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
static void page_verify_redirects(Page page);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid, bool blk_already_av,
+ bool set_blk_all_frozen);
static bool heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
PruneState *prstate, TransactionId *frz_conflict_horizon);
-
+static bool heap_page_will_set_vis(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buf,
+ Buffer vmbuffer,
+ bool blk_known_av,
+ const PruneState *prstate,
+ uint8 *vmflags,
+ bool *do_set_pd_vis);
/*
* Optionally prune and repair fragmentation in the specified page.
@@ -262,6 +276,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
params.reason = PRUNE_ON_ACCESS;
params.vistest = vistest;
params.cutoffs = NULL;
+ params.vmbuffer = InvalidBuffer;
+ params.blk_known_av = false;
/*
* For now, pass mark_unused_now as false regardless of whether or
@@ -434,10 +450,108 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Decide whether to set the visibility map bits for heap_blk, using
+ * information from PruneState and blk_known_av. Some callers may already
+ * have examined this page's VM bits (e.g., VACUUM in the previous
+ * heap_vac_scan_next_block() call) and can pass that along.
+ *
+ * Returns true if one or both VM bits should be set, along with the desired
+ * flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
+ * should be set on the heap page.
+ */
+static bool
+heap_page_will_set_vis(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buf,
+ Buffer vmbuffer,
+ bool blk_known_av,
+ const PruneState *prstate,
+ uint8 *vmflags,
+ bool *do_set_pd_vis)
+{
+ Page heap_page = BufferGetPage(heap_buf);
+ bool do_set_vm = false;
+
+ *do_set_pd_vis = false;
+
+ if (!prstate->attempt_update_vm)
+ {
+ Assert(!prstate->all_visible && !prstate->all_frozen);
+ Assert(*vmflags == 0);
+ return false;
+ }
+
+ if (prstate->all_visible && !PageIsAllVisible(heap_page))
+ *do_set_pd_vis = true;
+
+ if ((prstate->all_visible && !blk_known_av) ||
+ (prstate->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+ {
+ *vmflags = VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate->all_frozen)
+ *vmflags |= VISIBILITYMAP_ALL_FROZEN;
+
+ do_set_vm = true;
+ }
+
+ /*
+ * Now handle two potential corruption cases:
+ *
+ * These do not need to happen in a critical section and are not
+ * WAL-logged.
+ *
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that in vacuum the bit
+ * got cleared after heap_vac_scan_next_block() was called, so we must
+ * recheck with buffer lock before concluding that the VM is corrupt.
+ */
+ else if (blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (prstate->lpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buf);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ return do_set_vm;
+}
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -452,12 +566,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* it's required in order to advance relfrozenxid / relminmxid, or if it's
* considered advantageous for overall system performance to do so now. The
* 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
- * arguments are required when freezing. When HEAP_PRUNE_FREEZE option is
- * passed, we also set presult->all_visible and presult->all_frozen after
- * determining whether or not to opportunistically freeze, to indicate if the
- * VM bits can be set. They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not passed, because at the moment only callers
- * that also freeze need that information.
+ * arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
@@ -482,6 +597,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
MultiXactId *new_relmin_mxid)
{
Buffer buffer = params->buffer;
+ Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
OffsetNumber offnum,
@@ -491,15 +607,22 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool do_freeze;
bool do_prune;
bool do_hint_prune;
+ bool do_set_vm;
+ bool do_set_pd_vis;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
TransactionId frz_conflict_horizon = InvalidTransactionId;
+ TransactionId conflict_xid = InvalidTransactionId;
+ uint8 new_vmbits = 0;
+ uint8 old_vmbits = 0;
/* Copy parameters to prstate */
prstate.vistest = params->vistest;
prstate.mark_unused_now =
(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.attempt_update_vm =
+ (params->options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0;
prstate.cutoffs = params->cutoffs;
/*
@@ -546,50 +669,54 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.deadoffsets = presult->deadoffsets;
/*
- * Caller may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Track whether the page could be marked all-visible and/or all-frozen.
+ * This information is used for opportunistic freezing and for updating
+ * the visibility map (VM) if requested by the caller.
+ *
+ * Currently, only VACUUM performs freezing, but other callers may in the
+ * future. Visibility bookkeeping is required not just for setting the VM
+ * bits, but also for opportunistic freezing: we only consider freezing if
+ * the page would become all-frozen, or if it would be all-frozen except
+ * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+ * we will not set the VM bit even if the page is found to be all-visible.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is passed without HEAP_PAGE_PRUNE_FREEZE,
+ * prstate.all_frozen must be initialized to false, since we will not call
+ * heap_prepare_freeze_tuple() for each tuple.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * Dead tuples that will be removed by the end of vacuum should not
+ * prevent opportunistic freezing. Therefore, we do not clear all_visible
+ * when we encounter LP_DEAD items. Instead, we correct all_visible after
+ * deciding whether to freeze, but before updating the VM, to avoid
+ * setting the VM bit incorrectly.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not immediately
- * clear all_visible when we see LP_DEAD items. We fix that after
- * scanning the line pointers, before we return the value to the caller,
- * so that the caller doesn't set the VM bit incorrectly.
+ * If neither freezing nor VM updates are requested, we skip the extra
+ * bookkeeping. In this case, initializing all_visible to false allows
+ * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
*/
if (prstate.attempt_freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
}
+ else if (prstate.attempt_update_vm)
+ {
+ prstate.all_visible = true;
+ prstate.all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate.all_visible = false;
prstate.all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is no longer maintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon
+ * when updating the VM and/or freezing all the tuples on the page.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -821,6 +948,34 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.all_visible = prstate.all_frozen = false;
Assert(!prstate.all_frozen || prstate.all_visible);
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ /*
+ * Decide whether to set the page-level PD_ALL_VISIBLE bit and the VM bits
+ * based on information from the VM and the all_visible/all_frozen flags.
+ *
+ * While it is valid for PD_ALL_VISIBLE to be set when the corresponding
+ * VM bit is clear, we strongly prefer to keep them in sync.
+ *
+ * Accordingly, we also allow updating only the VM when PD_ALL_VISIBLE has
+ * already been set. Setting only the VM is most common when setting an
+ * already all-visible page all-frozen.
+ */
+ do_set_vm = heap_page_will_set_vis(params->relation,
+ blockno, buffer, vmbuffer, params->blk_known_av,
+ &prstate, &new_vmbits, &do_set_pd_vis);
+
+ /* We should only set the VM if PD_ALL_VISIBLE is set or will be */
+ Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
+
+ conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+ prstate.latest_xid_removed, frz_conflict_horizon,
+ prstate.visibility_cutoff_xid, params->blk_known_av,
+ (do_set_vm && (new_vmbits & VISIBILITYMAP_ALL_FROZEN)));
+
+ /* Lock vmbuffer before entering a critical section */
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -842,14 +997,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
/*
* If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * hint. If we are going to freeze or prune the page or set
+ * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+ *
+ * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+ * for the VM to be set and PD_ALL_VISIBLE to be clear.
*/
- if (!do_freeze && !do_prune)
+ if (!do_freeze && !do_prune && !do_set_pd_vis)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -863,35 +1021,43 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
- MarkBufferDirty(buffer);
+ if (do_set_pd_vis)
+ PageSetAllVisible(page);
- /*
- * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
- */
- if (RelationNeedsWAL(params->relation))
+ if (do_prune || do_freeze || do_set_pd_vis)
+ MarkBufferDirty(buffer);
+
+ if (do_set_vm)
{
- /*
- * The snapshotConflictHorizon for the whole record should be the
- * most conservative of all the horizons calculated for any of the
- * possible modifications. If this record will prune tuples, any
- * transactions on the standby older than the youngest xmax of the
- * most recently removed tuple this record will prune will
- * conflict. If this record will freeze tuples, any transactions
- * on the standby with xids older than the youngest tuple this
- * record will freeze will conflict.
- */
- TransactionId conflict_xid;
+ Assert(PageIsAllVisible(page));
- if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
- conflict_xid = frz_conflict_horizon;
- else
- conflict_xid = prstate.latest_xid_removed;
+ old_vmbits = visibilitymap_set_vmbits(blockno,
+ vmbuffer, new_vmbits,
+ params->relation->rd_locator);
+ if (old_vmbits == new_vmbits)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+ /* Unset so we don't emit WAL since no change occurred */
+ do_set_vm = false;
+ }
+ }
+ /*
+ * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
+ * only updating the VM and it turns out it was already set, we will
+ * have unset do_set_vm earlier. As such, check it again before
+ * emitting the record.
+ */
+ if (RelationNeedsWAL(params->relation) &&
+ (do_prune || do_freeze || do_set_vm))
+ {
log_heap_prune_and_freeze(params->relation, buffer,
- InvalidBuffer, /* vmbuffer */
- 0, /* vmflags */
+ do_set_vm ? vmbuffer : InvalidBuffer,
+ do_set_vm ? new_vmbits : 0,
conflict_xid,
- true, params->reason,
+ true, /* cleanup lock */
+ do_set_pd_vis,
+ params->reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
@@ -901,28 +1067,47 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
END_CRIT_SECTION();
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+ /*
+ * During its second pass over the heap, VACUUM calls
+ * heap_page_would_be_all_visible() to determine whether a page is
+ * all-visible and all-frozen. The logic here is similar. After completing
+ * pruning and freezing, use an assertion to verify that our results
+ * remain consistent with heap_page_would_be_all_visible().
+ */
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
+ {
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
+
+ Assert(prstate.lpdead_items == 0);
+ Assert(prstate.cutoffs);
+
+ if (!heap_page_is_all_visible(params->relation, buffer,
+ prstate.cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc))
+ Assert(false);
+
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == prstate.visibility_cutoff_xid);
+ }
+#endif
+
/* Copy information back for caller */
presult->ndeleted = prstate.ndeleted;
presult->nnewlpdead = prstate.ndead;
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
-
- /*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ presult->new_vmbits = new_vmbits;
+ presult->old_vmbits = old_vmbits;
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
@@ -1413,6 +1598,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
{
TransactionId xmin;
+ Assert(prstate->attempt_update_vm);
+
if (!HeapTupleHeaderXminCommitted(htup))
{
prstate->all_visible = prstate->all_frozen = false;
@@ -2058,6 +2245,65 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
return nplans;
}
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid, bool blk_already_av,
+ bool set_blk_all_frozen)
+{
+ TransactionId conflict_xid;
+
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning or
+ * freezing any tuples and are setting an already all-visible page
+ * all-frozen in the VM. In this case, all of the tuples on the page must
+ * already be visible to all MVCC snapshots on the standby.
+ */
+ if (!do_prune && !do_freeze &&
+ do_set_vm && blk_already_av && set_blk_all_frozen)
+ return InvalidTransactionId;
+
+ /*
+ * The snapshotConflictHorizon for the whole record should be the most
+ * conservative of all the horizons calculated for any of the possible
+ * modifications. If this record will prune tuples, any transactions on
+ * the standby older than the youngest xmax of the most recently removed
+ * tuple this record will prune will conflict. If this record will freeze
+ * tuples, any transactions on the standby with xids older than the
+ * youngest tuple this record will freeze will conflict.
+ */
+ conflict_xid = InvalidTransactionId;
+
+ /*
+ * If we are updating the VM, the conflict horizon is almost always the
+ * visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization, we can
+ * use the visibility_cutoff_xid as the conflict horizon if the page will
+ * be all-frozen. This is true even if there are LP_DEAD line pointers
+ * because we ignored those when maintaining the visibility_cutoff_xid.
+ * This will have been calculated earlier as the frz_conflict_horizon when
+ * we determined we would freeze.
+ */
+ if (do_set_vm)
+ conflict_xid = visibility_cutoff_xid;
+ else if (do_freeze)
+ conflict_xid = frz_conflict_horizon;
+
+ /*
+ * If we are removing tuples with a younger xmax than our so far
+ * calculated conflict_xid, we must use this as our horizon.
+ */
+ if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+ conflict_xid = latest_xid_removed;
+
+ return conflict_xid;
+}
+
/*
* Write an XLOG_HEAP2_PRUNE* WAL record
*
@@ -2082,6 +2328,15 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* case, vmbuffer should already have been updated and marked dirty and should
* still be pinned and locked.
*
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
+ * In some cases, such as when heap_page_prune_and_freeze() is setting an
+ * already marked all-visible page all-frozen, PD_ALL_VISIBLE may already be
+ * set. So, it is possible for vmflags to be non-zero and set_pd_all_vis to be
+ * false.
+ *
* Note: This function scribbles on the 'frozen' array.
*
* Note: This is called in a critical section, so careful what you do here.
@@ -2091,6 +2346,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
Buffer vmbuffer, uint8 vmflags,
TransactionId conflict_xid,
bool cleanup_lock,
+ bool set_pd_all_vis,
PruneReason reason,
HeapTupleFreeze *frozen, int nfrozen,
OffsetNumber *redirected, int nredirected,
@@ -2127,7 +2383,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (!do_prune &&
nfrozen == 0 &&
- (!do_set_vm || !XLogHintBitIsNeeded()))
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
regbuf_flags_heap |= REGBUF_NO_IMAGE;
/*
@@ -2245,7 +2501,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* See comment at the top of the function about regbuf_flags_heap for
* details on when we can advance the page LSN.
*/
- if (do_prune || nfrozen > 0 || (do_set_vm && XLogHintBitIsNeeded()))
+ if (do_prune || nfrozen > 0 ||
+ (set_pd_all_vis && XLogHintBitIsNeeded()))
{
Assert(BufferIsDirty(buffer));
PageSetLSN(BufferGetPage(buffer), recptr);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 670a7424b15..60529fcf9d7 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,13 +464,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
-#endif
static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
TransactionId OldestXmin,
OffsetNumber *deadoffsets,
@@ -1974,6 +1967,8 @@ lazy_scan_prune(LVRelState *vacrel,
params.reason = PRUNE_VACUUM_SCAN;
params.cutoffs = &vacrel->cutoffs;
params.vistest = vacrel->vistest;
+ params.vmbuffer = vmbuffer;
+ params.blk_known_av = all_visible_according_to_vm;
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
@@ -1990,7 +1985,7 @@ lazy_scan_prune(LVRelState *vacrel,
* tuples. Pruning will have determined whether or not the page is
* all-visible.
*/
- params.options = HEAP_PAGE_PRUNE_FREEZE;
+ params.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VIS;
if (vacrel->nindexes == 0)
params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
@@ -2013,33 +2008,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- if (!heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum))
- Assert(false);
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -2073,168 +2041,26 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
- Assert(!presult.all_frozen || presult.all_visible);
-
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
+ * For the purposes of logging, count whether or not the page was newly
+ * set all-visible and, potentially, all-frozen.
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ vacrel->vm_new_visible_pages++;
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- vacrel->vm_new_frozen_pages++;
+ vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
-
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen.
- */
- else if (all_visible_according_to_vm && presult.all_frozen &&
- !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
}
return presult.ndeleted;
@@ -2956,6 +2782,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
vmflags,
conflict_xid,
false, /* no cleanup lock required */
+ (vmflags & VISIBILITYMAP_VALID_BITS) != 0,
PRUNE_VACUUM_CLEANUP,
NULL, 0, /* frozen */
NULL, 0, /* redirected */
@@ -3643,7 +3470,7 @@ dead_items_cleanup(LVRelState *vacrel)
* that expect no LP_DEAD on the page. Currently assert-only, but there is no
* reason not to use it outside of asserts.
*/
-static bool
+bool
heap_page_is_all_visible(Relation rel, Buffer buf,
TransactionId OldestXmin,
bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b0b6d3552a6..1471940b4a4 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VIS (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,16 @@ typedef struct PruneFreezeParams
Relation relation; /* relation containing buffer to be pruned */
Buffer buffer; /* buffer to be pruned */
+ /*
+ *
+ * vmbuffer is the buffer that must already contain the required
+ * block of the visibility map if we are to update it. blk_known_av is the
+ * visibility status of the heap block as of the last call to
+ * find_next_unskippable_block().
+ */
+ Buffer vmbuffer;
+ bool blk_known_av;
+
/*
* The reason pruning was performed. It is used to set the WAL record
* opcode which is used for debugging and analysis purposes.
@@ -250,8 +261,9 @@ typedef struct PruneFreezeParams
* MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
* pruning.
*
- * FREEZE indicates that we will also freeze tuples, and will return
- * 'all_visible', 'all_frozen' flags to the caller.
+ * FREEZE indicates that we will also freeze tuples
+ *
+ * UPDATE_VIS indicates that we will set the page's status in the VM.
*/
int options;
@@ -284,19 +296,15 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
- *
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
*
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set and
+ * we have attempted to update the VM.
*/
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
+ uint8 new_vmbits;
+ uint8 old_vmbits;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
@@ -423,6 +431,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
Buffer vmbuffer, uint8 vmflags,
TransactionId conflict_xid,
bool cleanup_lock,
+ bool set_pd_all_vis,
PruneReason reason,
HeapTupleFreeze *frozen, int nfrozen,
OffsetNumber *redirected, int nredirected,
@@ -433,6 +442,14 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
+#ifdef USE_ASSERT_CHECKING
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
+#endif
+
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
Buffer buffer);
--
2.43.0
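
The heart of get_conflict_xid() is "use the most conservative horizon any of the planned changes requires". Below is a standalone sketch of that rule; it ignores XID wraparound (the real code compares with TransactionIdFollows()) and models InvalidTransactionId as zero.

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t xid_t;
#define INVALID_XID 0

xid_t
conflict_xid_for_record(bool do_prune, bool do_freeze, bool do_set_vm,
						bool blk_already_av, bool set_all_frozen,
						xid_t latest_xid_removed,
						xid_t frz_conflict_horizon,
						xid_t visibility_cutoff_xid)
{
	xid_t		conflict_xid = INVALID_XID;

	/* Marking an already all-visible page all-frozen needs no horizon. */
	if (!do_prune && !do_freeze && do_set_vm && blk_already_av && set_all_frozen)
		return INVALID_XID;

	if (do_set_vm)
		conflict_xid = visibility_cutoff_xid;
	else if (do_freeze)
		conflict_xid = frz_conflict_horizon;

	/* Removed tuples may demand an even newer horizon. */
	if (latest_xid_removed > conflict_xid)
		conflict_xid = latest_xid_removed;

	return conflict_xid;
}
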
Attachment: v19-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (text/x-patch)
From 90ef5b185e4940f8b6eab291460fe7115d0e7080 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v19 05/12] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum
As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
---
src/backend/access/heap/vacuumlazy.c | 36 +++++++++++++++++++++++-----
1 file changed, 30 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 60529fcf9d7..8c402b5b1d4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1879,9 +1879,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ /* Lock vmbuffer before entering critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ /* Mark buffer dirty before writing any WAL records */
MarkBufferDirty(buf);
/*
@@ -1898,13 +1901,34 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set_vmbits(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
+
+ /*
+ * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+ * setting the VM.
+ */
+ if (RelationNeedsWAL(vacrel->rel))
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ InvalidTransactionId, /* conflict xid */
+ false, /* cleanup lock */
+ true, /* set_pd_all_vis */
+ PRUNE_VACUUM_SCAN, /* reason */
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+
END_CRIT_SECTION();
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
/* Count the newly all-frozen pages for logging */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
--
2.43.0
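
Both this patch and 0004 rely on the same replay rule in heap_xlog_prune_freeze(): the heap buffer is only dirtied if the record pruned, froze, or newly set PD_ALL_VISIBLE, and its LSN is only advanced when that change is itself WAL-logged (always for prune/freeze, only under wal_log_hints/checksums for PD_ALL_VISIBLE). Here is a standalone sketch of that decision; the XLogHintBitIsNeeded() test is reduced to a boolean parameter, and it is illustrative only.

#include <stdbool.h>

typedef struct
{
	bool		mark_buffer_dirty;
	bool		set_lsn;
} redo_action;

redo_action
heap_redo_buffer_action(bool do_prune, int nplans, bool vm_bits_set,
						bool page_already_all_visible,
						bool hint_bits_need_wal)
{
	redo_action a = {false, false};

	if (do_prune || nplans > 0)
		a.mark_buffer_dirty = a.set_lsn = true;

	/* Newly setting PD_ALL_VISIBLE dirties the heap page ... */
	if (vm_bits_set && !page_already_all_visible)
	{
		a.mark_buffer_dirty = true;
		/* ... but only advances its LSN when hint bits are WAL-logged. */
		if (hint_bits_need_wal)
			a.set_lsn = true;
	}

	return a;
}
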
Attachment: v19-0006-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (text/x-patch)
From 25216d74c5ded740fb5b52beb74877c7a701c3b0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v19 06/12] Remove XLOG_HEAP2_VISIBLE entirely
There are no remaining users that emit XLOG_HEAP2_VISIBLE records.
This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
---
src/backend/access/common/bufmask.c | 4 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 155 ++---------------------
src/backend/access/heap/pruneheap.c | 18 ++-
src/backend/access/heap/vacuumlazy.c | 16 +--
src/backend/access/heap/visibilitymap.c | 111 +---------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +-
src/include/access/heapam_xlog.h | 28 +---
src/include/access/visibilitymap.h | 13 +-
src/include/access/visibilitymapdefs.h | 9 --
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 54 insertions(+), 378 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
+ * for more details.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 36fee9c994e..a0c5923a563 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2524,11 +2524,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
else if (all_frozen_set)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- relation->rd_locator);
+ visibilitymap_set(BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ relation->rd_locator);
}
/*
@@ -8797,50 +8797,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
-
/*
* Perform XLogInsert for a heap-update operation. Caller must already
* have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 2af724451c3..5ab46e8bf8f 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -251,7 +251,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+ visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -264,142 +264,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -777,8 +641,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
*
* During recovery, however, no concurrent writers exist. Therefore,
* updating the VM without holding the heap page lock is safe enough. This
- * same approach is taken when replaying xl_heap_visible records (see
- * heap_xlog_visible()).
+ * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+ * heap_xlog_prune_and_freeze()).
*/
if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -790,11 +654,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- rlocator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -1375,9 +1239,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8dafbd344d8..14690cd62ae 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1031,9 +1031,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
{
Assert(PageIsAllVisible(page));
- old_vmbits = visibilitymap_set_vmbits(blockno,
- vmbuffer, new_vmbits,
- params->relation->rd_locator);
+ old_vmbits = visibilitymap_set(blockno,
+ vmbuffer, new_vmbits,
+ params->relation->rd_locator);
if (old_vmbits == new_vmbits)
{
LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
@@ -2309,14 +2309,18 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
*
* This is used for several different page maintenance operations:
*
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
* redirected, some marked dead, and some removed altogether.
*
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'.
*
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ * marked as unused.
*
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
+ * all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
* all.
*
* If replaying the record requires a cleanup lock, pass cleanup_lock = true.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8c402b5b1d4..ff6f0d1d0af 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1901,11 +1901,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
/*
* Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2787,9 +2787,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* set PD_ALL_VISIBLE.
*/
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer, vmflags,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer, vmflags,
+ vacrel->rel->rd_locator);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 2f5e61e2392..a75b5bb6b13 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set bit(s) in a previously pinned page and log
- * visibilitymap_set_vmbits - set bit(s) in a pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,108 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
- flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
- Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- Assert(!BufferIsValid(heapBuf) ||
- BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (XLogRecPtrIsInvalid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
-}
/*
* Set VM (visibility map) flags in the VM block in vmBuf.
@@ -344,9 +241,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* rlocator is used only for debugging messages.
*/
uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ca26d1f0ed1..08461fdf593 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..e9e77bd678b 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -294,7 +293,13 @@ typedef struct xl_heap_prune
#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
#define XLHP_IS_CATALOG_REL (1 << 1)
/*
@@ -443,20 +448,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -500,11 +491,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index c6fa37be968..05ba6786b47 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator);
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a384171de0d..6b4a40f616c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4281,7 +4281,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
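After 0006, the single remaining VM-setting entry point is the renamed,
simplified visibilitymap_set(). A minimal caller-side sketch, paraphrasing the
hunks above and assuming the caller already holds the locks shown there (heap
page exclusively locked with PD_ALL_VISIBLE set, VM page pinned and locked by
the caller):

    uint8 old_vmbits;

    /* Set the bits in the already-pinned VM page; returns the previous bits. */
    old_vmbits = visibilitymap_set(blkno, vmbuffer,
                                   VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN,
                                   relation->rd_locator);

    /* WAL for the VM change is now the caller's responsibility: it is carried
     * by log_heap_prune_and_freeze() or xl_heap_multi_insert rather than by a
     * separate xl_heap_visible record. */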
v19-0007-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (text/x-patch; charset=UTF-8)
From 44e0d30a3672a28187e0fb1da014f05747c00d29 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v19 07/12] Rename GlobalVisTestIsRemovableXid() to
GlobalVisXidVisibleToAll()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The function is currently only used to check whether a tuple’s xmax is
visible to all transactions (and thus removable). Upcoming changes will
also use it to test whether a tuple’s xmin is visible to all to
decide if a page can be marked all-visible in the visibility map.
The new name, GlobalVisXidVisibleToAll(), better reflects this broader
purpose.
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
---
src/backend/access/heap/heapam_visibility.c | 6 +++---
src/backend/access/heap/pruneheap.c | 16 ++++++++--------
src/backend/access/spgist/spgvacuum.c | 2 +-
src/backend/storage/ipc/procarray.c | 17 ++++++++---------
src/include/utils/snapmgr.h | 4 ++--
5 files changed, 22 insertions(+), 23 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
{
Assert(TransactionIdIsValid(dead_after));
- if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
res = HEAPTUPLE_DEAD;
}
else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
return false;
/* Deleter committed, so tuple is dead if the XID is old enough. */
- return GlobalVisTestIsRemovableXid(vistest,
- HeapTupleHeaderGetRawXmax(tuple));
+ return GlobalVisXidVisibleToAll(vistest,
+ HeapTupleHeaderGetRawXmax(tuple));
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 14690cd62ae..d03b754b2cc 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -235,7 +235,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
vistest = GlobalVisTestFor(relation);
- if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+ if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
return;
/*
@@ -730,9 +730,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* Determining HTSV only once for each tuple is required for correctness,
* to deal with cases where running HTSV twice could result in different
* results. For example, RECENTLY_DEAD can turn to DEAD if another
- * checked item causes GlobalVisTestIsRemovableFullXid() to update the
- * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
- * transaction aborts.
+ * checked item causes GlobalVisXidVisibleToAll() to update the horizon,
+ * or INSERT_IN_PROGRESS can change to DEAD if the inserting transaction
+ * aborts.
*
* It's also good for performance. Most commonly tuples within a page are
* stored at decreasing offsets (while the items are stored at increasing
@@ -1157,11 +1157,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* Determine whether or not the tuple is considered dead when compared
* with the provided GlobalVisState. On-access pruning does not provide
* VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
- * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
- * if the GlobalVisState has been updated since the beginning of vacuuming
+ * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+ * the GlobalVisState has been updated since the beginning of vacuuming
* the relation.
*/
- if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
return HEAPTUPLE_DEAD;
return res;
@@ -1618,7 +1618,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
/*
* For now always use prstate->cutoffs for this test, because
* we only update 'all_visible' when freezing is requested. We
- * could use GlobalVisTestIsRemovableXid instead, if a
+ * could use GlobalVisXidVisibleToAll() instead, if a
* non-freezing caller wanted to set the VM bit.
*/
Assert(prstate->cutoffs);
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 71ef2e5036f..1c0eb425ee9 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
*/
if (dt->tupstate == SPGIST_REDIRECT &&
(!TransactionIdIsValid(dt->xid) ||
- GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+ GlobalVisXidVisibleToAll(vistest, dt->xid)))
{
dt->tupstate = SPGIST_PLACEHOLDER;
Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..235c3b584f6 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
* See comment for GlobalVisState for details.
*/
bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
- FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
{
/*
* If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4216,14 +4215,14 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
}
/*
- * Wrapper around GlobalVisTestIsRemovableFullXid() for 32bit xids.
+ * Wrapper around GlobalVisFullXidVisibleToAll() for 32bit xids.
*
* It is crucial that this only gets called for xids from a source that
* protects against xid wraparounds (e.g. from a table and thus protected by
* relfrozenxid).
*/
bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
{
FullTransactionId fxid;
@@ -4237,12 +4236,12 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
*/
fxid = FullXidRelativeTo(state->definitely_needed, xid);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableFullXid(), see their comments.
+ * GlobalVisFullXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableXid(state, xid);
+ return GlobalVisXidVisibleToAll(state, xid);
}
/*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
*/
typedef struct GlobalVisState GlobalVisState;
extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
--
2.43.0
v19-0008-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (text/x-patch; charset=UTF-8)
From 7e8926fc6256dc0966b0c65e4fcec0031fbd2988 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v19 08/12] Use GlobalVisState in vacuum to determine page
level visibility
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.
Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.
This also benefits vacuum directly: in some cases, GlobalVisState may
advance during a vacuum, allowing more pages to be considered
all-visible. And, in the future, we could easily add a heuristic to
update GlobalVisState more frequently during vacuums of large tables. In
the rare case that the GlobalVisState moves backward, vacuum falls back
to OldestXmin to ensure we don’t attempt to freeze a dead tuple that
wasn’t yet prunable according to the GlobalVisState.
Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.
This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.
---
src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++++++
src/backend/access/heap/pruneheap.c | 37 ++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 17 +++++-----
src/include/access/heapam.h | 7 ++--
4 files changed, 57 insertions(+), 32 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+ Buffer buffer)
+{
+ TransactionId dead_after = InvalidTransactionId;
+ HTSV_Result res;
+
+ res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+ if (res == HEAPTUPLE_RECENTLY_DEAD)
+ {
+ Assert(TransactionIdIsValid(dead_after));
+
+ if (GlobalVisXidVisibleToAll(vistest, dead_after))
+ res = HEAPTUPLE_DEAD;
+ }
+ else
+ Assert(!TransactionIdIsValid(dead_after));
+
+ return res;
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d03b754b2cc..d3c57eedfe3 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -712,11 +712,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon
- * when updating the VM and/or freezing all the tuples on the page.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState. This field is only kept up-to-date if the page is
+ * all-visible. As soon as a tuple is encountered that is not visible to
+ * all, this field is unmaintained. As long as it is maintained, it can be
+ * used to calculate the snapshot conflict horizon when updating the VM
+ * and/or freezing all the tuples on the page.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -912,6 +913,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them is not visible to everyone, the page cannot be
+ * all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update the hint
@@ -1084,10 +1095,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool debug_all_frozen;
Assert(prstate.lpdead_items == 0);
- Assert(prstate.cutoffs);
if (!heap_page_is_all_visible(params->relation, buffer,
- prstate.cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc))
Assert(false);
@@ -1615,19 +1625,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' when freezing is requested. We
- * could use GlobalVisXidVisibleToAll() instead, if a
- * non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ff6f0d1d0af..5e3c1d50378 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -465,7 +465,7 @@ static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -2741,7 +2741,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* done outside the critical section.
*/
if (heap_page_would_be_all_visible(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3496,14 +3496,13 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
- return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
+ return heap_page_would_be_all_visible(rel, buf, vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -3524,7 +3523,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
*
* Output parameters:
*
@@ -3543,7 +3542,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
static bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -3617,7 +3616,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
/* Visibility checks may do IO or allocate memory */
Assert(CritSectionCount == 0);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+ switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest, buf))
{
case HEAPTUPLE_LIVE:
{
@@ -3636,7 +3635,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin, OldestXmin))
+ if (!GlobalVisXidVisibleToAll(vistest, xmin))
{
all_visible = false;
*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 1471940b4a4..4fc6edf4261 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -276,8 +276,7 @@ typedef struct PruneFreezeParams
/*
* cutoffs contains the freeze cutoffs, established by VACUUM at the
* beginning of vacuuming the relation. Required if HEAP_PRUNE_FREEZE
- * option is set. cutoffs->OldestXmin is also used to determine if dead
- * tuples are HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * option is set.
*/
struct VacuumCutoffs *cutoffs;
} PruneFreezeParams;
@@ -444,7 +443,7 @@ extern void heap_vacuum_rel(Relation rel,
#ifdef USE_ASSERT_CHECKING
extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum);
@@ -457,6 +456,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+ GlobalVisState *vistest, Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
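The deferral described in the 0008 commit message amounts to a single
page-level check after all tuples have been examined, instead of one
GlobalVisState lookup per tuple. A condensed paraphrase of the new logic in
heap_page_prune_and_freeze():

    /*
     * visibility_cutoff_xid is the newest xmin among the live tuples seen
     * while the page still looked all-visible; if that xmin is visible to
     * everyone, so is every older xmin on the page, so one check per page
     * suffices.
     */
    if (prstate.all_visible &&
        TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
        !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
        prstate.all_visible = prstate.all_frozen = false;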
v19-0009-Unset-all_visible-sooner-if-not-freezing.patch (text/x-patch; charset=UTF-8)
From 675c8aa69fd456dfee011d40c913f33cd866cab6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v19 09/12] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.
However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.
---
src/backend/access/heap/pruneheap.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d3c57eedfe3..7f457abf8e1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1483,8 +1483,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible until later during pruning.
- * Removable dead tuples shouldn't preclude freezing the page.
+ * Removable dead tuples shouldn't preclude freezing the page. If we won't
+ * attempt freezing, just unset all-visible now, though.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1739,8 +1742,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible until later, at the end of
* heap_page_prune_and_freeze(). This will allow us to attempt to freeze
* the page after pruning. As long as we unset it before updating the
- * visibility map, this will be correct.
+ * visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all-visible now.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
--
2.43.0
v19-0010-Allow-on-access-pruning-to-set-pages-all-visible.patch (text/x-patch; charset=UTF-8)
From 85792e88a836a7909a27783b1801e1da0e51399e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v19 10/12] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.
Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam.c | 15 +++-
src/backend/access/heap/heapam_handler.c | 15 +++-
src/backend/access/heap/pruneheap.c | 73 +++++++++++++++----
src/backend/access/index/indexam.c | 46 ++++++++++++
src/backend/access/table/tableam.c | 39 +++++++++-
src/backend/executor/execMain.c | 4 +
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeBitmapHeapscan.c | 7 +-
src/backend/executor/nodeIndexscan.c | 18 +++--
src/backend/executor/nodeSeqscan.c | 24 ++++--
src/include/access/genam.h | 11 +++
src/include/access/heapam.h | 24 +++++-
src/include/access/relscan.h | 6 ++
src/include/access/tableam.h | 30 +++++++-
src/include/nodes/execnodes.h | 6 ++
.../t/035_standby_logical_decoding.pl | 3 +-
16 files changed, 284 insertions(+), 39 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a0c5923a563..260f981d457 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -555,6 +555,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -569,7 +570,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1246,6 +1249,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1284,6 +1288,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1316,6 +1326,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..f05b9e4968d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
return &hscan->xs_base;
}
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7f457abf8e1..631eb45bc96 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -188,7 +188,9 @@ static bool heap_page_will_set_vis(Relation relation,
Buffer heap_buf,
Buffer vmbuffer,
bool blk_known_av,
- const PruneState *prstate,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ PruneState *prstate,
uint8 *vmflags,
bool *do_set_pd_vis);
@@ -203,9 +205,13 @@ static bool heap_page_will_set_vis(Relation relation,
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -271,12 +277,21 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
PruneFreezeParams params;
PruneFreezeResult presult;
+ params.options = 0;
+ params.vmbuffer = InvalidBuffer;
+
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ params.options = HEAP_PAGE_PRUNE_UPDATE_VIS;
+ params.vmbuffer = *vmbuffer;
+ }
+
params.relation = relation;
params.buffer = buffer;
params.reason = PRUNE_ON_ACCESS;
params.vistest = vistest;
params.cutoffs = NULL;
- params.vmbuffer = InvalidBuffer;
params.blk_known_av = false;
/*
@@ -456,6 +471,9 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* have examined this page’s VM bits (e.g., VACUUM in the previous
* heap_vac_scan_next_block() call) and can pass that along.
*
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
* Returns true if one or both VM bits should be set, along with the desired
* flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
* should be set on the heap page.
@@ -466,7 +484,9 @@ heap_page_will_set_vis(Relation relation,
Buffer heap_buf,
Buffer vmbuffer,
bool blk_known_av,
- const PruneState *prstate,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ PruneState *prstate,
uint8 *vmflags,
bool *do_set_pd_vis)
{
@@ -482,6 +502,23 @@ heap_page_will_set_vis(Relation relation,
return false;
}
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(heap_buf) || XLogCheckBufferNeedsBackup(heap_buf)))
+ {
+ prstate->all_visible = prstate->all_frozen = false;
+ return false;
+ }
+
if (prstate->all_visible && !PageIsAllVisible(heap_page))
*do_set_pd_vis = true;
@@ -505,6 +542,11 @@ heap_page_will_set_vis(Relation relation,
* page-level bit is clear. However, it's possible that in vacuum the bit
* got cleared after heap_vac_scan_next_block() was called, so we must
* recheck with buffer lock before concluding that the VM is corrupt.
+ *
+ * This will never trigger for on-access pruning because it couldn't have
+ * done a previous visibility map lookup and thus will always pass
+ * blk_known_av as false. A future vacuum will have to take care of fixing
+ * the corruption.
*/
else if (blk_known_av && !PageIsAllVisible(heap_page) &&
visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
@@ -913,6 +955,14 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /*
+ * Even if we don't prune anything, if we found a new value for the
+ * pd_prune_xid field or the page was marked full, we will update the hint
+ * bit.
+ */
+ do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ PageIsFull(page);
+
/*
* After processing all the live tuples on the page, if the newest xmin
* amongst them is not visible to everyone, the page cannot be
@@ -923,14 +973,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
prstate.all_visible = prstate.all_frozen = false;
- /*
- * Even if we don't prune anything, if we found a new value for the
- * pd_prune_xid field or the page was marked full, we will update the hint
- * bit.
- */
- do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
- PageIsFull(page);
-
/*
* Decide if we want to go ahead with freezing according to the freeze
* plans we prepared, or not.
@@ -974,6 +1016,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
do_set_vm = heap_page_will_set_vis(params->relation,
blockno, buffer, vmbuffer, params->blk_known_av,
+ params->reason, do_prune, do_freeze,
&prstate, &new_vmbits, &do_set_pd_vis);
/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
@@ -2250,7 +2293,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
/*
* Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
- * record.
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
*/
static TransactionId
get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
@@ -2320,8 +2363,8 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
* - Reaping: During vacuum phase III, items that are already LP_DEAD are
* marked as unused.
*
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- * all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III and on-access, the heap page
+ * may be marked all-visible and all-frozen.
*
* These changes all happen together, so we use a single WAL record for them
* all.
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..8d582a8eafd 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -289,6 +289,32 @@ index_beginscan(Relation heapRelation,
return scan;
}
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan(heapRelation,
+ indexRelation,
+ snapshot,
+ instrument,
+ nkeys, norderbys);
+
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+ return scan;
+}
+
/*
* index_beginscan_bitmap - start a scan of an index with amgetbitmap
*
@@ -620,6 +646,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
return scan;
}
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan_parallel(heaprel, indexrel,
+ instrument,
+ nkeys, norderbys,
+ pscan);
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+ return scan;
+}
+
/* ----------------
* index_getnext_tid - get the next TID from a scan
*
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 5e41404937e..3e3a0f72a71 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
char *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
bool synchronize_seqscans = true;
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags);
+
/* ----------------------------------------------------------------------------
* Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
}
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
- SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
pscan, flags);
}
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+ bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
/* ----------------------------------------------------------------------------
* Index scan related functions.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 27c9eec697b..0630a5af79e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation is modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ modifies_rel);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index f36929deec3..cbd1ecaa15f 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+
+ bool modifies_base_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
*/
- scandesc = index_beginscan(node->ss.ss_currentRelation,
- node->iss_RelationDesc,
- estate->es_snapshot,
- &node->iss_Instrument,
- node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+ node->iss_RelationDesc,
+ estate->es_snapshot,
+ &node->iss_Instrument,
+ node->iss_NumScanKeys,
+ node->iss_NumOrderByKeys,
+ modifies_base_rel);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..fd69275c181 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
- scandesc = table_beginscan(node->ss.ss_currentRelation,
- estate->es_snapshot,
- 0, NULL);
+ scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+ estate->es_snapshot,
+ 0, NULL, modifies_rel);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -366,6 +371,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
ParallelContext *pcxt)
{
EState *estate = node->ss.ps.state;
+ bool modifies_rel;
ParallelTableScanDesc pscan;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -373,8 +379,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+ modifies_rel);
}
/* ----------------------------------------------------------------
@@ -404,8 +413,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+ pscan,
+ modifies_rel);
}
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..aa2112c8e04 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -178,6 +178,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_heap_rel);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
@@ -204,6 +209,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys,
ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_rel);
extern ItemPointer index_getnext_tid(IndexScanDesc scan,
ScanDirection direction);
extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4fc6edf4261..1d2cab64e9c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read in the current heap page's
+ * corresponding VM block to this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read in the current heap page's corresponding VM block to
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
@@ -415,7 +432,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
TM_IndexDeleteOp *delstate);
/* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
OffsetNumber *off_loc,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
typedef struct IndexFetchTableData
{
Relation rel;
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchTableData;
struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e16bf025692..f250d4e7aec 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* whether or not scan should attempt to set the VM */
+ SO_ALLOW_VM_SET = 1 << 10,
} ScanOptions;
/*
@@ -882,6 +884,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
}
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
/*
* Like table_beginscan(), but for scanning catalog. It'll automatically use a
* snapshot appropriate for scanning catalog relations.
@@ -919,10 +940,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, bool modifies_rel)
{
uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
}
@@ -1130,6 +1154,10 @@ extern void table_parallelscan_initialize(Relation rel,
extern TableScanDesc table_beginscan_parallel(Relation relation,
ParallelTableScanDesc pscan);
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+ ParallelTableScanDesc pscan,
+ bool modifies_rel);
+
/*
* Restart a parallel scan. Call this in the leader process. Caller is
* responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 18ae8f0d4bb..0c3b0d60168 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query either through
+ * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index ebe2fae1789..bdd9f0a62cd 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
--
2.43.0
v19-0011-Set-pd_prune_xid-on-insert.patchtext/x-patch; charset=UTF-8; name=v19-0011-Set-pd_prune_xid-on-insert.patchDownload
From e6b5d40c8c8758179ccd2ffe6e60dfc725430c12 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v19 11/12] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.
This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.
Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.
---
src/backend/access/heap/heapam.c | 25 +++++++++++++------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++-
.../isolation/expected/index-killtuples.out | 6 ++---
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 260f981d457..eea3a3d2ddc 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2104,6 +2104,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2163,15 +2164,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2181,7 +2186,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2545,8 +2549,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 5ab46e8bf8f..dac640f5c9d 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -462,6 +462,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -611,9 +617,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
--
2.43.0
v19-0012-Split-heap_page_prune_and_freeze-into-helpers.patchtext/x-patch; charset=US-ASCII; name=v19-0012-Split-heap_page_prune_and_freeze-into-helpers.patchDownload
From 1ff6d727d64771ed19e07e6d1644380e16508944 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 8 Oct 2025 18:45:45 -0400
Subject: [PATCH v19 12/12] Split heap_page_prune_and_freeze into helpers
ci-os-only:
---
src/backend/access/heap/pruneheap.c | 316 +++++++++++++++-------------
1 file changed, 170 insertions(+), 146 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 631eb45bc96..e18ec37fdf5 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -590,82 +590,20 @@ heap_page_will_set_vis(Relation relation,
return do_set_vm;
}
-/*
- * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page. If the page's visibility status has changed, update it in
- * the VM.
- *
- * Caller must have pin and buffer cleanup lock on the page. Note that we
- * don't update the FSM information for page on caller's behalf. Caller might
- * also need to account for a reduction in the length of the line pointer
- * array following array truncation by us.
- *
- * params contains the input parameters used to control freezing and pruning
- * behavior. See the definition of PruneFreezeParams for more on what each
- * parameter does.
- *
- * If the HEAP_PRUNE_FREEZE option is set in params, we will freeze tuples if
- * it's required in order to advance relfrozenxid / relminmxid, or if it's
- * considered advantageous for overall system performance to do so now. The
- * 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
- * arguments are required when freezing.
- *
- * If HEAP_PAGE_PRUNE_UPDATE_VIS is set in params and the visibility status of
- * the page has changed, we will update the VM at the same time as pruning and
- * freezing the heap page. We will also update presult->old_vmbits and
- * presult->new_vmbits with the state of the VM before and after updating it
- * for the caller to use in bookkeeping.
- *
- * presult contains output parameters needed by callers, such as the number of
- * tuples removed and the offsets of dead items on the page after pruning.
- * heap_page_prune_and_freeze() is responsible for initializing it. Required
- * by all callers.
- *
- * off_loc is the offset location required by the caller to use in error
- * callback.
- *
- * new_relfrozen_xid and new_relmin_mxid must provided by the caller if the
- * HEAP_PRUNE_FREEZE option is set in params. On entry, they contain the
- * oldest XID and multi-XID seen on the relation so far. They will be updated
- * with oldest values present on the page after pruning. After processing the
- * whole relation, VACUUM can use these values as the new
- * relfrozenxid/relminmxid for the relation.
- */
-void
-heap_page_prune_and_freeze(PruneFreezeParams *params,
- PruneFreezeResult *presult,
- OffsetNumber *off_loc,
- TransactionId *new_relfrozen_xid,
- MultiXactId *new_relmin_mxid)
+static void
+prune_freeze_setup(PruneFreezeParams *params, PruneState *prstate,
+ TransactionId *new_relfrozen_xid,
+ MultiXactId *new_relmin_mxid,
+ PruneFreezeResult *presult)
{
- Buffer buffer = params->buffer;
- Buffer vmbuffer = params->vmbuffer;
- Page page = BufferGetPage(buffer);
- BlockNumber blockno = BufferGetBlockNumber(buffer);
- OffsetNumber offnum,
- maxoff;
- PruneState prstate;
- HeapTupleData tup;
- bool do_freeze;
- bool do_prune;
- bool do_hint_prune;
- bool do_set_vm;
- bool do_set_pd_vis;
- bool did_tuple_hint_fpi;
- int64 fpi_before = pgWalUsage.wal_fpi;
- TransactionId frz_conflict_horizon = InvalidTransactionId;
- TransactionId conflict_xid = InvalidTransactionId;
- uint8 new_vmbits = 0;
- uint8 old_vmbits = 0;
-
/* Copy parameters to prstate */
- prstate.vistest = params->vistest;
- prstate.mark_unused_now =
+ prstate->vistest = params->vistest;
+ prstate->mark_unused_now =
(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
- prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
- prstate.attempt_update_vm =
+ prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate->attempt_update_vm =
(params->options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0;
- prstate.cutoffs = params->cutoffs;
+ prstate->cutoffs = params->cutoffs;
/*
* Our strategy is to scan the page and make lists of items to change,
@@ -678,37 +616,37 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* prunable, we will save the lowest relevant XID in new_prune_xid. Also
* initialize the rest of our working state.
*/
- prstate.new_prune_xid = InvalidTransactionId;
- prstate.latest_xid_removed = InvalidTransactionId;
- prstate.nredirected = prstate.ndead = prstate.nunused = prstate.nfrozen = 0;
- prstate.nroot_items = 0;
- prstate.nheaponly_items = 0;
+ prstate->new_prune_xid = InvalidTransactionId;
+ prstate->latest_xid_removed = InvalidTransactionId;
+ prstate->nredirected = prstate->ndead = prstate->nunused = prstate->nfrozen = 0;
+ prstate->nroot_items = 0;
+ prstate->nheaponly_items = 0;
/* initialize page freezing working state */
- prstate.pagefrz.freeze_required = false;
- if (prstate.attempt_freeze)
+ prstate->pagefrz.freeze_required = false;
+ if (prstate->attempt_freeze)
{
Assert(new_relfrozen_xid && new_relmin_mxid);
- prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
- prstate.pagefrz.NoFreezePageRelfrozenXid = *new_relfrozen_xid;
- prstate.pagefrz.FreezePageRelminMxid = *new_relmin_mxid;
- prstate.pagefrz.NoFreezePageRelminMxid = *new_relmin_mxid;
+ prstate->pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
+ prstate->pagefrz.NoFreezePageRelfrozenXid = *new_relfrozen_xid;
+ prstate->pagefrz.FreezePageRelminMxid = *new_relmin_mxid;
+ prstate->pagefrz.NoFreezePageRelminMxid = *new_relmin_mxid;
}
else
{
Assert(new_relfrozen_xid == NULL && new_relmin_mxid == NULL);
- prstate.pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
- prstate.pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
- prstate.pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
- prstate.pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
+ prstate->pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
+ prstate->pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
+ prstate->pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
+ prstate->pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
}
- prstate.ndeleted = 0;
- prstate.live_tuples = 0;
- prstate.recently_dead_tuples = 0;
- prstate.hastup = false;
- prstate.lpdead_items = 0;
- prstate.deadoffsets = presult->deadoffsets;
+ prstate->ndeleted = 0;
+ prstate->live_tuples = 0;
+ prstate->recently_dead_tuples = 0;
+ prstate->hastup = false;
+ prstate->lpdead_items = 0;
+ prstate->deadoffsets = presult->deadoffsets;
/*
* Track whether the page could be marked all-visible and/or all-frozen.
@@ -736,20 +674,20 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* bookkeeping. In this case, initializing all_visible to false allows
* heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
*/
- if (prstate.attempt_freeze)
+ if (prstate->attempt_freeze)
{
- prstate.all_visible = true;
- prstate.all_frozen = true;
+ prstate->all_visible = true;
+ prstate->all_frozen = true;
}
- else if (prstate.attempt_update_vm)
+ else if (prstate->attempt_update_vm)
{
- prstate.all_visible = true;
- prstate.all_frozen = false;
+ prstate->all_visible = true;
+ prstate->all_frozen = false;
}
else
{
- prstate.all_visible = false;
- prstate.all_frozen = false;
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
}
/*
@@ -761,10 +699,14 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* used to calculate the snapshot conflict horizon when updating the VM
* and/or freezing all the tuples on the page.
*/
- prstate.visibility_cutoff_xid = InvalidTransactionId;
+ prstate->visibility_cutoff_xid = InvalidTransactionId;
+}
- maxoff = PageGetMaxOffsetNumber(page);
- tup.t_tableOid = RelationGetRelid(params->relation);
+static void
+prune_freeze_plan(PruneState *prstate, BlockNumber blockno, Buffer buffer, Page page,
+ OffsetNumber maxoff, OffsetNumber *off_loc, HeapTuple tup)
+{
+ OffsetNumber offnum;
/*
* Determine HTSV for all tuples, and queue them up for processing as HOT
@@ -799,13 +741,13 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
*off_loc = offnum;
- prstate.processed[offnum] = false;
- prstate.htsv[offnum] = -1;
+ prstate->processed[offnum] = false;
+ prstate->htsv[offnum] = -1;
/* Nothing to do if slot doesn't contain a tuple */
if (!ItemIdIsUsed(itemid))
{
- heap_prune_record_unchanged_lp_unused(page, &prstate, offnum);
+ heap_prune_record_unchanged_lp_unused(page, prstate, offnum);
continue;
}
@@ -815,17 +757,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* If the caller set mark_unused_now true, we can set dead line
* pointers LP_UNUSED now.
*/
- if (unlikely(prstate.mark_unused_now))
- heap_prune_record_unused(&prstate, offnum, false);
+ if (unlikely(prstate->mark_unused_now))
+ heap_prune_record_unused(prstate, offnum, false);
else
- heap_prune_record_unchanged_lp_dead(page, &prstate, offnum);
+ heap_prune_record_unchanged_lp_dead(page, prstate, offnum);
continue;
}
if (ItemIdIsRedirected(itemid))
{
/* This is the start of a HOT chain */
- prstate.root_items[prstate.nroot_items++] = offnum;
+ prstate->root_items[prstate->nroot_items++] = offnum;
continue;
}
@@ -835,25 +777,19 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* Get the tuple's visibility status and queue it up for processing.
*/
htup = (HeapTupleHeader) PageGetItem(page, itemid);
- tup.t_data = htup;
- tup.t_len = ItemIdGetLength(itemid);
- ItemPointerSet(&tup.t_self, blockno, offnum);
+ tup->t_data = htup;
+ tup->t_len = ItemIdGetLength(itemid);
+ ItemPointerSet(&tup->t_self, blockno, offnum);
- prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
- buffer);
+ prstate->htsv[offnum] = heap_prune_satisfies_vacuum(prstate, tup,
+ buffer);
if (!HeapTupleHeaderIsHeapOnly(htup))
- prstate.root_items[prstate.nroot_items++] = offnum;
+ prstate->root_items[prstate->nroot_items++] = offnum;
else
- prstate.heaponly_items[prstate.nheaponly_items++] = offnum;
+ prstate->heaponly_items[prstate->nheaponly_items++] = offnum;
}
- /*
- * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
- * an FPI to be emitted.
- */
- did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
-
/*
* Process HOT chains.
*
@@ -865,30 +801,30 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* the page instead of using the root_items array, also did it in
* ascending offset number order.)
*/
- for (int i = prstate.nroot_items - 1; i >= 0; i--)
+ for (int i = prstate->nroot_items - 1; i >= 0; i--)
{
- offnum = prstate.root_items[i];
+ offnum = prstate->root_items[i];
/* Ignore items already processed as part of an earlier chain */
- if (prstate.processed[offnum])
+ if (prstate->processed[offnum])
continue;
/* see preceding loop */
*off_loc = offnum;
/* Process this item or chain of items */
- heap_prune_chain(page, blockno, maxoff, offnum, &prstate);
+ heap_prune_chain(page, blockno, maxoff, offnum, prstate);
}
/*
* Process any heap-only tuples that were not already processed as part of
* a HOT chain.
*/
- for (int i = prstate.nheaponly_items - 1; i >= 0; i--)
+ for (int i = prstate->nheaponly_items - 1; i >= 0; i--)
{
- offnum = prstate.heaponly_items[i];
+ offnum = prstate->heaponly_items[i];
- if (prstate.processed[offnum])
+ if (prstate->processed[offnum])
continue;
/* see preceding loop */
@@ -907,7 +843,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* return true for an XMIN_INVALID tuple, so this code will work even
* when there were sequential updates within the aborted transaction.)
*/
- if (prstate.htsv[offnum] == HEAPTUPLE_DEAD)
+ if (prstate->htsv[offnum] == HEAPTUPLE_DEAD)
{
ItemId itemid = PageGetItemId(page, offnum);
HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, itemid);
@@ -915,8 +851,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
if (likely(!HeapTupleHeaderIsHotUpdated(htup)))
{
HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate.latest_xid_removed);
- heap_prune_record_unused(&prstate, offnum, true);
+ &prstate->latest_xid_removed);
+ heap_prune_record_unused(prstate, offnum, true);
}
else
{
@@ -933,7 +869,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
}
}
else
- heap_prune_record_unchanged_lp_normal(page, &prstate, offnum);
+ heap_prune_record_unchanged_lp_normal(page, prstate, offnum);
}
/* We should now have processed every tuple exactly once */
@@ -944,12 +880,110 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
{
*off_loc = offnum;
- Assert(prstate.processed[offnum]);
+ Assert(prstate->processed[offnum]);
}
#endif
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them is not visible to everyone, the page cannot be
+ * all-visible.
+ */
+ if (prstate->all_visible &&
+ TransactionIdIsNormal(prstate->visibility_cutoff_xid) &&
+ !GlobalVisXidVisibleToAll(prstate->vistest, prstate->visibility_cutoff_xid))
+ prstate->all_visible = prstate->all_frozen = false;
+
/* Clear the offset information once we have processed the given page. */
*off_loc = InvalidOffsetNumber;
+}
+
+/*
+ * Prune and repair fragmentation and potentially freeze tuples on the
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
+ *
+ * Caller must have pin and buffer cleanup lock on the page. Note that we
+ * don't update the FSM information for page on caller's behalf. Caller might
+ * also need to account for a reduction in the length of the line pointer
+ * array following array truncation by us.
+ *
+ * params contains the input parameters used to control freezing and pruning
+ * behavior. See the definition of PruneFreezeParams for more on what each
+ * parameter does.
+ *
+ * If the HEAP_PRUNE_FREEZE option is set in params, we will freeze tuples if
+ * it's required in order to advance relfrozenxid / relminmxid, or if it's
+ * considered advantageous for overall system performance to do so now. The
+ * 'params.cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid'
+ * arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
+ *
+ * presult contains output parameters needed by callers, such as the number of
+ * tuples removed and the offsets of dead items on the page after pruning.
+ * heap_page_prune_and_freeze() is responsible for initializing it. Required
+ * by all callers.
+ *
+ * off_loc is the offset location required by the caller to use in error
+ * callback.
+ *
+ * new_relfrozen_xid and new_relmin_mxid must provided by the caller if the
+ * HEAP_PRUNE_FREEZE option is set in params. On entry, they contain the
+ * oldest XID and multi-XID seen on the relation so far. They will be updated
+ * with oldest values present on the page after pruning. After processing the
+ * whole relation, VACUUM can use these values as the new
+ * relfrozenxid/relminmxid for the relation.
+ */
+void
+heap_page_prune_and_freeze(PruneFreezeParams *params,
+ PruneFreezeResult *presult,
+ OffsetNumber *off_loc,
+ TransactionId *new_relfrozen_xid,
+ MultiXactId *new_relmin_mxid)
+{
+ Buffer buffer = params->buffer;
+ Buffer vmbuffer = params->vmbuffer;
+ Page page = BufferGetPage(buffer);
+ BlockNumber blockno = BufferGetBlockNumber(buffer);
+ OffsetNumber maxoff;
+ PruneState prstate;
+ HeapTupleData tup;
+ bool do_freeze;
+ bool do_prune;
+ bool do_hint_prune;
+ bool do_set_vm;
+ bool do_set_pd_vis;
+ bool did_tuple_hint_fpi;
+ int64 fpi_before = pgWalUsage.wal_fpi;
+ TransactionId frz_conflict_horizon = InvalidTransactionId;
+ TransactionId conflict_xid = InvalidTransactionId;
+ uint8 new_vmbits = 0;
+ uint8 old_vmbits = 0;
+
+ maxoff = PageGetMaxOffsetNumber(page);
+ tup.t_tableOid = RelationGetRelid(params->relation);
+
+ /* Initialize needed state in prstate */
+ prune_freeze_setup(params, &prstate, new_relfrozen_xid, new_relmin_mxid, presult);
+
+ /*
+ * Examine all line pointers and tuple visibility information to determine
+ * which line pointers should change state and which tuples may be frozen.
+ * Prepare queue of state changes to later be executed in a critical
+ * section.
+ */
+ prune_freeze_plan(&prstate, blockno, buffer, page, maxoff, off_loc, &tup);
+
+ /*
+ * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
+ * an FPI to be emitted.
+ */
+ did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
do_prune = prstate.nredirected > 0 ||
prstate.ndead > 0 ||
@@ -963,16 +997,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
PageIsFull(page);
- /*
- * After processing all the live tuples on the page, if the newest xmin
- * amongst them is not visible to everyone, the page cannot be
- * all-visible.
- */
- if (prstate.all_visible &&
- TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
- !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
- prstate.all_visible = prstate.all_frozen = false;
-
/*
* Decide if we want to go ahead with freezing according to the freeze
* plans we prepared, or not.
--
2.43.0
Attached v20 has general cleanup, changes to the table/index AM
callbacks detailed below, and it moves the
heap_page_prune_and_freeze() refactoring commit down the stack to
0004.
0001 - 0003 are fairly trivial cleanup patches. I think they are ready
to commit, so if I don't hear any objections in the next few days,
I'll go ahead and commit them.
On Tue, Nov 4, 2025 at 11:48 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
On Wed, Oct 29, 2025 at 7:03 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
In 0010:
I'm not terribly convinced that adding SO_ALLOW_VM_SET to TAM
ScanOptions is the right thing to do. Looks like VM bits are something
that makes sense for HEAP AM but not for any TAM. So, don't we break
some layer of abstraction here? Would it be better for HEAP AM to set
some flags in heap_beginscan?

I don't see another good way of doing it.
The information about whether or not the relation is modified in the
query is gathered during planning and saved in the plan. We need to
get that information to the scan descriptor, which is all we have when
we call heap_page_prune_opt() during the scan. The scan descriptor is
created by the table AM implementations of scan_begin(). The table AM
callbacks don't pass down the plan -- which makes sense; the scan
shouldn't know about the plan. They do pass down flags, so I thought
it made the most sense to add a flag. Note that I was able to avoid
modifying the actual table and index AM callbacks (scan_begin() and
ambeginscan()). I only made new wrappers that took "modifies_rel".

Now, it is true that referring to the VM is somewhat of a layering
violation. Though, other table AMs may use the information about if
the query modifies the relation -- which is really what this flag
represents. The ScanOptions are usually either a type or a call to
action. Which is why I felt a bit uncomfortable calling it something
like SO_MODIFIES_REL -- which is less of an option and more a piece of
information. And it makes it sound like the scan modifies the rel,
which is not the case. I wonder if there is another solution. Or maybe
we call it SO_QUERY_MODIFIES_REL?
Attached v20 changes the ScanOption name to SO_HINT_REL_READ_ONLY and
removes the new helper functions which took modifies_rel as a
parameter. Instead it modifies the existing
table_beginscan()/index_beginscan() helpers and the relevant callbacks
they invoke to have a new flags parameter. These are additional
caller-provided flags.
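
Roughly, the seq scan setup would then look something like the sketch
below. This is just to illustrate the shape of the interface -- it
mirrors the v19 SeqNext() hunk above, and the exact v20 signature
(including where the new flags argument goes) is in the attached
patches:

    uint32      extra_flags = 0;

    /* the planner recorded whether this query modifies the scanned rel */
    if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
                       estate->es_modified_relids))
        extra_flags |= SO_HINT_REL_READ_ONLY;

    scandesc = table_beginscan(node->ss.ss_currentRelation,
                               estate->es_snapshot,
                               0, NULL,
                               extra_flags);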
In master, the IndexScan structures and helpers don't use ScanOptions,
but since I'm using them for properties of the base relation, I think
it is fine. I'm not sure if I should name the parameter base_rel_flags
instead of flags for the index-related callbacks and helpers or if
leaving it more generic is better, though.
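
For the index side, the question is really just which of these
spellings reads better in the declarations (again only a sketch; the
real prototypes are in the attached patches):

    extern IndexScanDesc index_beginscan(Relation heapRelation,
                                         Relation indexRelation,
                                         Snapshot snapshot,
                                         IndexScanInstrumentation *instrument,
                                         int nkeys, int norderbys,
                                         uint32 flags);
    /* versus: ..., int nkeys, int norderbys, uint32 base_rel_flags); */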
- Melanie
Attachments:
v20-0003-Update-PruneState.all_-visible-frozen-earlier-in.patchtext/x-patch; charset=UTF-8; name=v20-0003-Update-PruneState.all_-visible-frozen-earlier-in.patchDownload
From 8ae7479e5f9191e26d30c5fa133a8322c19549c5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 14 Oct 2025 15:21:49 -0400
Subject: [PATCH v20 03/12] Update PruneState.all_[visible|frozen] earlier in
pruning
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items. This allows opportunistic
freezing if the page would otherwise be fully frozen, since those dead
items are later removed in vacuum’s third phase.
To move the VM update into the same WAL record that
prunes and freezes tuples, we must know whether the page will
be marked all-visible/all-frozen before emitting WAL.
The only barrier to updating these flags immediately after deciding
whether to opportunistically freeze is that we previously used
all_frozen to compute the snapshot conflict horizon when freezing
tuples. By determining the cutoff earlier, we can update the flags
immediately after making the freeze decision.
This is required to set the VM in the XLOG_HEAP2_PRUNE_VACUUM_SCAN
record emitted by pruning and freezing.
---
src/backend/access/heap/pruneheap.c | 117 ++++++++++++++--------------
1 file changed, 57 insertions(+), 60 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7cd51c7be33..86da2743423 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -138,11 +138,11 @@ typedef struct
* bits. It is only valid if we froze some tuples, and all_frozen is
* true.
*
- * NOTE: all_visible and all_frozen don't include LP_DEAD items. That's
- * convenient for heap_page_prune_and_freeze(), to use them to decide
- * whether to freeze the page or not. The all_visible and all_frozen
- * values returned to the caller are adjusted to include LP_DEAD items at
- * the end.
+ * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
+ * That's convenient for heap_page_prune_and_freeze() to use them to
+ * decide whether to freeze the page or not. The all_visible and
+ * all_frozen values returned to the caller are adjusted to include
+ * LP_DEAD items after we determine whether to opportunistically freeze.
*/
bool all_visible;
bool all_frozen;
@@ -175,7 +175,8 @@ static void page_verify_redirects(Page page);
static bool heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
- PruneState *prstate);
+ PruneState *prstate,
+ TransactionId *frz_conflict_horizon);
/*
@@ -306,7 +307,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* performs several pre-freeze checks.
*
* The values of do_prune, do_hint_prune, and did_tuple_hint_fpi must be
- * determined before calling this function.
+ * determined before calling this function. *frz_conflict_horizon is set to
+ * the snapshot conflict horizon for the WAL record, should we decide to
+ * freeze tuples.
*
* prstate is both an input and output parameter.
*
@@ -318,7 +321,8 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi,
bool do_prune,
bool do_hint_prune,
- PruneState *prstate)
+ PruneState *prstate,
+ TransactionId *frz_conflict_horizon)
{
bool do_freeze = false;
@@ -388,6 +392,22 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* critical section.
*/
heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+
+ /*
+ * Calculate what the snapshot conflict horizon should be for a record
+ * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
+ * for conflicts when the whole page is eligible to become all-frozen
+ * in the VM once we're done with it. Otherwise, we generate a
+ * conservative cutoff by stepping back from OldestXmin.
+ */
+ if (prstate->all_frozen)
+ *frz_conflict_horizon = prstate->visibility_cutoff_xid;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ *frz_conflict_horizon = prstate->cutoffs->OldestXmin;
+ TransactionIdRetreat(*frz_conflict_horizon);
+ }
}
else if (prstate->nfrozen > 0)
{
@@ -433,10 +453,10 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* now. The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
* 'new_relmin_mxid' arguments are required when freezing. When
* HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen on exit, to indicate if the VM bits can be set.
- * They are always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not
- * passed, because at the moment only callers that also freeze need that
- * information.
+ * and presult->all_frozen after determining whether or not to
+ * opportunistically freeze, to indicate if the VM bits can be set. They are
+ * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
+ * because at the moment only callers that also freeze need that information.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
@@ -472,6 +492,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool do_hint_prune;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ TransactionId frz_conflict_horizon = InvalidTransactionId;
/* Copy parameters to prstate */
prstate.vistest = params->vistest;
@@ -541,10 +562,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* are tuples present that are not visible to everyone or if there are
* dead tuples which are not yet removable. However, dead tuples which
* will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not clear
- * all_visible and all_frozen when we see LP_DEAD items. We fix that at
- * the end of the function, when we return the value to the caller, so
- * that the caller doesn't set the VM bits incorrectly.
+ * opportunistically freezing. Because of that, we do not immediately
+ * clear all_visible and all_frozen when we see LP_DEAD items. We fix
+ * that after scanning the line pointers, before we return the value to
+ * the caller, so that the caller doesn't set the VM bits incorrectly.
*/
if (prstate.attempt_freeze)
{
@@ -779,7 +800,24 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
did_tuple_hint_fpi,
do_prune,
do_hint_prune,
- &prstate);
+ &prstate,
+ &frz_conflict_horizon);
+
+ /*
+ * While scanning the line pointers, we did not clear
+ * all_visible/all_frozen when encountering LP_DEAD items because we
+ * wanted the decision whether or not to freeze the page to be unaffected
+ * by the short-term presence of LP_DEAD items. These LP_DEAD items are
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that we finished determining whether or not to freeze the page,
+ * update all_visible and all_frozen so that they reflect the true state
+ * of the page for setting PD_ALL_VISIBLE and VM bits.
+ */
+ if (prstate.lpdead_items > 0)
+ prstate.all_visible = prstate.all_frozen = false;
Assert(!prstate.all_frozen || prstate.all_visible);
@@ -841,27 +879,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* on the standby with xids older than the youngest tuple this
* record will freeze will conflict.
*/
- TransactionId frz_conflict_horizon = InvalidTransactionId;
TransactionId conflict_xid;
- /*
- * We can use the visibility_cutoff_xid as our cutoff for
- * conflicts when the whole page is eligible to become all-frozen
- * in the VM once we're done with it. Otherwise we generate a
- * conservative cutoff by stepping back from OldestXmin.
- */
- if (do_freeze)
- {
- if (prstate.all_frozen)
- frz_conflict_horizon = prstate.visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = prstate.cutoffs->OldestXmin;
- TransactionIdRetreat(frz_conflict_horizon);
- }
- }
-
if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
conflict_xid = frz_conflict_horizon;
else
@@ -887,30 +906,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible/all_frozen
- * earlier on to make the choice of whether or not to freeze the page
- * unaffected by the short-term presence of LP_DEAD items. These LP_DEAD
- * items were effectively assumed to be LP_UNUSED items in the making. It
- * doesn't matter which vacuum heap pass (initial pass or final pass) ends
- * up setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible and all_frozen
- * if there are any LP_DEAD items on the page. It needs to reflect the
- * present state of the page, as expected by our caller.
- */
- if (prstate.all_visible && prstate.lpdead_items == 0)
- {
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
- }
- else
- {
- presult->all_visible = false;
- presult->all_frozen = false;
- }
-
+ presult->all_visible = prstate.all_visible;
+ presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
/*
--
2.43.0
v20-0004-Split-heap_page_prune_and_freeze-into-helpers.patchtext/x-patch; charset=US-ASCII; name=v20-0004-Split-heap_page_prune_and_freeze-into-helpers.patchDownload
From 13915bbef249e70af8167f77cc13b5ab88a9948f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 17 Nov 2025 15:11:27 -0500
Subject: [PATCH v20 04/12] Split heap_page_prune_and_freeze() into helpers
Refactor the setup and planning phases of pruning and freezing into
helpers. This streamlines heap_page_prune_and_freeze() and makes it more
clear when the examination of tuples ends and page modifications begin.
---
src/backend/access/heap/pruneheap.c | 565 +++++++++++++++-------------
1 file changed, 310 insertions(+), 255 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 86da2743423..9104c742a61 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -150,6 +150,14 @@ typedef struct
} PruneState;
/* Local functions */
+static void prune_freeze_setup(PruneFreezeParams *params,
+ TransactionId new_relfrozen_xid,
+ MultiXactId new_relmin_mxid,
+ const PruneFreezeResult *presult,
+ PruneState *prstate);
+static void prune_freeze_plan(Oid reloid, Buffer buffer,
+ PruneState *prstate,
+ OffsetNumber *off_loc);
static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
HeapTuple tup,
Buffer buffer);
@@ -302,204 +310,22 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
}
/*
- * Decide whether to proceed with freezing according to the freeze plans
- * prepared for the given heap buffer. If freezing is chosen, this function
- * performs several pre-freeze checks.
- *
- * The values of do_prune, do_hint_prune, and did_tuple_hint_fpi must be
- * determined before calling this function. *frz_conflict_horizon is set to
- * the snapshot conflict horizon for the WAL record, should we decide to
- * freeze tuples.
- *
- * prstate is both an input and output parameter.
- *
- * Returns true if we should apply the freeze plans and freeze tuples on the
- * page, and false otherwise.
+ * Helper for heap_page_prune_and_freeze() to initialize the PruneState using
+ * the provided parameters.
*/
-static bool
-heap_page_will_freeze(Relation relation, Buffer buffer,
- bool did_tuple_hint_fpi,
- bool do_prune,
- bool do_hint_prune,
- PruneState *prstate,
- TransactionId *frz_conflict_horizon)
-{
- bool do_freeze = false;
-
- /*
- * If the caller specified we should not attempt to freeze any tuples,
- * validate that everything is in the right state and return.
- */
- if (!prstate->attempt_freeze)
- {
- Assert(!prstate->all_frozen && prstate->nfrozen == 0);
- Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
- return false;
- }
-
- if (prstate->pagefrz.freeze_required)
- {
- /*
- * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
- * before FreezeLimit/MultiXactCutoff is present. Must freeze to
- * advance relfrozenxid/relminmxid.
- */
- do_freeze = true;
- }
- else
- {
- /*
- * Opportunistically freeze the page if we are generating an FPI
- * anyway and if doing so means that we can set the page all-frozen
- * afterwards (might not happen until VACUUM's final heap pass).
- *
- * XXX: Previously, we knew if pruning emitted an FPI by checking
- * pgWalUsage.wal_fpi before and after pruning. Once the freeze and
- * prune records were combined, this heuristic couldn't be used
- * anymore. The opportunistic freeze heuristic must be improved;
- * however, for now, try to approximate the old logic.
- */
- if (prstate->all_frozen && prstate->nfrozen > 0)
- {
- Assert(prstate->all_visible);
-
- /*
- * Freezing would make the page all-frozen. Have already emitted
- * an FPI or will do so anyway?
- */
- if (RelationNeedsWAL(relation))
- {
- if (did_tuple_hint_fpi)
- do_freeze = true;
- else if (do_prune)
- {
- if (XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- else if (do_hint_prune)
- {
- if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- }
- }
- }
-
- if (do_freeze)
- {
- /*
- * Validate the tuples we will be freezing before entering the
- * critical section.
- */
- heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
-
- /*
- * Calculate what the snapshot conflict horizon should be for a record
- * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
- * for conflicts when the whole page is eligible to become all-frozen
- * in the VM once we're done with it. Otherwise, we generate a
- * conservative cutoff by stepping back from OldestXmin.
- */
- if (prstate->all_frozen)
- *frz_conflict_horizon = prstate->visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- *frz_conflict_horizon = prstate->cutoffs->OldestXmin;
- TransactionIdRetreat(*frz_conflict_horizon);
- }
- }
- else if (prstate->nfrozen > 0)
- {
- /*
- * The page contained some tuples that were not already frozen, and we
- * chose not to freeze them now. The page won't be all-frozen then.
- */
- Assert(!prstate->pagefrz.freeze_required);
-
- prstate->all_frozen = false;
- prstate->nfrozen = 0; /* avoid miscounts in instrumentation */
- }
- else
- {
- /*
- * We have no freeze plans to execute. The page might already be
- * all-frozen (perhaps only following pruning), though. Such pages
- * can be marked all-frozen in the VM by our caller, even though none
- * of its tuples were newly frozen here.
- */
- }
-
- return do_freeze;
-}
-
-
-/*
- * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
- *
- * Caller must have pin and buffer cleanup lock on the page. Note that we
- * don't update the FSM information for page on caller's behalf. Caller might
- * also need to account for a reduction in the length of the line pointer
- * array following array truncation by us.
- *
- * params contains the input parameters used to control freezing and pruning
- * behavior. See the definition of PruneFreezeParams for more on what each
- * parameter does.
- *
- * If the HEAP_PAGE_PRUNE_FREEZE option is set in params, we will freeze
- * tuples if it's required in order to advance relfrozenxid / relminmxid, or
- * if it's considered advantageous for overall system performance to do so
- * now. The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing. When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opporunistically freeze, to indicate if the VM bits can be set. They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
- *
- * presult contains output parameters needed by callers, such as the number of
- * tuples removed and the offsets of dead items on the page after pruning.
- * heap_page_prune_and_freeze() is responsible for initializing it. Required
- * by all callers.
- *
- * off_loc is the offset location required by the caller to use in error
- * callback.
- *
- * new_relfrozen_xid and new_relmin_mxid must provided by the caller if the
- * HEAP_PAGE_PRUNE_FREEZE option is set in params. On entry, they contain the
- * oldest XID and multi-XID seen on the relation so far. They will be updated
- * with oldest values present on the page after pruning. After processing the
- * whole relation, VACUUM can use these values as the new
- * relfrozenxid/relminmxid for the relation.
- */
-void
-heap_page_prune_and_freeze(PruneFreezeParams *params,
- PruneFreezeResult *presult,
- OffsetNumber *off_loc,
- TransactionId *new_relfrozen_xid,
- MultiXactId *new_relmin_mxid)
+static void
+prune_freeze_setup(PruneFreezeParams *params,
+ TransactionId new_relfrozen_xid,
+ MultiXactId new_relmin_mxid,
+ const PruneFreezeResult *presult,
+ PruneState *prstate)
{
- Buffer buffer = params->buffer;
- Page page = BufferGetPage(buffer);
- BlockNumber blockno = BufferGetBlockNumber(buffer);
- OffsetNumber offnum,
- maxoff;
- PruneState prstate;
- HeapTupleData tup;
- bool do_freeze;
- bool do_prune;
- bool do_hint_prune;
- bool did_tuple_hint_fpi;
- int64 fpi_before = pgWalUsage.wal_fpi;
- TransactionId frz_conflict_horizon = InvalidTransactionId;
-
/* Copy parameters to prstate */
- prstate.vistest = params->vistest;
- prstate.mark_unused_now =
+ prstate->vistest = params->vistest;
+ prstate->mark_unused_now =
(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
- prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
- prstate.cutoffs = params->cutoffs;
+ prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate->cutoffs = params->cutoffs;
/*
* Our strategy is to scan the page and make lists of items to change,
@@ -512,40 +338,41 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* prunable, we will save the lowest relevant XID in new_prune_xid. Also
* initialize the rest of our working state.
*/
- prstate.new_prune_xid = InvalidTransactionId;
- prstate.latest_xid_removed = InvalidTransactionId;
- prstate.nredirected = prstate.ndead = prstate.nunused = prstate.nfrozen = 0;
- prstate.nroot_items = 0;
- prstate.nheaponly_items = 0;
+ prstate->new_prune_xid = InvalidTransactionId;
+ prstate->latest_xid_removed = InvalidTransactionId;
+ prstate->nredirected = prstate->ndead = prstate->nunused = 0;
+ prstate->nfrozen = 0;
+ prstate->nroot_items = 0;
+ prstate->nheaponly_items = 0;
/* initialize page freezing working state */
- prstate.pagefrz.freeze_required = false;
- if (prstate.attempt_freeze)
+ prstate->pagefrz.freeze_required = false;
+ if (prstate->attempt_freeze)
{
- Assert(new_relfrozen_xid && new_relmin_mxid);
- prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
- prstate.pagefrz.NoFreezePageRelfrozenXid = *new_relfrozen_xid;
- prstate.pagefrz.FreezePageRelminMxid = *new_relmin_mxid;
- prstate.pagefrz.NoFreezePageRelminMxid = *new_relmin_mxid;
+ prstate->pagefrz.FreezePageRelfrozenXid = new_relfrozen_xid;
+ prstate->pagefrz.NoFreezePageRelfrozenXid = new_relfrozen_xid;
+ prstate->pagefrz.FreezePageRelminMxid = new_relmin_mxid;
+ prstate->pagefrz.NoFreezePageRelminMxid = new_relmin_mxid;
}
else
{
- Assert(new_relfrozen_xid == NULL && new_relmin_mxid == NULL);
- prstate.pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
- prstate.pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
- prstate.pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
- prstate.pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
+ Assert(new_relfrozen_xid == InvalidTransactionId &&
+ new_relmin_mxid == InvalidMultiXactId);
+ prstate->pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
+ prstate->pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
+ prstate->pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
+ prstate->pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
}
- prstate.ndeleted = 0;
- prstate.live_tuples = 0;
- prstate.recently_dead_tuples = 0;
- prstate.hastup = false;
- prstate.lpdead_items = 0;
- prstate.deadoffsets = presult->deadoffsets;
+ prstate->ndeleted = 0;
+ prstate->live_tuples = 0;
+ prstate->recently_dead_tuples = 0;
+ prstate->hastup = false;
+ prstate->lpdead_items = 0;
+ prstate->deadoffsets = (OffsetNumber *) presult->deadoffsets;
/*
- * Caller may update the VM after we're done. We can keep track of
+ * Vacuum may update the VM after we're done. We can keep track of
* whether the page will be all-visible and all-frozen after pruning and
* freezing to help the caller to do that.
*
@@ -567,10 +394,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* that after scanning the line pointers, before we return the value to
* the caller, so that the caller doesn't set the VM bits incorrectly.
*/
- if (prstate.attempt_freeze)
+ if (prstate->attempt_freeze)
{
- prstate.all_visible = true;
- prstate.all_frozen = true;
+ prstate->all_visible = true;
+ prstate->all_frozen = true;
}
else
{
@@ -578,8 +405,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* Initializing to false allows skipping the work to update them in
* heap_prune_record_unchanged_lp_normal().
*/
- prstate.all_visible = false;
- prstate.all_frozen = false;
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
}
/*
@@ -590,10 +417,29 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* running transaction on the standby does not see tuples on the page as
* all-visible, so the conflict horizon remains InvalidTransactionId.
*/
- prstate.visibility_cutoff_xid = InvalidTransactionId;
+ prstate->visibility_cutoff_xid = InvalidTransactionId;
+}
- maxoff = PageGetMaxOffsetNumber(page);
- tup.t_tableOid = RelationGetRelid(params->relation);
+/*
+ * Helper for heap_page_prune_and_freeze(). Iterates over every tuple on the
+ * page, examines its visibility information, and determines the appropriate
+ * action for each tuple. All tuples are processed and classified during this
+ * phase, but no modifications are made to the page until the later execution
+ * stage.
+ *
+ * *off_loc is used for error callback and cleared before returning.
+ */
+static void
+prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
+ OffsetNumber *off_loc)
+{
+ Page page = BufferGetPage(buffer);
+ BlockNumber blockno = BufferGetBlockNumber(buffer);
+ OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
+ OffsetNumber offnum;
+ HeapTupleData tup;
+
+ tup.t_tableOid = reloid;
/*
* Determine HTSV for all tuples, and queue them up for processing as HOT
@@ -628,13 +474,13 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
*off_loc = offnum;
- prstate.processed[offnum] = false;
- prstate.htsv[offnum] = -1;
+ prstate->processed[offnum] = false;
+ prstate->htsv[offnum] = -1;
/* Nothing to do if slot doesn't contain a tuple */
if (!ItemIdIsUsed(itemid))
{
- heap_prune_record_unchanged_lp_unused(page, &prstate, offnum);
+ heap_prune_record_unchanged_lp_unused(page, prstate, offnum);
continue;
}
@@ -644,17 +490,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* If the caller set mark_unused_now true, we can set dead line
* pointers LP_UNUSED now.
*/
- if (unlikely(prstate.mark_unused_now))
- heap_prune_record_unused(&prstate, offnum, false);
+ if (unlikely(prstate->mark_unused_now))
+ heap_prune_record_unused(prstate, offnum, false);
else
- heap_prune_record_unchanged_lp_dead(page, &prstate, offnum);
+ heap_prune_record_unchanged_lp_dead(page, prstate, offnum);
continue;
}
if (ItemIdIsRedirected(itemid))
{
/* This is the start of a HOT chain */
- prstate.root_items[prstate.nroot_items++] = offnum;
+ prstate->root_items[prstate->nroot_items++] = offnum;
continue;
}
@@ -668,21 +514,15 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
tup.t_len = ItemIdGetLength(itemid);
ItemPointerSet(&tup.t_self, blockno, offnum);
- prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
- buffer);
+ prstate->htsv[offnum] = heap_prune_satisfies_vacuum(prstate, &tup,
+ buffer);
if (!HeapTupleHeaderIsHeapOnly(htup))
- prstate.root_items[prstate.nroot_items++] = offnum;
+ prstate->root_items[prstate->nroot_items++] = offnum;
else
- prstate.heaponly_items[prstate.nheaponly_items++] = offnum;
+ prstate->heaponly_items[prstate->nheaponly_items++] = offnum;
}
- /*
- * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
- * an FPI to be emitted.
- */
- did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
-
/*
* Process HOT chains.
*
@@ -694,30 +534,30 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* the page instead of using the root_items array, also did it in
* ascending offset number order.)
*/
- for (int i = prstate.nroot_items - 1; i >= 0; i--)
+ for (int i = prstate->nroot_items - 1; i >= 0; i--)
{
- offnum = prstate.root_items[i];
+ offnum = prstate->root_items[i];
/* Ignore items already processed as part of an earlier chain */
- if (prstate.processed[offnum])
+ if (prstate->processed[offnum])
continue;
/* see preceding loop */
*off_loc = offnum;
/* Process this item or chain of items */
- heap_prune_chain(page, blockno, maxoff, offnum, &prstate);
+ heap_prune_chain(page, blockno, maxoff, offnum, prstate);
}
/*
* Process any heap-only tuples that were not already processed as part of
* a HOT chain.
*/
- for (int i = prstate.nheaponly_items - 1; i >= 0; i--)
+ for (int i = prstate->nheaponly_items - 1; i >= 0; i--)
{
- offnum = prstate.heaponly_items[i];
+ offnum = prstate->heaponly_items[i];
- if (prstate.processed[offnum])
+ if (prstate->processed[offnum])
continue;
/* see preceding loop */
@@ -736,7 +576,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* return true for an XMIN_INVALID tuple, so this code will work even
* when there were sequential updates within the aborted transaction.)
*/
- if (prstate.htsv[offnum] == HEAPTUPLE_DEAD)
+ if (prstate->htsv[offnum] == HEAPTUPLE_DEAD)
{
ItemId itemid = PageGetItemId(page, offnum);
HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, itemid);
@@ -744,8 +584,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
if (likely(!HeapTupleHeaderIsHotUpdated(htup)))
{
HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate.latest_xid_removed);
- heap_prune_record_unused(&prstate, offnum, true);
+ &prstate->latest_xid_removed);
+ heap_prune_record_unused(prstate, offnum, true);
}
else
{
@@ -762,7 +602,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
}
}
else
- heap_prune_record_unchanged_lp_normal(page, &prstate, offnum);
+ heap_prune_record_unchanged_lp_normal(page, prstate, offnum);
}
/* We should now have processed every tuple exactly once */
@@ -773,12 +613,227 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
{
*off_loc = offnum;
- Assert(prstate.processed[offnum]);
+ Assert(prstate->processed[offnum]);
}
#endif
/* Clear the offset information once we have processed the given page. */
*off_loc = InvalidOffsetNumber;
+}
+
+/*
+ * Decide whether to proceed with freezing according to the freeze plans
+ * prepared for the given heap buffer. If freezing is chosen, this function
+ * performs several pre-freeze checks.
+ *
+ * The values of do_prune, do_hint_prune, and did_tuple_hint_fpi must be
+ * determined before calling this function. *frz_conflict_horizon is set to
+ * the snapshot conflict horizon for the WAL record should we decide to
+ * freeze tuples.
+ *
+ * prstate is both an input and output parameter.
+ *
+ * Returns true if we should apply the freeze plans and freeze tuples on the
+ * page, and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+ bool did_tuple_hint_fpi,
+ bool do_prune,
+ bool do_hint_prune,
+ PruneState *prstate,
+ TransactionId *frz_conflict_horizon)
+{
+ bool do_freeze = false;
+
+ /*
+ * If the caller specified we should not attempt to freeze any tuples,
+ * validate that everything is in the right state and return.
+ */
+ if (!prstate->attempt_freeze)
+ {
+ Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+ Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+ return false;
+ }
+
+ if (prstate->pagefrz.freeze_required)
+ {
+ /*
+ * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+ * before FreezeLimit/MultiXactCutoff is present. Must freeze to
+ * advance relfrozenxid/relminmxid.
+ */
+ do_freeze = true;
+ }
+ else
+ {
+ /*
+ * Opportunistically freeze the page if we are generating an FPI
+ * anyway and if doing so means that we can set the page all-frozen
+ * afterwards (might not happen until VACUUM's final heap pass).
+ *
+ * XXX: Previously, we knew if pruning emitted an FPI by checking
+ * pgWalUsage.wal_fpi before and after pruning. Once the freeze and
+ * prune records were combined, this heuristic couldn't be used
+ * anymore. The opportunistic freeze heuristic must be improved;
+ * however, for now, try to approximate the old logic.
+ */
+ if (prstate->all_frozen && prstate->nfrozen > 0)
+ {
+ Assert(prstate->all_visible);
+
+ /*
+ * Freezing would make the page all-frozen. Have already emitted
+ * an FPI or will do so anyway?
+ */
+ if (RelationNeedsWAL(relation))
+ {
+ if (did_tuple_hint_fpi)
+ do_freeze = true;
+ else if (do_prune)
+ {
+ if (XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ else if (do_hint_prune)
+ {
+ if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ }
+ }
+ }
+
+ if (do_freeze)
+ {
+ /*
+ * Validate the tuples we will be freezing before entering the
+ * critical section.
+ */
+ heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+
+ /*
+ * Calculate what the snapshot conflict horizon should be for a record
+ * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
+ * for conflicts when the whole page is eligible to become all-frozen
+ * in the VM once we're done with it. Otherwise, we generate a
+ * conservative cutoff by stepping back from OldestXmin.
+ */
+ if (prstate->all_frozen)
+ *frz_conflict_horizon = prstate->visibility_cutoff_xid;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ *frz_conflict_horizon = prstate->cutoffs->OldestXmin;
+ TransactionIdRetreat(*frz_conflict_horizon);
+ }
+ }
+ else if (prstate->nfrozen > 0)
+ {
+ /*
+ * The page contained some tuples that were not already frozen, and we
+ * chose not to freeze them now. The page won't be all-frozen then.
+ */
+ Assert(!prstate->pagefrz.freeze_required);
+
+ prstate->all_frozen = false;
+ prstate->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+ else
+ {
+ /*
+ * We have no freeze plans to execute. The page might already be
+ * all-frozen (perhaps only following pruning), though. Such pages
+ * can be marked all-frozen in the VM by our caller, even though none
+ * of its tuples were newly frozen here.
+ */
+ }
+
+ return do_freeze;
+}
+
+
+/*
+ * Prune and repair fragmentation and potentially freeze tuples on the
+ * specified page.
+ *
+ * Caller must have pin and buffer cleanup lock on the page. Note that we
+ * don't update the FSM information for page on caller's behalf. Caller might
+ * also need to account for a reduction in the length of the line pointer
+ * array following array truncation by us.
+ *
+ * params contains the input parameters used to control freezing and pruning
+ * behavior. See the definition of PruneFreezeParams for more on what each
+ * parameter does.
+ *
+ * If the HEAP_PAGE_PRUNE_FREEZE option is set in params, we will freeze
+ * tuples if it's required in order to advance relfrozenxid / relminmxid, or
+ * if it's considered advantageous for overall system performance to do so
+ * now. The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
+ * 'new_relmin_mxid' arguments are required when freezing. When
+ * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
+ * and presult->all_frozen after determining whether or not to
+ * opportunistically freeze, to indicate if the VM bits can be set. They are
+ * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
+ * because at the moment only callers that also freeze need that information.
+ *
+ * presult contains output parameters needed by callers, such as the number of
+ * tuples removed and the offsets of dead items on the page after pruning.
+ * heap_page_prune_and_freeze() is responsible for initializing it. Required
+ * by all callers.
+ *
+ * off_loc is the offset location required by the caller to use in error
+ * callback.
+ *
+ * new_relfrozen_xid and new_relmin_mxid must be provided by the caller if the
+ * HEAP_PAGE_PRUNE_FREEZE option is set in params. On entry, they contain the
+ * oldest XID and multi-XID seen on the relation so far. They will be updated
+ * with oldest values present on the page after pruning. After processing the
+ * whole relation, VACUUM can use these values as the new
+ * relfrozenxid/relminmxid for the relation.
+ */
+void
+heap_page_prune_and_freeze(PruneFreezeParams *params,
+ PruneFreezeResult *presult,
+ OffsetNumber *off_loc,
+ TransactionId *new_relfrozen_xid,
+ MultiXactId *new_relmin_mxid)
+{
+ Buffer buffer = params->buffer;
+ Page page = BufferGetPage(buffer);
+ PruneState prstate;
+ bool do_freeze;
+ bool do_prune;
+ bool do_hint_prune;
+ bool did_tuple_hint_fpi;
+ int64 fpi_before = pgWalUsage.wal_fpi;
+ TransactionId frz_conflict_horizon = InvalidTransactionId;
+
+ /* Initialize prstate */
+ prune_freeze_setup(params,
+ new_relfrozen_xid ?
+ *new_relfrozen_xid : InvalidTransactionId,
+ new_relmin_mxid ?
+ *new_relmin_mxid : InvalidMultiXactId,
+ presult,
+ &prstate);
+
+ /*
+ * Examine all line pointers and tuple visibility information to determine
+ * which line pointers should change state and which tuples may be frozen.
+ * Prepare queue of state changes to later be executed in a critical
+ * section.
+ */
+ prune_freeze_plan(RelationGetRelid(params->relation),
+ buffer, &prstate, off_loc);
+
+ /*
+ * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
+ * checking tuple visibility information in prune_freeze_plan() may have
+ * caused an FPI to be emitted.
+ */
+ did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
do_prune = prstate.nredirected > 0 ||
prstate.ndead > 0 ||
--
2.43.0
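Since the opportunistic-freeze heuristic gets moved around by this refactor, it may be easier to see the decision restated outside the diff. The following is a small, self-contained toy, not the patch's API: the flattening of the inputs into plain booleans and the name will_freeze are mine. It mirrors the logic of heap_page_will_freeze() above: freeze when required to advance relfrozenxid/relminmxid, otherwise only when the page would end up all-frozen and a full-page image has been or will be emitted anyway.

#include <stdbool.h>
#include <stdio.h>

/*
 * Toy model of the opportunistic-freeze decision in heap_page_will_freeze().
 * The real function works on a PruneState and a Buffer; the relevant facts
 * are flattened into booleans here purely for illustration.
 */
static bool
will_freeze(bool freeze_required,	/* an old XID/MXID forces freezing */
			bool all_frozen,	/* page would end up all-frozen */
			int nfrozen,	/* number of prepared freeze plans */
			bool relation_needs_wal,
			bool did_tuple_hint_fpi,	/* visibility checks already caused an FPI */
			bool do_prune,	/* pruning will modify the page */
			bool buffer_needs_backup,	/* XLogCheckBufferNeedsBackup() */
			bool do_hint_prune,	/* only prune_xid/hint changes */
			bool wal_log_hints)	/* XLogHintBitIsNeeded() */
{
	if (freeze_required)
		return true;			/* must freeze to advance relfrozenxid/relminmxid */

	/* Opportunistic case: only worthwhile if freezing leaves the page all-frozen */
	if (all_frozen && nfrozen > 0 && relation_needs_wal)
	{
		if (did_tuple_hint_fpi)
			return true;
		if (do_prune && buffer_needs_backup)
			return true;
		if (do_hint_prune && wal_log_hints && buffer_needs_backup)
			return true;
	}

	return false;
}

int
main(void)
{
	/* Page would become all-frozen and pruning already forces an FPI: freeze */
	printf("%d\n", will_freeze(false, true, 3, true, false, true, true, false, false));
	return 0;
}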
Attachment: v20-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (text/x-patch; charset=UTF-8)
From f20655f822e7b83d14cc6616a992b69799c859cb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 8 Oct 2025 15:39:01 -0400
Subject: [PATCH v20 05/12] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
prune/freeze
Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.
Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.
This change applies only to vacuum phase I, not to pruning performed
during normal page access.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam_xlog.c | 37 ++-
src/backend/access/heap/pruneheap.c | 460 +++++++++++++++++++++-----
src/backend/access/heap/vacuumlazy.c | 241 +-------------
src/include/access/heapam.h | 43 ++-
4 files changed, 447 insertions(+), 334 deletions(-)
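One subtlety of folding the VM update into the prune/freeze record is choosing a single snapshotConflictHorizon that is conservative enough for all of pruning, freezing, and setting the VM. The patch below centralizes that choice in a new get_conflict_xid() helper. The toy below restates its precedence under simplifying assumptions (integer XIDs and a plain '>' instead of the wraparound-aware TransactionIdFollows(); all names here are hypothetical): no horizon at all when merely marking an already all-visible page all-frozen, otherwise the newest live xmin when setting the VM, the freeze horizon when only freezing, and in every case the newest removed xmax if that is newer still.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t ToyXid;
#define TOY_INVALID_XID 0

/* Toy model of get_conflict_xid() from the patch below. */
static ToyXid
conflict_xid_for_record(bool do_prune, bool do_freeze, bool do_set_vm,
						bool blk_already_av, bool set_blk_all_frozen,
						ToyXid latest_xid_removed,
						ToyXid frz_conflict_horizon,
						ToyXid visibility_cutoff_xid)
{
	ToyXid		conflict_xid = TOY_INVALID_XID;

	/*
	 * Only marking an already all-visible page all-frozen: every tuple is
	 * already visible to all snapshots on the standby, so no conflict horizon
	 * is needed.
	 */
	if (!do_prune && !do_freeze &&
		do_set_vm && blk_already_av && set_blk_all_frozen)
		return TOY_INVALID_XID;

	/* Setting the VM uses the newest xmin of live tuples on the page */
	if (do_set_vm)
		conflict_xid = visibility_cutoff_xid;
	else if (do_freeze)
		conflict_xid = frz_conflict_horizon;

	/* Removed tuples may impose a newer (more conservative) horizon */
	if (latest_xid_removed > conflict_xid)
		conflict_xid = latest_xid_removed;

	return conflict_xid;
}

int
main(void)
{
	/* Prune + set all-visible: removed xmax 812 wins over live xmin 790 */
	printf("%u\n", conflict_xid_for_record(true, false, true,
										   false, false,
										   812, TOY_INVALID_XID, 790));
	return 0;
}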
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 11cb3f74da5..2af724451c3 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -104,6 +104,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
OffsetNumber *frz_offsets;
char *dataptr = XLogRecGetBlockData(record, 0, &datalen);
bool do_prune;
+ bool set_lsn = false;
+ bool mark_buffer_dirty = false;
heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
&nplans, &plans, &frz_offsets,
@@ -157,17 +159,36 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* There should be no more data */
Assert((char *) frz_offsets == dataptr + datalen);
- if (vmflags & VISIBILITYMAP_VALID_BITS)
- PageSetAllVisible(page);
-
- MarkBufferDirty(buffer);
+ if (do_prune || nplans > 0)
+ mark_buffer_dirty = set_lsn = true;
/*
- * See log_heap_prune_and_freeze() for commentary on when we set the
- * heap page LSN.
+ * The critical integrity requirement here is that we must never end
+ * up with with the visibility map bit set and the page-level
+ * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+ * modification would fail to clear the visibility map bit.
+ *
+ * vmflags may be nonzero with PD_ALL_VISIBLE already set (e.g. when
+ * marking an all-visible page all-frozen). If only the VM is updated,
+ * the heap page need not be dirtied.
*/
- if (do_prune || nplans > 0 ||
- ((vmflags & VISIBILITYMAP_VALID_BITS) && XLogHintBitIsNeeded()))
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+ {
+ PageSetAllVisible(page);
+ mark_buffer_dirty = true;
+
+ /*
+ * See log_heap_prune_and_freeze() for commentary on when we set
+ * the heap page LSN.
+ */
+ if (XLogHintBitIsNeeded())
+ set_lsn = true;
+ }
+
+ if (mark_buffer_dirty)
+ MarkBufferDirty(buffer);
+
+ if (set_lsn)
PageSetLSN(page, lsn);
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9104c742a61..5667df86bae 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
#include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool attempt_freeze;
+ /* whether or not to attempt updating the VM */
+ bool attempt_update_vm;
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -133,16 +135,17 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon when setting the VM or when
+ * freezing all the tuples on the page. It is only valid when all the live
+ * tuples on the page are all-visible.
*
* NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
* That's convenient for heap_page_prune_and_freeze() to use them to
- * decide whether to freeze the page or not. The all_visible and
- * all_frozen values returned to the caller are adjusted to include
- * LP_DEAD items after we determine whether to opportunistically freeze.
+ * decide whether to opportunistically freeze the page or not. The
+ * all_visible and all_frozen values ultimately used to set the VM are
+ * adjusted to include LP_DEAD items after we determine whether or not to
+ * opportunistically freeze.
*/
bool all_visible;
bool all_frozen;
@@ -181,11 +184,22 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
static void page_verify_redirects(Page page);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid, bool blk_already_av,
+ bool set_blk_all_frozen);
static bool heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
PruneState *prstate,
TransactionId *frz_conflict_horizon);
-
+static bool heap_page_will_set_vis(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buf,
+ Buffer vmbuffer,
+ bool blk_known_av,
+ const PruneState *prstate,
+ uint8 *vmflags,
+ bool *do_set_pd_vis);
/*
* Optionally prune and repair fragmentation in the specified page.
@@ -272,6 +286,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* current implementation.
*/
PruneFreezeParams params = {.relation = relation,.buffer = buffer,
+ .vmbuffer = InvalidBuffer,.blk_known_av = false,
.reason = PRUNE_ON_ACCESS,.options = 0,
.vistest = vistest,.cutoffs = NULL
};
@@ -325,6 +340,8 @@ prune_freeze_setup(PruneFreezeParams *params,
prstate->mark_unused_now =
(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate->attempt_update_vm =
+ (params->options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0;
prstate->cutoffs = params->cutoffs;
/*
@@ -372,50 +389,54 @@ prune_freeze_setup(PruneFreezeParams *params,
prstate->deadoffsets = (OffsetNumber *) presult->deadoffsets;
/*
- * Vacuum may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Track whether the page could be marked all-visible and/or all-frozen.
+ * This information is used for opportunistic freezing and for updating
+ * the visibility map (VM) if requested by the caller.
+ *
+ * Currently, only VACUUM performs freezing, but other callers may in the
+ * future. Visibility bookkeeping is required not just for setting the VM
+ * bits, but also for opportunistic freezing: we only consider freezing if
+ * the page would become all-frozen, or if it would be all-frozen except
+ * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+ * we will not set the VM bit even if the page is found to be all-visible.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is passed without HEAP_PAGE_PRUNE_FREEZE,
+ * prstate.all_frozen must be initialized to false, since we will not call
+ * heap_prepare_freeze_tuple() for each tuple.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not immediately
- * clear all_visible and all_frozen when we see LP_DEAD items. We fix
- * that after scanning the line pointers, before we return the value to
- * the caller, so that the caller doesn't set the VM bits incorrectly.
+ * Dead tuples that will be removed by the end of vacuum should not
+ * prevent opportunistic freezing. Therefore, we do not clear all_visible
+ * and all_frozen when we encounter LP_DEAD items. Instead, we correct
+ * them after deciding whether to freeze, but before updating the VM, to
+ * avoid setting the VM bits incorrectly.
+ *
+ * If neither freezing nor VM updates are requested, we skip the extra
+ * bookkeeping. In this case, initializing all_visible to false allows
+ * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
*/
if (prstate->attempt_freeze)
{
prstate->all_visible = true;
prstate->all_frozen = true;
}
+ else if (prstate->attempt_update_vm)
+ {
+ prstate->all_visible = true;
+ prstate->all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate->all_visible = false;
prstate->all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is unmaintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon
+ * when updating the VM and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -753,10 +774,133 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Decide whether to set the visibility map bits for heap_blk, using
+ * information from PruneState and blk_known_av. Some callers may already
+ * have examined this page's VM bits (e.g., VACUUM in the previous
+ * heap_vac_scan_next_block() call) and can pass that along.
+ *
+ * Returns true if one or both VM bits should be set, along with the desired
+ * flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
+ * should be set on the heap page.
+ */
+static bool
+heap_page_will_set_vis(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buf,
+ Buffer vmbuffer,
+ bool blk_known_av,
+ const PruneState *prstate,
+ uint8 *vmflags,
+ bool *do_set_pd_vis)
+{
+ Page heap_page = BufferGetPage(heap_buf);
+ bool do_set_vm = false;
+
+ *do_set_pd_vis = false;
+
+ if (!prstate->attempt_update_vm)
+ {
+ Assert(!prstate->all_visible && !prstate->all_frozen);
+ Assert(*vmflags == 0);
+ return false;
+ }
+
+ if (prstate->all_visible && !PageIsAllVisible(heap_page))
+ *do_set_pd_vis = true;
+
+ if ((prstate->all_visible && !blk_known_av) ||
+ (prstate->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+ {
+ *vmflags = VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate->all_frozen)
+ *vmflags |= VISIBILITYMAP_ALL_FROZEN;
+
+ do_set_vm = true;
+ }
+
+ /*
+ * Now handle two potential corruption cases:
+ *
+ * These do not need to happen in a critical section and are not
+ * WAL-logged.
+ *
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that in vacuum the bit
+ * got cleared after heap_vac_scan_next_block() was called, so we must
+ * recheck with buffer lock before concluding that the VM is corrupt.
+ */
+ else if (blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (prstate->lpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buf);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ return do_set_vm;
+}
+
+
+#ifdef USE_ASSERT_CHECKING
+
+/*
+ * Wrapper for heap_page_would_be_all_visible() which can be used for callers
+ * that expect no LP_DEAD on the page. Currently assert-only, but there is no
+ * reason not to use it outside of asserts.
+ */
+static bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
+{
+
+ return heap_page_would_be_all_visible(rel, buf,
+ OldestXmin,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid,
+ logging_offnum);
+}
+#endif
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -771,12 +915,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* tuples if it's required in order to advance relfrozenxid / relminmxid, or
* if it's considered advantageous for overall system performance to do so
* now. The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing. When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opporunistically freeze, to indicate if the VM bits can be set. They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
@@ -801,14 +946,21 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
MultiXactId *new_relmin_mxid)
{
Buffer buffer = params->buffer;
+ Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
+ BlockNumber blockno = BufferGetBlockNumber(buffer);
PruneState prstate;
bool do_freeze;
bool do_prune;
bool do_hint_prune;
+ bool do_set_vm;
+ bool do_set_pd_vis;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
TransactionId frz_conflict_horizon = InvalidTransactionId;
+ TransactionId conflict_xid = InvalidTransactionId;
+ uint8 new_vmbits = 0;
+ uint8 old_vmbits = 0;
/* Initialize prstate */
prune_freeze_setup(params,
@@ -875,6 +1027,34 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.all_visible = prstate.all_frozen = false;
Assert(!prstate.all_frozen || prstate.all_visible);
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ /*
+ * Decide whether to set the page-level PD_ALL_VISIBLE bit and the VM bits
+ * based on information from the VM and the all_visible/all_frozen flags.
+ *
+ * While it is valid for PD_ALL_VISIBLE to be set when the corresponding
+ * VM bit is clear, we strongly prefer to keep them in sync.
+ *
+ * Accordingly, we also allow updating only the VM when PD_ALL_VISIBLE has
+ * already been set. Setting only the VM is most common when setting an
+ * already all-visible page all-frozen.
+ */
+ do_set_vm = heap_page_will_set_vis(params->relation,
+ blockno, buffer, vmbuffer, params->blk_known_av,
+ &prstate, &new_vmbits, &do_set_pd_vis);
+
+ /* We should only set the VM if PD_ALL_VISIBLE is set or will be */
+ Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
+
+ conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+ prstate.latest_xid_removed, frz_conflict_horizon,
+ prstate.visibility_cutoff_xid, params->blk_known_av,
+ (do_set_vm && (new_vmbits & VISIBILITYMAP_ALL_FROZEN)));
+
+ /* Lock vmbuffer before entering a critical section */
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -896,14 +1076,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
/*
* If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * hint. If we are going to freeze or prune the page or set
+ * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+ *
+ * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+ * for the VM to be set and PD_ALL_VISIBLE to be clear.
*/
- if (!do_freeze && !do_prune)
+ if (!do_freeze && !do_prune && !do_set_pd_vis)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -917,35 +1100,43 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
- MarkBufferDirty(buffer);
+ if (do_set_pd_vis)
+ PageSetAllVisible(page);
- /*
- * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
- */
- if (RelationNeedsWAL(params->relation))
+ if (do_prune || do_freeze || do_set_pd_vis)
+ MarkBufferDirty(buffer);
+
+ if (do_set_vm)
{
- /*
- * The snapshotConflictHorizon for the whole record should be the
- * most conservative of all the horizons calculated for any of the
- * possible modifications. If this record will prune tuples, any
- * transactions on the standby older than the youngest xmax of the
- * most recently removed tuple this record will prune will
- * conflict. If this record will freeze tuples, any transactions
- * on the standby with xids older than the youngest tuple this
- * record will freeze will conflict.
- */
- TransactionId conflict_xid;
+ Assert(PageIsAllVisible(page));
- if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
- conflict_xid = frz_conflict_horizon;
- else
- conflict_xid = prstate.latest_xid_removed;
+ old_vmbits = visibilitymap_set_vmbits(blockno,
+ vmbuffer, new_vmbits,
+ params->relation->rd_locator);
+ if (old_vmbits == new_vmbits)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+ /* Unset so we don't emit WAL since no change occurred */
+ do_set_vm = false;
+ }
+ }
+ /*
+ * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
+ * only updating the VM and it turns out it was already set, we will
+ * have unset do_set_vm earlier. As such, check it again before
+ * emitting the record.
+ */
+ if (RelationNeedsWAL(params->relation) &&
+ (do_prune || do_freeze || do_set_vm))
+ {
log_heap_prune_and_freeze(params->relation, buffer,
- InvalidBuffer, /* vmbuffer */
- 0, /* vmflags */
+ do_set_vm ? vmbuffer : InvalidBuffer,
+ do_set_vm ? new_vmbits : 0,
conflict_xid,
- true, params->reason,
+ true, /* cleanup lock */
+ do_set_pd_vis,
+ params->reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
@@ -955,28 +1146,47 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
END_CRIT_SECTION();
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+ /*
+ * During its second pass over the heap, VACUUM calls
+ * heap_page_would_be_all_visible() to determine whether a page is
+ * all-visible and all-frozen. The logic here is similar. After completing
+ * pruning and freezing, use an assertion to verify that our results
+ * remain consistent with heap_page_would_be_all_visible().
+ */
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
+ {
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
+
+ Assert(prstate.lpdead_items == 0);
+ Assert(prstate.cutoffs);
+
+ if (!heap_page_is_all_visible(params->relation, buffer,
+ prstate.cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc))
+ Assert(false);
+
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == prstate.visibility_cutoff_xid);
+ }
+#endif
+
/* Copy information back for caller */
presult->ndeleted = prstate.ndeleted;
presult->nnewlpdead = prstate.ndead;
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
-
- /*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ presult->new_vmbits = new_vmbits;
+ presult->old_vmbits = old_vmbits;
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
@@ -1468,6 +1678,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
{
TransactionId xmin;
+ Assert(prstate->attempt_update_vm);
+
if (!HeapTupleHeaderXminCommitted(htup))
{
prstate->all_visible = false;
@@ -2118,6 +2330,65 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
return nplans;
}
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid, bool blk_already_av,
+ bool set_blk_all_frozen)
+{
+ TransactionId conflict_xid;
+
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning or
+ * freezing any tuples and are setting an already all-visible page
+ * all-frozen in the VM. In this case, all of the tuples on the page must
+ * already be visible to all MVCC snapshots on the standby.
+ */
+ if (!do_prune && !do_freeze &&
+ do_set_vm && blk_already_av && set_blk_all_frozen)
+ return InvalidTransactionId;
+
+ /*
+ * The snapshotConflictHorizon for the whole record should be the most
+ * conservative of all the horizons calculated for any of the possible
+ * modifications. If this record will prune tuples, any transactions on
+ * the standby older than the youngest xmax of the most recently removed
+ * tuple this record will prune will conflict. If this record will freeze
+ * tuples, any transactions on the standby with xids older than the
+ * youngest tuple this record will freeze will conflict.
+ */
+ conflict_xid = InvalidTransactionId;
+
+ /*
+ * If we are updating the VM, the conflict horizon is almost always the
+ * visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization, we can
+ * use the visibility_cutoff_xid as the conflict horizon if the page will
+ * be all-frozen. This is true even if there are LP_DEAD line pointers
+ * because we ignored those when maintaining the visibility_cutoff_xid.
+ * This will have been calculated earlier as the frz_conflict_horizon when
+ * we determined we would freeze.
+ */
+ if (do_set_vm)
+ conflict_xid = visibility_cutoff_xid;
+ else if (do_freeze)
+ conflict_xid = frz_conflict_horizon;
+
+ /*
+ * If we are removing tuples with a younger xmax than our so far
+ * calculated conflict_xid, we must use this as our horizon.
+ */
+ if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+ conflict_xid = latest_xid_removed;
+
+ return conflict_xid;
+}
+
/*
* Write an XLOG_HEAP2_PRUNE* WAL record
*
@@ -2142,6 +2413,15 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* case, vmbuffer should already have been updated and marked dirty and should
* still be pinned and locked.
*
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
+ * In some cases, such as when heap_page_prune_and_freeze() is setting an
+ * already marked all-visible page all-frozen, PD_ALL_VISIBLE may already be
+ * set. So, it is possible for vmflags to be non-zero and set_pd_all_vis to be
+ * false.
+ *
* Note: This function scribbles on the 'frozen' array.
*
* Note: This is called in a critical section, so careful what you do here.
@@ -2151,6 +2431,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
Buffer vmbuffer, uint8 vmflags,
TransactionId conflict_xid,
bool cleanup_lock,
+ bool set_pd_all_vis,
PruneReason reason,
HeapTupleFreeze *frozen, int nfrozen,
OffsetNumber *redirected, int nredirected,
@@ -2187,7 +2468,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (!do_prune &&
nfrozen == 0 &&
- (!do_set_vm || !XLogHintBitIsNeeded()))
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
regbuf_flags_heap |= REGBUF_NO_IMAGE;
/*
@@ -2305,7 +2586,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* See comment at the top of the function about regbuf_flags_heap for
* details on when we can advance the page LSN.
*/
- if (do_prune || nfrozen > 0 || (do_set_vm && XLogHintBitIsNeeded()))
+ if (do_prune || nfrozen > 0 ||
+ (set_pd_all_vis && XLogHintBitIsNeeded()))
{
Assert(BufferIsDirty(buffer));
PageSetLSN(BufferGetPage(buffer), recptr);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e1b7456823d..a7a974b6639 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,20 +464,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
-#endif
-static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- OffsetNumber *deadoffsets,
- int ndeadoffsets,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -1966,7 +1952,9 @@ lazy_scan_prune(LVRelState *vacrel,
Relation rel = vacrel->rel;
PruneFreezeResult presult;
PruneFreezeParams params = {.relation = rel,.buffer = buf,
- .reason = PRUNE_VACUUM_SCAN,.options = HEAP_PAGE_PRUNE_FREEZE,
+ .vmbuffer = vmbuffer,.blk_known_av = all_visible_according_to_vm,
+ .reason = PRUNE_VACUUM_SCAN,
+ .options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VIS,
.cutoffs = &vacrel->cutoffs,.vistest = vacrel->vistest
};
@@ -2009,33 +1997,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- if (!heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum))
- Assert(false);
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -2069,168 +2030,26 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
- Assert(!presult.all_frozen || presult.all_visible);
-
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
+ * For the purposes of logging, count whether or not the page was newly
+ * set all-visible and, potentially, all-frozen.
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ vacrel->vm_new_visible_pages++;
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- vacrel->vm_new_frozen_pages++;
+ vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
-
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen.
- */
- else if (all_visible_according_to_vm && presult.all_frozen &&
- !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- uint8 old_vmbits;
-
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
}
return presult.ndeleted;
@@ -2952,6 +2771,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
vmflags,
conflict_xid,
false, /* no cleanup lock required */
+ (vmflags & VISIBILITYMAP_VALID_BITS) != 0,
PRUNE_VACUUM_CLEANUP,
NULL, 0, /* frozen */
NULL, 0, /* redirected */
@@ -3632,30 +3452,6 @@ dead_items_cleanup(LVRelState *vacrel)
vacrel->pvs = NULL;
}
-#ifdef USE_ASSERT_CHECKING
-
-/*
- * Wrapper for heap_page_would_be_all_visible() which can be used for callers
- * that expect no LP_DEAD on the page. Currently assert-only, but there is no
- * reason not to use it outside of asserts.
- */
-static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum)
-{
-
- return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
- NULL, 0,
- all_frozen,
- visibility_cutoff_xid,
- logging_offnum);
-}
-#endif
-
/*
* Check whether the heap page in buf is all-visible except for the dead
* tuples referenced in the deadoffsets array.
@@ -3678,15 +3474,12 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
* - *logging_offnum: OffsetNumber of current tuple being processed;
* used by vacuum's error callback system.
*
- * Callers looking to verify that the page is already all-visible can call
- * heap_page_is_all_visible().
- *
* This logic is closely related to heap_prune_record_unchanged_lp_normal().
* If you modify this function, ensure consistency with that code. An
* assertion cross-checks that both remain in agreement. Do not introduce new
* side-effects.
*/
-static bool
+bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
TransactionId OldestXmin,
OffsetNumber *deadoffsets,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 632c4332a8c..937b46a77db 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VIS (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,16 @@ typedef struct PruneFreezeParams
Relation relation; /* relation containing buffer to be pruned */
Buffer buffer; /* buffer to be pruned */
+ /*
+ *
+ * vmbuffer is the buffer that must already contain the required
+ * block of the visibility map if we are to update it. blk_known_av is the
+ * visibility status of the heap block as of the last call to
+ * find_next_unskippable_block().
+ */
+ Buffer vmbuffer;
+ bool blk_known_av;
+
/*
* The reason pruning was performed. It is used to set the WAL record
* opcode which is used for debugging and analysis purposes.
@@ -250,8 +261,10 @@ typedef struct PruneFreezeParams
* HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
* LP_UNUSED during pruning.
*
- * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
- * will return 'all_visible', 'all_frozen' flags to the caller.
+ * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples
+ *
+ * HEAP_PAGE_PRUNE_UPDATE_VIS indicates that we will set the page's status
+ * in the VM.
*/
int options;
@@ -285,19 +298,15 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
- *
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
*
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set and
+ * we have attempted to update the VM.
*/
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
+ uint8 new_vmbits;
+ uint8 old_vmbits;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
@@ -424,6 +433,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
Buffer vmbuffer, uint8 vmflags,
TransactionId conflict_xid,
bool cleanup_lock,
+ bool set_pd_all_vis,
PruneReason reason,
HeapTupleFreeze *frozen, int nfrozen,
OffsetNumber *redirected, int nredirected,
@@ -433,6 +443,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
/* in heap/vacuumlazy.c */
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
--
2.43.0
Attachment: v20-0001-Refactor-heap_page_prune_and_freeze-parameters-i.patch (text/x-patch)
From 12ecb9ef685f2fed0d741f91d6fc6a6a9f959c80 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 14 Oct 2025 11:10:25 -0400
Subject: [PATCH v20 01/12] Refactor heap_page_prune_and_freeze() parameters
into a struct
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
heap_page_prune_and_freeze() had accumulated an unwieldy number of input
parameters and upcoming work to handle VM updates in this function will
add even more.
Introduce a new PruneFreezeParams struct to group the function’s input
parameters, improving readability and maintainability.
Author: Melanie Plageman <melanieplageman@gmail.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/yn4zp35kkdsjx6wf47zcfmxgexxt4h2og47pvnw2x5ifyrs3qc%407uw6jyyxuyf7
---
src/backend/access/heap/pruneheap.c | 91 +++++++++++++---------------
src/backend/access/heap/vacuumlazy.c | 12 ++--
src/include/access/heapam.h | 63 +++++++++++++++----
src/tools/pgindent/typedefs.list | 1 +
4 files changed, 100 insertions(+), 67 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 231bea679c6..e9e14cb42b7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -261,12 +261,18 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
PruneFreezeResult presult;
/*
- * For now, pass mark_unused_now as false regardless of whether or
- * not the relation has indexes, since we cannot safely determine
- * that during on-access pruning with the current implementation.
+ * We don't pass the HEAP_PAGE_PRUNE_MARK_UNUSED_NOW option
+ * regardless of whether or not the relation has indexes, since we
+ * cannot safely determine that during on-access pruning with the
+ * current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, vistest, 0,
- NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+ PruneFreezeParams params = {.relation = relation,.buffer = buffer,
+ .reason = PRUNE_ON_ACCESS,.options = 0,
+ .vistest = vistest,.cutoffs = NULL
+ };
+
+ heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
+ NULL, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -419,60 +425,44 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* also need to account for a reduction in the length of the line pointer
* array following array truncation by us.
*
- * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
- * required in order to advance relfrozenxid / relminmxid, or if it's
- * considered advantageous for overall system performance to do so now. The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing. When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set. They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
- *
- * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
- *
- * options:
- * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- * pruning.
+ * params contains the input parameters used to control freezing and pruning
+ * behavior. See the definition of PruneFreezeParams for more on what each
+ * parameter does.
*
- * FREEZE indicates that we will also freeze tuples, and will return
- * 'all_visible', 'all_frozen' flags to the caller.
- *
- * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
- * of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
- * cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * If the HEAP_PAGE_PRUNE_FREEZE option is set in params, we will freeze
+ * tuples if it's required in order to advance relfrozenxid / relminmxid, or
+ * if it's considered advantageous for overall system performance to do so
+ * now. The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
+ * 'new_relmin_mxid' arguments are required when freezing. When
+ * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
+ * and presult->all_frozen on exit, to indicate if the VM bits can be set.
+ * They are always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not
+ * passed, because at the moment only callers that also freeze need that
+ * information.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
* heap_page_prune_and_freeze() is responsible for initializing it. Required
* by all callers.
*
- * reason indicates why the pruning is performed. It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
* off_loc is the offset location required by the caller to use in error
* callback.
*
 * new_relfrozen_xid and new_relmin_mxid must be provided by the caller if the
- * HEAP_PRUNE_FREEZE option is set. On entry, they contain the oldest XID and
- * multi-XID seen on the relation so far. They will be updated with oldest
- * values present on the page after pruning. After processing the whole
- * relation, VACUUM can use these values as the new relfrozenxid/relminmxid
- * for the relation.
+ * HEAP_PAGE_PRUNE_FREEZE option is set in params. On entry, they contain the
+ * oldest XID and multi-XID seen on the relation so far. They will be updated
+ * with oldest values present on the page after pruning. After processing the
+ * whole relation, VACUUM can use these values as the new
+ * relfrozenxid/relminmxid for the relation.
*/
void
-heap_page_prune_and_freeze(Relation relation, Buffer buffer,
- GlobalVisState *vistest,
- int options,
- struct VacuumCutoffs *cutoffs,
+heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
- PruneReason reason,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid)
{
+ Buffer buffer = params->buffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
OffsetNumber offnum,
@@ -486,10 +476,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
int64 fpi_before = pgWalUsage.wal_fpi;
/* Copy parameters to prstate */
- prstate.vistest = vistest;
- prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
- prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
- prstate.cutoffs = cutoffs;
+ prstate.vistest = params->vistest;
+ prstate.mark_unused_now =
+ (params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
+ prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.cutoffs = params->cutoffs;
/*
* Our strategy is to scan the page and make lists of items to change,
@@ -583,7 +574,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.visibility_cutoff_xid = InvalidTransactionId;
maxoff = PageGetMaxOffsetNumber(page);
- tup.t_tableOid = RelationGetRelid(relation);
+ tup.t_tableOid = RelationGetRelid(params->relation);
/*
* Determine HTSV for all tuples, and queue them up for processing as HOT
@@ -786,7 +777,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* Decide if we want to go ahead with freezing according to the freeze
* plans we prepared, or not.
*/
- do_freeze = heap_page_will_freeze(relation, buffer,
+ do_freeze = heap_page_will_freeze(params->relation, buffer,
did_tuple_hint_fpi,
do_prune,
do_hint_prune,
@@ -838,7 +829,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
*/
- if (RelationNeedsWAL(relation))
+ if (RelationNeedsWAL(params->relation))
{
/*
* The snapshotConflictHorizon for the whole record should be the
@@ -876,11 +867,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
else
conflict_xid = prstate.latest_xid_removed;
- log_heap_prune_and_freeze(relation, buffer,
+ log_heap_prune_and_freeze(params->relation, buffer,
InvalidBuffer, /* vmbuffer */
0, /* vmflags */
conflict_xid,
- true, reason,
+ true, params->reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index deb9a3dc0d1..2b9e5c7f81b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1965,7 +1965,10 @@ lazy_scan_prune(LVRelState *vacrel,
{
Relation rel = vacrel->rel;
PruneFreezeResult presult;
- int prune_options = 0;
+ PruneFreezeParams params = {.relation = rel,.buffer = buf,
+ .reason = PRUNE_VACUUM_SCAN,.options = HEAP_PAGE_PRUNE_FREEZE,
+ .cutoffs = &vacrel->cutoffs,.vistest = vacrel->vistest
+ };
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -1984,12 +1987,11 @@ lazy_scan_prune(LVRelState *vacrel,
* tuples. Pruning will have determined whether or not the page is
* all-visible.
*/
- prune_options = HEAP_PAGE_PRUNE_FREEZE;
if (vacrel->nindexes == 0)
- prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
+ params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
- &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+ heap_page_prune_and_freeze(&params,
+ &presult,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 909db73b7bb..632c4332a8c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -221,6 +221,56 @@ typedef struct HeapPageFreeze
} HeapPageFreeze;
+
+/* 'reason' codes for heap_page_prune_and_freeze() */
+typedef enum
+{
+ PRUNE_ON_ACCESS, /* on-access pruning */
+ PRUNE_VACUUM_SCAN, /* VACUUM 1st heap pass */
+ PRUNE_VACUUM_CLEANUP, /* VACUUM 2nd heap pass */
+} PruneReason;
+
+/*
+ * Input parameters to heap_page_prune_and_freeze()
+ */
+typedef struct PruneFreezeParams
+{
+ Relation relation; /* relation containing buffer to be pruned */
+ Buffer buffer; /* buffer to be pruned */
+
+ /*
+ * The reason pruning was performed. It is used to set the WAL record
+ * opcode which is used for debugging and analysis purposes.
+ */
+ PruneReason reason;
+
+ /*
+ * Contains flag bits:
+ *
+ * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
+ * LP_UNUSED during pruning.
+ *
+ * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
+ * will return 'all_visible', 'all_frozen' flags to the caller.
+ */
+ int options;
+
+ /*
+ * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
+ * (see heap_prune_satisfies_vacuum).
+ */
+ GlobalVisState *vistest;
+
+ /*
+ * Contains the cutoffs used for freezing. They are required if the
+ * HEAP_PAGE_PRUNE_FREEZE option is set. cutoffs->OldestXmin is also used
+ * to determine if dead tuples are HEAPTUPLE_RECENTLY_DEAD or
+ * HEAPTUPLE_DEAD. Currently only vacuum passes in cutoffs. Vacuum
+ * calculates them once, at the beginning of vacuuming the relation.
+ */
+ struct VacuumCutoffs *cutoffs;
+} PruneFreezeParams;
+
/*
* Per-page state returned by heap_page_prune_and_freeze()
*/
@@ -264,13 +314,6 @@ typedef struct PruneFreezeResult
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
} PruneFreezeResult;
-/* 'reason' codes for heap_page_prune_and_freeze() */
-typedef enum
-{
- PRUNE_ON_ACCESS, /* on-access pruning */
- PRUNE_VACUUM_SCAN, /* VACUUM 1st heap pass */
- PRUNE_VACUUM_CLEANUP, /* VACUUM 2nd heap pass */
-} PruneReason;
/* ----------------
* function prototypes for heap access method
@@ -367,12 +410,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
-extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
- GlobalVisState *vistest,
- int options,
- struct VacuumCutoffs *cutoffs,
+extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
- PruneReason reason,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 23bce72ae64..8698918f443 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2345,6 +2345,7 @@ ProjectionPath
PromptInterruptContext
ProtocolVersion
PrsStorage
+PruneFreezeParams
PruneFreezeResult
PruneReason
PruneState
--
2.43.0
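To make the caller-side change in 0001 concrete, here is a minimal sketch of how a vacuum-style caller fills in the new struct and calls heap_page_prune_and_freeze(). It mirrors the lazy_scan_prune() hunk above; the surrounding variables (rel, buf, vacrel) are assumed to exist as they do there.

/* sketch only: a caller under the new PruneFreezeParams interface */
PruneFreezeParams params = {
    .relation = rel,
    .buffer = buf,
    .reason = PRUNE_VACUUM_SCAN,
    .options = HEAP_PAGE_PRUNE_FREEZE,
    .vistest = vacrel->vistest,
    .cutoffs = &vacrel->cutoffs,
};
PruneFreezeResult presult;

/* marking dead items LP_UNUSED is only safe when the table has no indexes */
if (vacrel->nindexes == 0)
    params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;

heap_page_prune_and_freeze(&params, &presult, &vacrel->offnum,
                           &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);

Grouping the inputs this way also means the later VM-related inputs (vmbuffer, blk_known_av) can be added as struct fields instead of growing the parameter list again.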
Attachment: v20-0002-Keep-all_frozen-updated-in-heap_page_prune_and_f.patch (text/x-patch)
From 55975c548eb8fa66be84fa8c1f41ee723549814b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 14 Oct 2025 14:55:40 -0400
Subject: [PATCH v20 02/12] Keep all_frozen updated in
heap_page_prune_and_freeze
Previously, we relied on all_visible and all_frozen being used together
to ensure that all_frozen was correct, but it is better to keep both
fields updated.
Future changes will separate their usage, so we should not depend on
all_visible for the validity of all_frozen.
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/pruneheap.c | 60 +++++++++++++++-------------
src/backend/access/heap/vacuumlazy.c | 9 ++---
2 files changed, 37 insertions(+), 32 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e9e14cb42b7..7cd51c7be33 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -143,10 +143,6 @@ typedef struct
* whether to freeze the page or not. The all_visible and all_frozen
* values returned to the caller are adjusted to include LP_DEAD items at
* the end.
- *
- * all_frozen should only be considered valid if all_visible is also set;
- * we don't bother to clear the all_frozen flag every time we clear the
- * all_visible flag.
*/
bool all_visible;
bool all_frozen;
@@ -359,8 +355,10 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* anymore. The opportunistic freeze heuristic must be improved;
* however, for now, try to approximate the old logic.
*/
- if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+ if (prstate->all_frozen && prstate->nfrozen > 0)
{
+ Assert(prstate->all_visible);
+
/*
* Freezing would make the page all-frozen. Have already emitted
* an FPI or will do so anyway?
@@ -544,9 +542,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* dead tuples which are not yet removable. However, dead tuples which
* will be removed by the end of vacuuming should not preclude us from
* opportunistically freezing. Because of that, we do not clear
- * all_visible when we see LP_DEAD items. We fix that at the end of the
- * function, when we return the value to the caller, so that the caller
- * doesn't set the VM bit incorrectly.
+ * all_visible and all_frozen when we see LP_DEAD items. We fix that at
+ * the end of the function, when we return the value to the caller, so
+ * that the caller doesn't set the VM bits incorrectly.
*/
if (prstate.attempt_freeze)
{
@@ -783,6 +781,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
do_hint_prune,
&prstate);
+ Assert(!prstate.all_frozen || prstate.all_visible);
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -852,7 +852,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
if (do_freeze)
{
- if (prstate.all_visible && prstate.all_frozen)
+ if (prstate.all_frozen)
frz_conflict_horizon = prstate.visibility_cutoff_xid;
else
{
@@ -889,16 +889,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
presult->recently_dead_tuples = prstate.recently_dead_tuples;
/*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ * It was convenient to ignore LP_DEAD items in all_visible/all_frozen
+ * earlier on to make the choice of whether or not to freeze the page
+ * unaffected by the short-term presence of LP_DEAD items. These LP_DEAD
+ * items were effectively assumed to be LP_UNUSED items in the making. It
+ * doesn't matter which vacuum heap pass (initial pass or final pass) ends
+ * up setting the page all-frozen, as long as the ongoing VACUUM does it.
*
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected by our caller.
+ * Now that freezing has been finalized, unset all_visible and all_frozen
+ * if there are any LP_DEAD items on the page. It needs to reflect the
+ * present state of the page, as expected by our caller.
*/
if (prstate.all_visible && prstate.lpdead_items == 0)
{
@@ -1289,8 +1289,9 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
prstate->ndead++;
/*
- * Deliberately delay unsetting all_visible until later during pruning.
- * Removable dead tuples shouldn't preclude freezing the page.
+ * Deliberately delay unsetting all_visible and all_frozen until later
+ * during pruning. Removable dead tuples shouldn't preclude freezing the
+ * page.
*/
/* Record the dead offset for vacuum */
@@ -1418,6 +1419,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
if (!HeapTupleHeaderXminCommitted(htup))
{
prstate->all_visible = false;
+ prstate->all_frozen = false;
break;
}
@@ -1432,14 +1434,15 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
/*
* For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' when freezing is requested. We
- * could use GlobalVisTestIsRemovableXid instead, if a
- * non-freezing caller wanted to set the VM bit.
+ * we only update 'all_visible' and 'all_frozen' when freezing
+ * is requested. We could use GlobalVisTestIsRemovableXid
+ * instead, if a non-freezing caller wanted to set the VM bit.
*/
Assert(prstate->cutoffs);
if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
{
prstate->all_visible = false;
+ prstate->all_frozen = false;
break;
}
@@ -1453,6 +1456,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
case HEAPTUPLE_RECENTLY_DEAD:
prstate->recently_dead_tuples++;
prstate->all_visible = false;
+ prstate->all_frozen = false;
/*
* This tuple will soon become DEAD. Update the hint field so
@@ -1472,6 +1476,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* does, so be consistent.
*/
prstate->all_visible = false;
+ prstate->all_frozen = false;
/*
* If we wanted to optimize for aborts, we might consider marking
@@ -1490,6 +1495,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
*/
prstate->live_tuples++;
prstate->all_visible = false;
+ prstate->all_frozen = false;
/*
* This tuple may soon become DEAD. Update the hint field so that
@@ -1554,10 +1560,10 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* hastup/nonempty_pages as provisional no matter how LP_DEAD items are
* handled (handled here, or handled later on).
*
- * Similarly, don't unset all_visible until later, at the end of
- * heap_page_prune_and_freeze(). This will allow us to attempt to freeze
- * the page after pruning. As long as we unset it before updating the
- * visibility map, this will be correct.
+ * Similarly, don't unset all_visible and all_frozen until later, at the
+ * end of heap_page_prune_and_freeze(). This will allow us to attempt to
+ * freeze the page after pruning. As long as we unset it before updating
+ * the visibility map, this will be correct.
*/
/* Record the dead offset for vacuum */
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2b9e5c7f81b..e1b7456823d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2017,7 +2017,6 @@ lazy_scan_prune(LVRelState *vacrel,
* agreement with heap_page_is_all_visible() using an assertion.
*/
#ifdef USE_ASSERT_CHECKING
- /* Note that all_frozen value does not matter when !all_visible */
if (presult.all_visible)
{
TransactionId debug_cutoff;
@@ -2071,6 +2070,7 @@ lazy_scan_prune(LVRelState *vacrel,
*has_lpdead_items = (presult.lpdead_items > 0);
Assert(!presult.all_visible || !(*has_lpdead_items));
+ Assert(!presult.all_frozen || presult.all_visible);
/*
* Handle setting visibility map bit based on information from the VM (as
@@ -2176,11 +2176,10 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen. Note that all_frozen is only valid if all_visible is
- * true, so we must check both all_visible and all_frozen.
+ * it as all-frozen.
*/
- else if (all_visible_according_to_vm && presult.all_visible &&
- presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ else if (all_visible_according_to_vm && presult.all_frozen &&
+ !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
uint8 old_vmbits;
--
2.43.0
Attachment: v20-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (text/x-patch)
From 8519fcd867cd83f416513e635e0591af6c86a712 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v20 06/12] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum
As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
---
src/backend/access/heap/vacuumlazy.c | 36 +++++++++++++++++++++++-----
1 file changed, 30 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a7a974b6639..fa7be0f857f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1872,9 +1872,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ /* Lock vmbuffer before entering critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ /* Mark buffer dirty before writing any WAL records */
MarkBufferDirty(buf);
/*
@@ -1891,13 +1894,34 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set_vmbits(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
+
+ /*
+ * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+ * setting the VM.
+ */
+ if (RelationNeedsWAL(vacrel->rel))
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ InvalidTransactionId, /* conflict xid */
+ false, /* cleanup lock */
+ true, /* set_pd_all_vis */
+ PRUNE_VACUUM_SCAN, /* reason */
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+
END_CRIT_SECTION();
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
/* Count the newly all-frozen pages for logging */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
--
2.43.0
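Stripped of the new-page and checksum handling, the sequence 0006 relies on is small but order-sensitive: the VM buffer is locked before entering the critical section, and the heap buffer is dirtied before any WAL is emitted. A condensed sketch, using the same names as the hunk above:

LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);   /* take the VM page lock before the critical section */
START_CRIT_SECTION();
MarkBufferDirty(buf);                          /* dirty the heap page before writing any WAL */
PageSetAllVisible(page);
visibilitymap_set_vmbits(blkno, vmbuffer,
                         VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN,
                         vacrel->rel->rd_locator);
if (RelationNeedsWAL(vacrel->rel))
    log_heap_prune_and_freeze(vacrel->rel, buf, vmbuffer,
                              VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN,
                              InvalidTransactionId,  /* conflict xid */
                              false,                 /* cleanup lock */
                              true,                  /* set_pd_all_vis */
                              PRUNE_VACUUM_SCAN,
                              NULL, 0, NULL, 0, NULL, 0, NULL, 0);
END_CRIT_SECTION();
LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);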
Attachment: v20-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (text/x-patch)
From 2e0e8e8365e4abd086978db70890bffd6e367b2e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v20 07/12] Remove XLOG_HEAP2_VISIBLE entirely
No remaining code emits XLOG_HEAP2_VISIBLE records, so the record type can be removed entirely.
This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
---
src/backend/access/common/bufmask.c | 4 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 155 ++---------------------
src/backend/access/heap/pruneheap.c | 18 ++-
src/backend/access/heap/vacuumlazy.c | 16 +--
src/backend/access/heap/visibilitymap.c | 112 +---------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +-
src/include/access/heapam_xlog.h | 28 +---
src/include/access/visibilitymap.h | 13 +-
src/include/access/visibilitymapdefs.h | 9 --
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 54 insertions(+), 379 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
+ * for more details.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4b0c49f4bb0..2bff37e03b5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2524,11 +2524,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
else if (all_frozen_set)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- relation->rd_locator);
+ visibilitymap_set(BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ relation->rd_locator);
}
/*
@@ -8797,50 +8797,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
-
/*
* Perform XLogInsert for a heap-update operation. Caller must already
* have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 2af724451c3..5ab46e8bf8f 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -251,7 +251,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+ visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -264,142 +264,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -777,8 +641,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
*
* During recovery, however, no concurrent writers exist. Therefore,
* updating the VM without holding the heap page lock is safe enough. This
- * same approach is taken when replaying xl_heap_visible records (see
- * heap_xlog_visible()).
+ * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+ * heap_xlog_prune_and_freeze()).
*/
if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -790,11 +654,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- rlocator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -1375,9 +1239,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5667df86bae..d6b22b7b1c5 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1110,9 +1110,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
{
Assert(PageIsAllVisible(page));
- old_vmbits = visibilitymap_set_vmbits(blockno,
- vmbuffer, new_vmbits,
- params->relation->rd_locator);
+ old_vmbits = visibilitymap_set(blockno,
+ vmbuffer, new_vmbits,
+ params->relation->rd_locator);
if (old_vmbits == new_vmbits)
{
LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
@@ -2394,14 +2394,18 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
*
* This is used for several different page maintenance operations:
*
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
* redirected, some marked dead, and some removed altogether.
*
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'
*
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ * marked as unused.
*
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
+ * all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
* all.
*
* If replaying the record requires a cleanup lock, pass cleanup_lock = true.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index fa7be0f857f..fd68dfcfce2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1894,11 +1894,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
/*
* Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2776,9 +2776,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* set PD_ALL_VISIBLE.
*/
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer, vmflags,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer, vmflags,
+ vacrel->rel->rd_locator);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index d14588e92ae..7997e926872 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set bit(s) in a previously pinned page and log
- * visibilitymap_set_vmbits - set bit(s) in a pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,109 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
- flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || !XLogRecPtrIsValid(recptr));
- Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- Assert(!BufferIsValid(heapBuf) ||
- BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (!XLogRecPtrIsValid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
-}
-
/*
* Set VM (visibility map) flags in the VM block in vmBuf.
*
@@ -344,9 +240,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* rlocator is used only for debugging messages.
*/
uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ca26d1f0ed1..08461fdf593 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..e9e77bd678b 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -294,7 +293,13 @@ typedef struct xl_heap_prune
#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
#define XLHP_IS_CATALOG_REL (1 << 1)
/*
@@ -443,20 +448,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -500,11 +491,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index c6fa37be968..05ba6786b47 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator);
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8698918f443..76343fdf476 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4289,7 +4289,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
Attachment: v20-0008-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (text/x-patch; charset=UTF-8)
From 321461cd0fa02408657f46d2ec0495e8a69790d7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v20 08/12] Rename GlobalVisTestIsRemovableXid() to
GlobalVisXidVisibleToAll()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The function is currently only used to check whether a tuple’s xmax is
visible to all transactions (and thus removable). Upcoming changes will
also use it to test whether a tuple’s xmin is visible to all, in order
to decide whether a page can be marked all-visible in the visibility map.
The new name, GlobalVisXidVisibleToAll(), better reflects this broader
purpose.
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
---
src/backend/access/heap/heapam_visibility.c | 6 +++---
src/backend/access/heap/pruneheap.c | 12 ++++++------
src/backend/access/spgist/spgvacuum.c | 2 +-
src/backend/storage/ipc/procarray.c | 17 ++++++++---------
src/include/utils/snapmgr.h | 4 ++--
5 files changed, 20 insertions(+), 21 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
{
Assert(TransactionIdIsValid(dead_after));
- if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
res = HEAPTUPLE_DEAD;
}
else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
return false;
/* Deleter committed, so tuple is dead if the XID is old enough. */
- return GlobalVisTestIsRemovableXid(vistest,
- HeapTupleHeaderGetRawXmax(tuple));
+ return GlobalVisXidVisibleToAll(vistest,
+ HeapTupleHeaderGetRawXmax(tuple));
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d6b22b7b1c5..a2d872e5beb 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -244,7 +244,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
vistest = GlobalVisTestFor(relation);
- if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+ if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
return;
/*
@@ -469,7 +469,7 @@ prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
* Determining HTSV only once for each tuple is required for correctness,
* to deal with cases where running HTSV twice could result in different
* results. For example, RECENTLY_DEAD can turn to DEAD if another
- * checked item causes GlobalVisTestIsRemovableFullXid() to update the
+ * checked item causes GlobalVisFullXidVisibleToAll() to update the
* horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
* transaction aborts.
*
@@ -1236,11 +1236,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* Determine whether or not the tuple is considered dead when compared
* with the provided GlobalVisState. On-access pruning does not provide
* VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
- * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
- * if the GlobalVisState has been updated since the beginning of vacuuming
+ * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+ * the GlobalVisState has been updated since the beginning of vacuuming
* the relation.
*/
- if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
return HEAPTUPLE_DEAD;
return res;
@@ -1699,7 +1699,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
/*
* For now always use prstate->cutoffs for this test, because
* we only update 'all_visible' and 'all_frozen' when freezing
- * is requested. We could use GlobalVisTestIsRemovableXid
+ * is requested. We could use GlobalVisXidVisibleToAll()
* instead, if a non-freezing caller wanted to set the VM bit.
*/
Assert(prstate->cutoffs);
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 71ef2e5036f..1c0eb425ee9 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
*/
if (dt->tupstate == SPGIST_REDIRECT &&
(!TransactionIdIsValid(dt->xid) ||
- GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+ GlobalVisXidVisibleToAll(vistest, dt->xid)))
{
dt->tupstate = SPGIST_PLACEHOLDER;
Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..235c3b584f6 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
* See comment for GlobalVisState for details.
*/
bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
- FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
{
/*
* If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4216,14 +4215,14 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
}
/*
- * Wrapper around GlobalVisTestIsRemovableFullXid() for 32bit xids.
+ * Wrapper around GlobalVisFullXidVisibleToAll() for 32bit xids.
*
* It is crucial that this only gets called for xids from a source that
* protects against xid wraparounds (e.g. from a table and thus protected by
* relfrozenxid).
*/
bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
{
FullTransactionId fxid;
@@ -4237,12 +4236,12 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
*/
fxid = FullXidRelativeTo(state->definitely_needed, xid);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableFullXid(), see their comments.
+ * GlobalVisFullXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableXid(state, xid);
+ return GlobalVisXidVisibleToAll(state, xid);
}
/*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
*/
typedef struct GlobalVisState GlobalVisState;
extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
--
2.43.0
Attachment: v20-0009-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (text/x-patch; charset=UTF-8)
From 013dd7d70e6af1ad578e9ff4a3753830e9548cbb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v20 09/12] Use GlobalVisState in vacuum to determine page
level visibility
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.
Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.
This also benefits vacuum directly: in some cases, GlobalVisState may
advance during a vacuum, allowing more pages to be recognized as
all-visible. And, in the future, we could easily add a heuristic to
update GlobalVisState more frequently during vacuums of large tables. In
the rare case that the GlobalVisState moves backward, vacuum falls back
to OldestXmin to ensure we don’t attempt to freeze a dead tuple that
wasn’t yet prunable according to the GlobalVisState.
Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.
This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.
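To make the deferred check concrete, here is a minimal sketch of doing the
horizon test once per page after all tuples have been examined (the helper
name is invented for illustration; the real change lives in
heap_page_prune_and_freeze(), as in the hunk below):

/*
 * Sketch only: after scanning every tuple on the page, do a single
 * GlobalVisState check against the newest xmin tracked in
 * visibility_cutoff_xid. If that xmin is not yet visible to everyone,
 * the page cannot be marked all-visible (and thus not all-frozen either).
 */
static void
finish_page_visibility_check(PruneState *prstate)
{
	if (prstate->all_visible &&
		TransactionIdIsNormal(prstate->visibility_cutoff_xid) &&
		!GlobalVisXidVisibleToAll(prstate->vistest,
								  prstate->visibility_cutoff_xid))
		prstate->all_visible = prstate->all_frozen = false;
}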
---
src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++++
src/backend/access/heap/pruneheap.c | 43 +++++++++------------
src/backend/access/heap/vacuumlazy.c | 10 ++---
src/include/access/heapam.h | 11 +++---
4 files changed, 58 insertions(+), 34 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+ Buffer buffer)
+{
+ TransactionId dead_after = InvalidTransactionId;
+ HTSV_Result res;
+
+ res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+ if (res == HEAPTUPLE_RECENTLY_DEAD)
+ {
+ Assert(TransactionIdIsValid(dead_after));
+
+ if (GlobalVisXidVisibleToAll(vistest, dead_after))
+ res = HEAPTUPLE_DEAD;
+ }
+ else
+ Assert(!TransactionIdIsValid(dead_after));
+
+ return res;
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a2d872e5beb..fdbed5ac74d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -432,11 +432,12 @@ prune_freeze_setup(PruneFreezeParams *params,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon
- * when updating the VM and/or freezing all the tuples on the page.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState. This field is only kept up-to-date if the page is
+ * all-visible. As soon as a tuple is encountered that is not visible to
+ * all, this field is unmaintained. As long as it is maintained, it can be
+ * used to calculate the snapshot conflict horizon when updating the VM
+ * and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -882,14 +883,13 @@ heap_page_will_set_vis(Relation relation,
*/
static bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
- return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
+ return heap_page_would_be_all_visible(rel, buf, vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -991,6 +991,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them is not visible to everyone, the page cannot be
+ * all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update the hint
@@ -1163,10 +1173,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool debug_all_frozen;
Assert(prstate.lpdead_items == 0);
- Assert(prstate.cutoffs);
if (!heap_page_is_all_visible(params->relation, buffer,
- prstate.cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc))
Assert(false);
@@ -1696,20 +1705,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' and 'all_frozen' when freezing
- * is requested. We could use GlobalVisXidVisibleToAll()
- * instead, if a non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = false;
- prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index fd68dfcfce2..fdf37625cd0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2730,7 +2730,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* done outside the critical section.
*/
if (heap_page_would_be_all_visible(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3489,7 +3489,7 @@ dead_items_cleanup(LVRelState *vacrel)
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
*
* Output parameters:
*
@@ -3505,7 +3505,7 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -3579,7 +3579,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
/* Visibility checks may do IO or allocate memory */
Assert(CritSectionCount == 0);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+ switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest, buf))
{
case HEAPTUPLE_LIVE:
{
@@ -3598,7 +3598,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin, OldestXmin))
+ if (!GlobalVisXidVisibleToAll(vistest, xmin))
{
all_visible = false;
*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 937b46a77db..2b6a521e4ea 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -276,10 +276,9 @@ typedef struct PruneFreezeParams
/*
* Contains the cutoffs used for freezing. They are required if the
- * HEAP_PAGE_PRUNE_FREEZE option is set. cutoffs->OldestXmin is also used
- * to determine if dead tuples are HEAPTUPLE_RECENTLY_DEAD or
- * HEAPTUPLE_DEAD. Currently only vacuum passes in cutoffs. Vacuum
- * calculates them once, at the beginning of vacuuming the relation.
+ * HEAP_PAGE_PRUNE_FREEZE option is set. Currently only vacuum passes in
+ * cutoffs. Vacuum calculates them once, at the beginning of vacuuming the
+ * relation.
*/
struct VacuumCutoffs *cutoffs;
} PruneFreezeParams;
@@ -444,7 +443,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -458,6 +457,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+ GlobalVisState *vistest, Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
Attachment: v20-0010-Unset-all_visible-sooner-if-not-freezing.patch (text/x-patch; charset=UTF-8)
From 6da1fcc5cb57cfc3b21ebb741dcde6fa207ccc4a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v20 10/12] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.
However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.
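Roughly, the change amounts to the following in the record-dead paths
(sketch only; attempt_freeze is the PruneState field tested in the hunks
below):

/*
 * If no freezing will be attempted on this page, there is nothing to gain
 * from delaying, so clear the page-level flags as soon as a dead item is
 * recorded.
 */
if (!prstate->attempt_freeze)
{
	prstate->all_visible = false;
	prstate->all_frozen = false;
}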
---
src/backend/access/heap/pruneheap.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index fdbed5ac74d..afb4251ad91 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1562,8 +1562,13 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible and all_frozen until later
* during pruning. Removable dead tuples shouldn't preclude freezing the
- * page.
+ * page. If we won't attempt freezing, just unset all-visible now, though.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1822,8 +1827,14 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible and all_frozen until later, at the
* end of heap_page_prune_and_freeze(). This will allow us to attempt to
* freeze the page after pruning. As long as we unset it before updating
- * the visibility map, this will be correct.
+ * the visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all-visible now.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
--
2.43.0
Attachment: v20-0011-Allow-on-access-pruning-to-set-pages-all-visible.patch (text/x-patch; charset=UTF-8)
From 9aac6ebedbc68301ee8c3d6da8aef54838851c90 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v20 11/12] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.
Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
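For illustration, a condensed sketch of how the hint flows end-to-end,
pasted together from the nodeSeqscan.c and heap_prepare_pagescan() changes
below (no APIs beyond those in this patch are assumed; it is not a
standalone function):

/*
 * Executor side: pass SO_HINT_REL_READ_ONLY when the scanned relation is
 * not among the relations this query modifies.
 */
uint32		flags = 0;

if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
				   estate->es_modified_relids))
	flags = SO_HINT_REL_READ_ONLY;

scandesc = table_beginscan(node->ss.ss_currentRelation,
						   estate->es_snapshot, 0, NULL, flags);

/*
 * Heap AM side: only hand a VM buffer to on-access pruning when the hint
 * is set, which permits it to mark the page all-visible.
 */
if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
	vmbuffer = &scan->rs_vmbuffer;
heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);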
---
contrib/pgrowlocks/pgrowlocks.c | 2 +-
src/backend/access/brin/brin.c | 3 +-
src/backend/access/gin/gininsert.c | 3 +-
src/backend/access/heap/heapam.c | 15 +++-
src/backend/access/heap/heapam_handler.c | 22 ++++--
src/backend/access/heap/pruneheap.c | 69 +++++++++++++++----
src/backend/access/index/genam.c | 4 +-
src/backend/access/index/indexam.c | 6 +-
src/backend/access/nbtree/nbtsort.c | 2 +-
src/backend/access/table/tableam.c | 8 ++-
src/backend/commands/constraint.c | 2 +-
src/backend/commands/copyto.c | 2 +-
src/backend/commands/tablecmds.c | 4 +-
src/backend/commands/typecmds.c | 4 +-
src/backend/executor/execIndexing.c | 2 +-
src/backend/executor/execMain.c | 4 ++
src/backend/executor/execReplication.c | 8 +--
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeBitmapHeapscan.c | 9 ++-
src/backend/executor/nodeIndexonlyscan.c | 2 +-
src/backend/executor/nodeIndexscan.c | 11 ++-
src/backend/executor/nodeSeqscan.c | 26 ++++++-
src/backend/partitioning/partbounds.c | 2 +-
src/backend/utils/adt/selfuncs.c | 2 +-
src/include/access/genam.h | 3 +-
src/include/access/heapam.h | 30 +++++++-
src/include/access/tableam.h | 19 ++---
src/include/nodes/execnodes.h | 6 ++
.../t/035_standby_logical_decoding.pl | 3 +-
29 files changed, 210 insertions(+), 65 deletions(-)
diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
RelationGetRelationName(rel));
/* Scan the relation */
- scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
hscan = (HeapScanDesc) scan;
attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2f7d1437919..8186bba1d7e 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2828,7 +2828,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
indexInfo->ii_Concurrent = brinshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromBrinShared(brinshared));
+ ParallelTableScanFromBrinShared(brinshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index c2b879b2bf6..147844690a1 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2058,7 +2058,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
indexInfo->ii_Concurrent = ginshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromGinBuildShared(ginshared));
+ ParallelTableScanFromGinBuildShared(ginshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 2bff37e03b5..ae53e311ce1 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -555,6 +555,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -569,7 +570,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1246,6 +1249,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1284,6 +1288,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1316,6 +1326,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..27e3498f5f4 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,12 +79,14 @@ heapam_slot_callbacks(Relation relation)
*/
static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
{
IndexFetchHeapData *hscan = palloc0(sizeof(IndexFetchHeapData));
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
+ hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
return &hscan->xs_base;
}
@@ -99,6 +101,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -138,7 +146,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ hscan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -753,7 +762,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
tableScan = NULL;
heapScan = NULL;
- indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+ indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
index_rescan(indexScan, NULL, 0, NULL, 0);
}
else
@@ -762,7 +771,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
- tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+ tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
heapScan = (HeapScanDesc) tableScan;
indexScan = NULL;
@@ -2471,6 +2480,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2517,7 +2527,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index afb4251ad91..8011130ca8b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -197,7 +197,9 @@ static bool heap_page_will_set_vis(Relation relation,
Buffer heap_buf,
Buffer vmbuffer,
bool blk_known_av,
- const PruneState *prstate,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ PruneState *prstate,
uint8 *vmflags,
bool *do_set_pd_vis);
@@ -212,9 +214,13 @@ static bool heap_page_will_set_vis(Relation relation,
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -291,6 +297,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
.vistest = vistest,.cutoffs = NULL
};
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ params.options = HEAP_PAGE_PRUNE_UPDATE_VIS;
+ params.vmbuffer = *vmbuffer;
+ }
+
heap_page_prune_and_freeze(¶ms, &presult, &dummy_off_loc,
NULL, NULL);
@@ -781,6 +794,9 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* have examined this page’s VM bits (e.g., VACUUM in the previous
* heap_vac_scan_next_block() call) and can pass that along.
*
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
* Returns true if one or both VM bits should be set, along with the desired
* flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
* should be set on the heap page.
@@ -791,7 +807,9 @@ heap_page_will_set_vis(Relation relation,
Buffer heap_buf,
Buffer vmbuffer,
bool blk_known_av,
- const PruneState *prstate,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ PruneState *prstate,
uint8 *vmflags,
bool *do_set_pd_vis)
{
@@ -807,6 +825,23 @@ heap_page_will_set_vis(Relation relation,
return false;
}
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(heap_buf) || XLogCheckBufferNeedsBackup(heap_buf)))
+ {
+ prstate->all_visible = prstate->all_frozen = false;
+ return false;
+ }
+
if (prstate->all_visible && !PageIsAllVisible(heap_page))
*do_set_pd_vis = true;
@@ -830,6 +865,11 @@ heap_page_will_set_vis(Relation relation,
* page-level bit is clear. However, it's possible that in vacuum the bit
* got cleared after heap_vac_scan_next_block() was called, so we must
* recheck with buffer lock before concluding that the VM is corrupt.
+ *
+ * This will never trigger for on-access pruning because it couldn't have
+ * done a previous visibility map lookup and thus will always pass
+ * blk_known_av as false. A future vacuum will have to take care of fixing
+ * the corruption.
*/
else if (blk_known_av && !PageIsAllVisible(heap_page) &&
visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
@@ -991,6 +1031,14 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /*
+ * Even if we don't prune anything, if we found a new value for the
+ * pd_prune_xid field or the page was marked full, we will update the hint
+ * bit.
+ */
+ do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ PageIsFull(page);
+
/*
* After processing all the live tuples on the page, if the newest xmin
* amongst them is not visible to everyone, the page cannot be
@@ -1001,14 +1049,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
prstate.all_visible = prstate.all_frozen = false;
- /*
- * Even if we don't prune anything, if we found a new value for the
- * pd_prune_xid field or the page was marked full, we will update the hint
- * bit.
- */
- do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
- PageIsFull(page);
-
/*
* Decide if we want to go ahead with freezing according to the freeze
* plans we prepared, or not.
@@ -1052,6 +1092,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
do_set_vm = heap_page_will_set_vis(params->relation,
blockno, buffer, vmbuffer, params->blk_known_av,
+ params->reason, do_prune, do_freeze,
&prstate, &new_vmbits, &do_set_pd_vis);
/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
@@ -2338,7 +2379,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
/*
* Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
- * record.
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
*/
static TransactionId
get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
@@ -2408,8 +2449,8 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
* - Reaping: During vacuum phase III, items that are already LP_DEAD are
* marked as unused.
*
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- * all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III and on-access, the heap page
+ * may be marked all-visible and all-frozen.
*
* These changes all happen together, so we use a single WAL record for them
* all.
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 0cb27af1310..1e7992dbeb3 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -447,7 +447,7 @@ systable_beginscan(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, irel,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
@@ -708,7 +708,7 @@ systable_beginscan_ordered(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, indexRelation,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..b5523cf2ab1 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys)
+ int nkeys, int norderbys, uint32 flags)
{
IndexScanDesc scan;
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+ scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
return scan;
}
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+ scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
return scan;
}
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 454adaee7dc..02ab0233e59 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1925,7 +1925,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
indexInfo = BuildIndexInfo(btspool->index);
indexInfo->ii_Concurrent = btshared->isconcurrent;
scan = table_beginscan_parallel(btspool->heap,
- ParallelTableScanFromBTShared(btshared));
+ ParallelTableScanFromBTShared(btshared), 0);
reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
true, progress, _bt_build_callback,
&buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 5e41404937e..558c4497993 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -50,6 +50,7 @@ char *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
bool synchronize_seqscans = true;
+
/* ----------------------------------------------------------------------------
* Slot functions.
* ----------------------------------------------------------------------------
@@ -163,10 +164,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
+
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -217,7 +219,7 @@ table_index_fetch_tuple_check(Relation rel,
bool found;
slot = table_slot_create(rel, NULL);
- scan = table_index_fetch_begin(rel);
+ scan = table_index_fetch_begin(rel, 0);
found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
all_dead);
table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index 3497a8221f2..97c8278e36d 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
*/
tmptid = checktid;
{
- IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+ IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
bool call_again = false;
if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index cef452584e5..22b453dc617 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1156,7 +1156,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
AttrMap *map = NULL;
TupleTableSlot *root_slot = NULL;
- scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
slot = table_slot_create(rel, NULL);
/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 23ebaa3f230..66c418059fe 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6345,7 +6345,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
* checking all the constraints.
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(oldrel, snapshot, 0, NULL);
+ scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -13730,7 +13730,7 @@ validateForeignKeyConstraint(char *conname,
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
slot = table_slot_create(rel, NULL);
- scan = table_beginscan(rel, snapshot, 0, NULL);
+ scan = table_beginscan(rel, snapshot, 0, NULL, 0);
perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
"validateForeignKeyConstraint",
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 5979580139f..35560ac60d9 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3154,7 +3154,7 @@ validateDomainNotNullConstraint(Oid domainoid)
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
@@ -3235,7 +3235,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 401606f840a..4e39ac00f30 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
retry:
conflict = false;
found_self = false;
- index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+ index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 27c9eec697b..0630a5af79e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation is modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index def32774c90..473d236e551 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -204,7 +204,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
/* Start an index scan. */
- scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
retry:
found = false;
@@ -382,7 +382,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
/* Start a heap scan. */
InitDirtySnapshot(snap);
- scan = table_beginscan(rel, &snap, 0, NULL);
+ scan = table_beginscan(rel, &snap, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
retry:
@@ -601,7 +601,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+ scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
table_rescan(scan, NULL);
@@ -665,7 +665,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
index_rescan(scan, skey, skey_attoff, NULL, 0);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..0d854db51a1 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ flags);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index f464cca9507..87b04b1b88e 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
estate->es_snapshot,
&node->ioss_Instrument,
node->ioss_NumScanKeys,
- node->ioss_NumOrderByKeys);
+ node->ioss_NumOrderByKeys, 0);
node->ioss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index f36929deec3..90f929ce741 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys,
+ flags);
node->iss_ScanDesc = scandesc;
@@ -207,7 +214,7 @@ IndexNextWithReorder(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys, 0);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..4d0cbb9dee4 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
scandesc = table_beginscan(node->ss.ss_currentRelation,
estate->es_snapshot,
- 0, NULL);
+ 0, NULL, flags);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
{
EState *estate = node->ss.ps.state;
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
table_parallelscan_initialize(node->ss.ss_currentRelation,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+ flags);
}
/* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation,
+ pscan,
+ flags);
}
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 8ba038c5ef4..d3b340ee2a7 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3370,7 +3370,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
econtext = GetPerTupleExprContext(estate);
snapshot = RegisterSnapshot(GetLatestSnapshot());
tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
- scan = table_beginscan(part_rel, snapshot, 0, NULL);
+ scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index cb23ad52782..78fa63e2b73 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -6788,7 +6788,7 @@ get_actual_variable_endpoint(Relation heapRel,
index_scan = index_beginscan(heapRel, indexRel,
&SnapshotNonVacuumable, NULL,
- 1, 0);
+ 1, 0, 0);
/* Set it up for index-only scan */
index_scan->xs_want_itup = true;
index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..2f9e9ea6318 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -177,7 +177,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys);
+ int nkeys, int norderbys, uint32 flags);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
@@ -204,6 +204,7 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys,
ParallelIndexScanDesc pscan);
+
extern ItemPointer index_getnext_tid(IndexScanDesc scan,
ScanDirection direction);
extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2b6a521e4ea..1e3df54628b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read in the current heap page's
+ * corresponding VM block to this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -117,8 +124,24 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read in the current heap page's corresponding VM block to
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
@@ -417,7 +440,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
TM_IndexDeleteOp *delstate);
/* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
OffsetNumber *off_loc,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e16bf025692..0042636463f 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* set if the query doesn't modify the rel */
+ SO_HINT_REL_READ_ONLY = 1 << 10,
} ScanOptions;
/*
@@ -420,7 +422,7 @@ typedef struct TableAmRoutine
*
* Tuples for an index scan can then be fetched via index_fetch_tuple.
*/
- struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+ struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
/*
* Reset index fetch. Typically this will release cross index fetch
@@ -874,9 +876,9 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
*/
static inline TableScanDesc
table_beginscan(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_SEQSCAN |
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
@@ -919,9 +921,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
@@ -1128,7 +1130,8 @@ extern void table_parallelscan_initialize(Relation rel,
* Caller must hold a suitable lock on the relation.
*/
extern TableScanDesc table_beginscan_parallel(Relation relation,
- ParallelTableScanDesc pscan);
+ ParallelTableScanDesc pscan,
+ uint32 flags);
/*
* Restart a parallel scan. Call this in the leader process. Caller is
@@ -1154,9 +1157,9 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
* Tuples for an index scan can then be fetched via table_index_fetch_tuple().
*/
static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
{
- return rel->rd_tableam->index_fetch_begin(rel);
+ return rel->rd_tableam->index_fetch_begin(rel, flags);
}
/*
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 18ae8f0d4bb..0c3b0d60168 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query either through
+ * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index ebe2fae1789..bdd9f0a62cd 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
--
2.43.0
v20-0012-Set-pd_prune_xid-on-insert.patch (text/x-patch; charset=UTF-8)
From 80beb9d2f82b7b42fd162fbfacf065459afac578 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v20 12/12] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run the first time a page
is read after it has been filled with newly inserted tuples.
This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.
Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear whether this is a bug in the way hits are tracked, a faulty
test expectation, or whether simply updating the test's expected
output is sufficient remediation.
---
src/backend/access/heap/heapam.c | 25 +++++++++++++------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++-
.../isolation/expected/index-killtuples.out | 6 ++---
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ae53e311ce1..f329f497480 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2104,6 +2104,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2163,15 +2164,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2181,7 +2186,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2545,8 +2549,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 5ab46e8bf8f..dac640f5c9d 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -462,6 +462,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -611,9 +617,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
--
2.43.0
On Tue, 18 Nov 2025 at 04:07, Melanie Plageman
<melanieplageman@gmail.com> wrote:
Attached v20 has general cleanup, changes to the table/index AM
callbacks detailed below, and it moves the
heap_page_prune_and_freeze() refactoring commit down the stack to
0004.
0001 - 0003 are fairly trivial cleanup patches. I think they are ready
to commit, so if I don't hear any objections in the next few days,
I'll go ahead and commit them.
Hi! I looked over the 0002-0003 patches once again, LGTM. In
particular, I think 0002 & 0003 make VM bit management simpler.
My only review comment is about 0003:
Should we make frz_conflict_horizon not an argument of
heap_page_will_freeze() but rather just another field of the PruneState
struct? If I'm not mistaken, 'frz_conflict_horizon' fits well as part
of the pruning state.
--
Best regards,
Kirill Reshke
On Wed, Nov 19, 2025 at 4:35 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
Hi! I looked over the 0002-0003 patches once again, LGTM. In
particular, I think 0002 & 0003 make VM bit management simpler.
Thanks for the review!
My only review comment is about 0003:
Should we make frz_conflict_horizon not an argument of
heap_page_will_freeze() but rather just another field of the PruneState
struct? If I'm not mistaken, 'frz_conflict_horizon' fits well as part
of the pruning state.
Since it is passed into one of the helpers, I think I agree. Attached
v21 has this change.
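For reference, the shape of the change in v21 is roughly the following
(simplified excerpt from the attached 0003; the full context and comments
are in the patch):

    typedef struct
    {
        ...
        /*
         * Snapshot conflict horizon to use when freezing tuples. The final
         * conflict horizon for the WAL record may still be newer if pruning
         * removes newer transaction IDs.
         */
        TransactionId frz_conflict_horizon;
        ...
    } PruneState;

    static bool
    heap_page_will_freeze(Relation relation, Buffer buffer, ...,
                          PruneState *prstate)
    {
        ...
        if (do_freeze)
        {
            /* Validate the freeze plans before the critical section */
            heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);

            /*
             * Use visibility_cutoff_xid when the whole page will become
             * all-frozen; otherwise fall back to a conservative cutoff
             * derived from OldestXmin.
             */
            if (prstate->all_frozen)
                prstate->frz_conflict_horizon = prstate->visibility_cutoff_xid;
            else
            {
                /* Avoids false conflicts when hot_standby_feedback in use */
                prstate->frz_conflict_horizon = prstate->cutoffs->OldestXmin;
                TransactionIdRetreat(prstate->frz_conflict_horizon);
            }
        }
        ...
    }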
- Melanie
Attachments:
v21-0001-Refactor-heap_page_prune_and_freeze-parameters-i.patch (text/x-patch; charset=UTF-8)
From a87132c42cae9379cea52df91e10d8d5e2677e16 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 14 Oct 2025 11:10:25 -0400
Subject: [PATCH v21 01/12] Refactor heap_page_prune_and_freeze() parameters
into a struct
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
heap_page_prune_and_freeze() had accumulated an unwieldy number of input
parameters, and upcoming work to handle VM updates in this function will
add even more.
Introduce a new PruneFreezeParams struct to group the function’s input
parameters, improving readability and maintainability.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/yn4zp35kkdsjx6wf47zcfmxgexxt4h2og47pvnw2x5ifyrs3qc%407uw6jyyxuyf7
---
src/backend/access/heap/pruneheap.c | 91 +++++++++++++---------------
src/backend/access/heap/vacuumlazy.c | 12 ++--
src/include/access/heapam.h | 63 +++++++++++++++----
src/tools/pgindent/typedefs.list | 1 +
4 files changed, 100 insertions(+), 67 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 231bea679c6..e9e14cb42b7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -261,12 +261,18 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
PruneFreezeResult presult;
/*
- * For now, pass mark_unused_now as false regardless of whether or
- * not the relation has indexes, since we cannot safely determine
- * that during on-access pruning with the current implementation.
+ * We don't pass the HEAP_PAGE_PRUNE_MARK_UNUSED_NOW option
+ * regardless of whether or not the relation has indexes, since we
+ * cannot safely determine that during on-access pruning with the
+ * current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, vistest, 0,
- NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+ PruneFreezeParams params = {.relation = relation,.buffer = buffer,
+ .reason = PRUNE_ON_ACCESS,.options = 0,
+ .vistest = vistest,.cutoffs = NULL
+ };
+
+ heap_page_prune_and_freeze(¶ms, &presult, &dummy_off_loc,
+ NULL, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -419,60 +425,44 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* also need to account for a reduction in the length of the line pointer
* array following array truncation by us.
*
- * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
- * required in order to advance relfrozenxid / relminmxid, or if it's
- * considered advantageous for overall system performance to do so now. The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing. When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set. They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
- *
- * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
- *
- * options:
- * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- * pruning.
+ * params contains the input parameters used to control freezing and pruning
+ * behavior. See the definition of PruneFreezeParams for more on what each
+ * parameter does.
*
- * FREEZE indicates that we will also freeze tuples, and will return
- * 'all_visible', 'all_frozen' flags to the caller.
- *
- * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
- * of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
- * cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * If the HEAP_PAGE_PRUNE_FREEZE option is set in params, we will freeze
+ * tuples if it's required in order to advance relfrozenxid / relminmxid, or
+ * if it's considered advantageous for overall system performance to do so
+ * now. The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
+ * 'new_relmin_mxid' arguments are required when freezing. When
+ * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
+ * and presult->all_frozen on exit, to indicate if the VM bits can be set.
+ * They are always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not
+ * passed, because at the moment only callers that also freeze need that
+ * information.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
* heap_page_prune_and_freeze() is responsible for initializing it. Required
* by all callers.
*
- * reason indicates why the pruning is performed. It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
* off_loc is the offset location required by the caller to use in error
* callback.
*
* new_relfrozen_xid and new_relmin_mxid must provided by the caller if the
- * HEAP_PRUNE_FREEZE option is set. On entry, they contain the oldest XID and
- * multi-XID seen on the relation so far. They will be updated with oldest
- * values present on the page after pruning. After processing the whole
- * relation, VACUUM can use these values as the new relfrozenxid/relminmxid
- * for the relation.
+ * HEAP_PAGE_PRUNE_FREEZE option is set in params. On entry, they contain the
+ * oldest XID and multi-XID seen on the relation so far. They will be updated
+ * with oldest values present on the page after pruning. After processing the
+ * whole relation, VACUUM can use these values as the new
+ * relfrozenxid/relminmxid for the relation.
*/
void
-heap_page_prune_and_freeze(Relation relation, Buffer buffer,
- GlobalVisState *vistest,
- int options,
- struct VacuumCutoffs *cutoffs,
+heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
- PruneReason reason,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid)
{
+ Buffer buffer = params->buffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
OffsetNumber offnum,
@@ -486,10 +476,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
int64 fpi_before = pgWalUsage.wal_fpi;
/* Copy parameters to prstate */
- prstate.vistest = vistest;
- prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
- prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
- prstate.cutoffs = cutoffs;
+ prstate.vistest = params->vistest;
+ prstate.mark_unused_now =
+ (params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
+ prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.cutoffs = params->cutoffs;
/*
* Our strategy is to scan the page and make lists of items to change,
@@ -583,7 +574,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.visibility_cutoff_xid = InvalidTransactionId;
maxoff = PageGetMaxOffsetNumber(page);
- tup.t_tableOid = RelationGetRelid(relation);
+ tup.t_tableOid = RelationGetRelid(params->relation);
/*
* Determine HTSV for all tuples, and queue them up for processing as HOT
@@ -786,7 +777,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* Decide if we want to go ahead with freezing according to the freeze
* plans we prepared, or not.
*/
- do_freeze = heap_page_will_freeze(relation, buffer,
+ do_freeze = heap_page_will_freeze(params->relation, buffer,
did_tuple_hint_fpi,
do_prune,
do_hint_prune,
@@ -838,7 +829,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
*/
- if (RelationNeedsWAL(relation))
+ if (RelationNeedsWAL(params->relation))
{
/*
* The snapshotConflictHorizon for the whole record should be the
@@ -876,11 +867,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
else
conflict_xid = prstate.latest_xid_removed;
- log_heap_prune_and_freeze(relation, buffer,
+ log_heap_prune_and_freeze(params->relation, buffer,
InvalidBuffer, /* vmbuffer */
0, /* vmflags */
conflict_xid,
- true, reason,
+ true, params->reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index deb9a3dc0d1..2b9e5c7f81b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1965,7 +1965,10 @@ lazy_scan_prune(LVRelState *vacrel,
{
Relation rel = vacrel->rel;
PruneFreezeResult presult;
- int prune_options = 0;
+ PruneFreezeParams params = {.relation = rel,.buffer = buf,
+ .reason = PRUNE_VACUUM_SCAN,.options = HEAP_PAGE_PRUNE_FREEZE,
+ .cutoffs = &vacrel->cutoffs,.vistest = vacrel->vistest
+ };
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -1984,12 +1987,11 @@ lazy_scan_prune(LVRelState *vacrel,
* tuples. Pruning will have determined whether or not the page is
* all-visible.
*/
- prune_options = HEAP_PAGE_PRUNE_FREEZE;
if (vacrel->nindexes == 0)
- prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
+ params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
- &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+ heap_page_prune_and_freeze(¶ms,
+ &presult,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 909db73b7bb..632c4332a8c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -221,6 +221,56 @@ typedef struct HeapPageFreeze
} HeapPageFreeze;
+
+/* 'reason' codes for heap_page_prune_and_freeze() */
+typedef enum
+{
+ PRUNE_ON_ACCESS, /* on-access pruning */
+ PRUNE_VACUUM_SCAN, /* VACUUM 1st heap pass */
+ PRUNE_VACUUM_CLEANUP, /* VACUUM 2nd heap pass */
+} PruneReason;
+
+/*
+ * Input parameters to heap_page_prune_and_freeze()
+ */
+typedef struct PruneFreezeParams
+{
+ Relation relation; /* relation containing buffer to be pruned */
+ Buffer buffer; /* buffer to be pruned */
+
+ /*
+ * The reason pruning was performed. It is used to set the WAL record
+ * opcode which is used for debugging and analysis purposes.
+ */
+ PruneReason reason;
+
+ /*
+ * Contains flag bits:
+ *
+ * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
+ * LP_UNUSED during pruning.
+ *
+ * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
+ * will return 'all_visible', 'all_frozen' flags to the caller.
+ */
+ int options;
+
+ /*
+ * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
+ * (see heap_prune_satisfies_vacuum).
+ */
+ GlobalVisState *vistest;
+
+ /*
+ * Contains the cutoffs used for freezing. They are required if the
+ * HEAP_PAGE_PRUNE_FREEZE option is set. cutoffs->OldestXmin is also used
+ * to determine if dead tuples are HEAPTUPLE_RECENTLY_DEAD or
+ * HEAPTUPLE_DEAD. Currently only vacuum passes in cutoffs. Vacuum
+ * calculates them once, at the beginning of vacuuming the relation.
+ */
+ struct VacuumCutoffs *cutoffs;
+} PruneFreezeParams;
+
/*
* Per-page state returned by heap_page_prune_and_freeze()
*/
@@ -264,13 +314,6 @@ typedef struct PruneFreezeResult
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
} PruneFreezeResult;
-/* 'reason' codes for heap_page_prune_and_freeze() */
-typedef enum
-{
- PRUNE_ON_ACCESS, /* on-access pruning */
- PRUNE_VACUUM_SCAN, /* VACUUM 1st heap pass */
- PRUNE_VACUUM_CLEANUP, /* VACUUM 2nd heap pass */
-} PruneReason;
/* ----------------
* function prototypes for heap access method
@@ -367,12 +410,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
-extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
- GlobalVisState *vistest,
- int options,
- struct VacuumCutoffs *cutoffs,
+extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
- PruneReason reason,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 57f2a9ccdc5..c751c25a04d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2348,6 +2348,7 @@ ProjectionPath
PromptInterruptContext
ProtocolVersion
PrsStorage
+PruneFreezeParams
PruneFreezeResult
PruneReason
PruneState
--
2.43.0
v21-0002-Keep-all_frozen-updated-in-heap_page_prune_and_f.patch (text/x-patch; charset=US-ASCII)
From bedba753bb9fcc37b3b5f1a7e38c02828850520d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 14 Oct 2025 14:55:40 -0400
Subject: [PATCH v21 02/12] Keep all_frozen updated in
heap_page_prune_and_freeze
Previously, we relied on all_visible and all_frozen being used together
to ensure that all_frozen was correct, but it is better to keep both
fields updated.
Future changes will separate their usage, so we should not depend on
all_visible for the validity of all_frozen.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/pruneheap.c | 60 +++++++++++++++-------------
src/backend/access/heap/vacuumlazy.c | 9 ++---
2 files changed, 37 insertions(+), 32 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e9e14cb42b7..7cd51c7be33 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -143,10 +143,6 @@ typedef struct
* whether to freeze the page or not. The all_visible and all_frozen
* values returned to the caller are adjusted to include LP_DEAD items at
* the end.
- *
- * all_frozen should only be considered valid if all_visible is also set;
- * we don't bother to clear the all_frozen flag every time we clear the
- * all_visible flag.
*/
bool all_visible;
bool all_frozen;
@@ -359,8 +355,10 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* anymore. The opportunistic freeze heuristic must be improved;
* however, for now, try to approximate the old logic.
*/
- if (prstate->all_visible && prstate->all_frozen && prstate->nfrozen > 0)
+ if (prstate->all_frozen && prstate->nfrozen > 0)
{
+ Assert(prstate->all_visible);
+
/*
* Freezing would make the page all-frozen. Have already emitted
* an FPI or will do so anyway?
@@ -544,9 +542,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* dead tuples which are not yet removable. However, dead tuples which
* will be removed by the end of vacuuming should not preclude us from
* opportunistically freezing. Because of that, we do not clear
- * all_visible when we see LP_DEAD items. We fix that at the end of the
- * function, when we return the value to the caller, so that the caller
- * doesn't set the VM bit incorrectly.
+ * all_visible and all_frozen when we see LP_DEAD items. We fix that at
+ * the end of the function, when we return the value to the caller, so
+ * that the caller doesn't set the VM bits incorrectly.
*/
if (prstate.attempt_freeze)
{
@@ -783,6 +781,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
do_hint_prune,
&prstate);
+ Assert(!prstate.all_frozen || prstate.all_visible);
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -852,7 +852,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
if (do_freeze)
{
- if (prstate.all_visible && prstate.all_frozen)
+ if (prstate.all_frozen)
frz_conflict_horizon = prstate.visibility_cutoff_xid;
else
{
@@ -889,16 +889,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
presult->recently_dead_tuples = prstate.recently_dead_tuples;
/*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ * It was convenient to ignore LP_DEAD items in all_visible/all_frozen
+ * earlier on to make the choice of whether or not to freeze the page
+ * unaffected by the short-term presence of LP_DEAD items. These LP_DEAD
+ * items were effectively assumed to be LP_UNUSED items in the making. It
+ * doesn't matter which vacuum heap pass (initial pass or final pass) ends
+ * up setting the page all-frozen, as long as the ongoing VACUUM does it.
*
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected by our caller.
+ * Now that freezing has been finalized, unset all_visible and all_frozen
+ * if there are any LP_DEAD items on the page. It needs to reflect the
+ * present state of the page, as expected by our caller.
*/
if (prstate.all_visible && prstate.lpdead_items == 0)
{
@@ -1289,8 +1289,9 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
prstate->ndead++;
/*
- * Deliberately delay unsetting all_visible until later during pruning.
- * Removable dead tuples shouldn't preclude freezing the page.
+ * Deliberately delay unsetting all_visible and all_frozen until later
+ * during pruning. Removable dead tuples shouldn't preclude freezing the
+ * page.
*/
/* Record the dead offset for vacuum */
@@ -1418,6 +1419,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
if (!HeapTupleHeaderXminCommitted(htup))
{
prstate->all_visible = false;
+ prstate->all_frozen = false;
break;
}
@@ -1432,14 +1434,15 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
/*
* For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' when freezing is requested. We
- * could use GlobalVisTestIsRemovableXid instead, if a
- * non-freezing caller wanted to set the VM bit.
+ * we only update 'all_visible' and 'all_frozen' when freezing
+ * is requested. We could use GlobalVisTestIsRemovableXid
+ * instead, if a non-freezing caller wanted to set the VM bit.
*/
Assert(prstate->cutoffs);
if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
{
prstate->all_visible = false;
+ prstate->all_frozen = false;
break;
}
@@ -1453,6 +1456,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
case HEAPTUPLE_RECENTLY_DEAD:
prstate->recently_dead_tuples++;
prstate->all_visible = false;
+ prstate->all_frozen = false;
/*
* This tuple will soon become DEAD. Update the hint field so
@@ -1472,6 +1476,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* does, so be consistent.
*/
prstate->all_visible = false;
+ prstate->all_frozen = false;
/*
* If we wanted to optimize for aborts, we might consider marking
@@ -1490,6 +1495,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
*/
prstate->live_tuples++;
prstate->all_visible = false;
+ prstate->all_frozen = false;
/*
* This tuple may soon become DEAD. Update the hint field so that
@@ -1554,10 +1560,10 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* hastup/nonempty_pages as provisional no matter how LP_DEAD items are
* handled (handled here, or handled later on).
*
- * Similarly, don't unset all_visible until later, at the end of
- * heap_page_prune_and_freeze(). This will allow us to attempt to freeze
- * the page after pruning. As long as we unset it before updating the
- * visibility map, this will be correct.
+ * Similarly, don't unset all_visible and all_frozen until later, at the
+ * end of heap_page_prune_and_freeze(). This will allow us to attempt to
+ * freeze the page after pruning. As long as we unset it before updating
+ * the visibility map, this will be correct.
*/
/* Record the dead offset for vacuum */
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2b9e5c7f81b..e1b7456823d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2017,7 +2017,6 @@ lazy_scan_prune(LVRelState *vacrel,
* agreement with heap_page_is_all_visible() using an assertion.
*/
#ifdef USE_ASSERT_CHECKING
- /* Note that all_frozen value does not matter when !all_visible */
if (presult.all_visible)
{
TransactionId debug_cutoff;
@@ -2071,6 +2070,7 @@ lazy_scan_prune(LVRelState *vacrel,
*has_lpdead_items = (presult.lpdead_items > 0);
Assert(!presult.all_visible || !(*has_lpdead_items));
+ Assert(!presult.all_frozen || presult.all_visible);
/*
* Handle setting visibility map bit based on information from the VM (as
@@ -2176,11 +2176,10 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen. Note that all_frozen is only valid if all_visible is
- * true, so we must check both all_visible and all_frozen.
+ * it as all-frozen.
*/
- else if (all_visible_according_to_vm && presult.all_visible &&
- presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ else if (all_visible_according_to_vm && presult.all_frozen &&
+ !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
uint8 old_vmbits;
--
2.43.0
v21-0003-Update-PruneState.all_-visible-frozen-earlier-in.patch (text/x-patch; charset=UTF-8)
From 021ad801205c44581c68d826a03e53ed678abdf0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 14 Oct 2025 15:21:49 -0400
Subject: [PATCH v21 03/12] Update PruneState.all_[visible|frozen] earlier in
pruning
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items. This allows opportunistic
freezing if the page would otherwise be fully frozen, since those dead
items are later removed in vacuum’s third phase.
To move the VM update into the same WAL record that
prunes and freezes tuples, we must know whether the page will
be marked all-visible/all-frozen before emitting WAL.
The only barrier to updating these flags immediately after deciding
whether to opportunistically freeze is that we previously used
all_frozen to compute the snapshot conflict horizon when freezing
tuples. By determining the cutoff earlier, we can update the flags
immediately after making the freeze decision.
This is required to set the VM in the XLOG_HEAP2_PRUNE_VACUUM_SCAN
record emitted by pruning and freezing.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/pruneheap.c | 116 ++++++++++++++--------------
1 file changed, 58 insertions(+), 58 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7cd51c7be33..8e40565381f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -129,6 +129,13 @@ typedef struct
int lpdead_items; /* number of items in the array */
OffsetNumber *deadoffsets; /* points directly to presult->deadoffsets */
+ /*
+ * The snapshot conflict horizon used when freezing tuples. The final
+ * snapshot conflict horizon for the record may be newer if pruning
+ * removes newer transaction IDs.
+ */
+ TransactionId frz_conflict_horizon;
+
/*
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
@@ -138,11 +145,11 @@ typedef struct
* bits. It is only valid if we froze some tuples, and all_frozen is
* true.
*
- * NOTE: all_visible and all_frozen don't include LP_DEAD items. That's
- * convenient for heap_page_prune_and_freeze(), to use them to decide
- * whether to freeze the page or not. The all_visible and all_frozen
- * values returned to the caller are adjusted to include LP_DEAD items at
- * the end.
+ * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
+ * That's convenient for heap_page_prune_and_freeze() to use them to
+ * decide whether to freeze the page or not. The all_visible and
+ * all_frozen values returned to the caller are adjusted to include
+ * LP_DEAD items after we determine whether to opportunistically freeze.
*/
bool all_visible;
bool all_frozen;
@@ -388,6 +395,22 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* critical section.
*/
heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+
+ /*
+ * Calculate what the snapshot conflict horizon should be for a record
+ * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
+ * for conflicts when the whole page is eligible to become all-frozen
+ * in the VM once we're done with it. Otherwise, we generate a
+ * conservative cutoff by stepping back from OldestXmin.
+ */
+ if (prstate->all_frozen)
+ prstate->frz_conflict_horizon = prstate->visibility_cutoff_xid;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ prstate->frz_conflict_horizon = prstate->cutoffs->OldestXmin;
+ TransactionIdRetreat(prstate->frz_conflict_horizon);
+ }
}
else if (prstate->nfrozen > 0)
{
@@ -433,10 +456,10 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* now. The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
* 'new_relmin_mxid' arguments are required when freezing. When
* HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen on exit, to indicate if the VM bits can be set.
- * They are always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not
- * passed, because at the moment only callers that also freeze need that
- * information.
+ * and presult->all_frozen after determining whether or not to
+ * opportunistically freeze, to indicate if the VM bits can be set. They are
+ * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
+ * because at the moment only callers that also freeze need that information.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
@@ -522,6 +545,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.hastup = false;
prstate.lpdead_items = 0;
prstate.deadoffsets = presult->deadoffsets;
+ prstate.frz_conflict_horizon = InvalidTransactionId;
/*
* Caller may update the VM after we're done. We can keep track of
@@ -541,10 +565,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* are tuples present that are not visible to everyone or if there are
* dead tuples which are not yet removable. However, dead tuples which
* will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not clear
- * all_visible and all_frozen when we see LP_DEAD items. We fix that at
- * the end of the function, when we return the value to the caller, so
- * that the caller doesn't set the VM bits incorrectly.
+ * opportunistically freezing. Because of that, we do not immediately
+ * clear all_visible and all_frozen when we see LP_DEAD items. We fix
+ * that after scanning the line pointers, before we return the value to
+ * the caller, so that the caller doesn't set the VM bits incorrectly.
*/
if (prstate.attempt_freeze)
{
@@ -781,6 +805,22 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
do_hint_prune,
&prstate);
+ /*
+ * While scanning the line pointers, we did not clear
+ * all_visible/all_frozen when encountering LP_DEAD items because we
+ * wanted the decision whether or not to freeze the page to be unaffected
+ * by the short-term presence of LP_DEAD items. These LP_DEAD items are
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that we finished determining whether or not to freeze the page,
+ * update all_visible and all_frozen so that they reflect the true state
+ * of the page for setting PD_ALL_VISIBLE and VM bits.
+ */
+ if (prstate.lpdead_items > 0)
+ prstate.all_visible = prstate.all_frozen = false;
+
Assert(!prstate.all_frozen || prstate.all_visible);
/* Any error while applying the changes is critical */
@@ -841,29 +881,11 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* on the standby with xids older than the youngest tuple this
* record will freeze will conflict.
*/
- TransactionId frz_conflict_horizon = InvalidTransactionId;
TransactionId conflict_xid;
- /*
- * We can use the visibility_cutoff_xid as our cutoff for
- * conflicts when the whole page is eligible to become all-frozen
- * in the VM once we're done with it. Otherwise we generate a
- * conservative cutoff by stepping back from OldestXmin.
- */
- if (do_freeze)
- {
- if (prstate.all_frozen)
- frz_conflict_horizon = prstate.visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = prstate.cutoffs->OldestXmin;
- TransactionIdRetreat(frz_conflict_horizon);
- }
- }
-
- if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
- conflict_xid = frz_conflict_horizon;
+ if (TransactionIdFollows(prstate.frz_conflict_horizon,
+ prstate.latest_xid_removed))
+ conflict_xid = prstate.frz_conflict_horizon;
else
conflict_xid = prstate.latest_xid_removed;
@@ -887,30 +909,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible/all_frozen
- * earlier on to make the choice of whether or not to freeze the page
- * unaffected by the short-term presence of LP_DEAD items. These LP_DEAD
- * items were effectively assumed to be LP_UNUSED items in the making. It
- * doesn't matter which vacuum heap pass (initial pass or final pass) ends
- * up setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible and all_frozen
- * if there are any LP_DEAD items on the page. It needs to reflect the
- * present state of the page, as expected by our caller.
- */
- if (prstate.all_visible && prstate.lpdead_items == 0)
- {
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
- }
- else
- {
- presult->all_visible = false;
- presult->all_frozen = false;
- }
-
+ presult->all_visible = prstate.all_visible;
+ presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
/*
--
2.43.0
v21-0004-Split-heap_page_prune_and_freeze-into-helpers.patch (text/x-patch; charset=US-ASCII)
From fd070d6954e5156523dafe35392654453c1d8684 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 17 Nov 2025 15:11:27 -0500
Subject: [PATCH v21 04/12] Split heap_page_prune_and_freeze() into helpers
Refactor the setup and planning phases of pruning and freezing into
helpers. This streamlines heap_page_prune_and_freeze() and makes it
clearer when the examination of tuples ends and page modifications begin.
---
src/backend/access/heap/pruneheap.c | 559 +++++++++++++++-------------
1 file changed, 307 insertions(+), 252 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8e40565381f..b10c5eb1163 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -157,6 +157,14 @@ typedef struct
} PruneState;
/* Local functions */
+static void prune_freeze_setup(PruneFreezeParams *params,
+ TransactionId new_relfrozen_xid,
+ MultiXactId new_relmin_mxid,
+ const PruneFreezeResult *presult,
+ PruneState *prstate);
+static void prune_freeze_plan(Oid reloid, Buffer buffer,
+ PruneState *prstate,
+ OffsetNumber *off_loc);
static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
HeapTuple tup,
Buffer buffer);
@@ -308,200 +316,22 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
}
/*
- * Decide whether to proceed with freezing according to the freeze plans
- * prepared for the given heap buffer. If freezing is chosen, this function
- * performs several pre-freeze checks.
- *
- * The values of do_prune, do_hint_prune, and did_tuple_hint_fpi must be
- * determined before calling this function.
- *
- * prstate is both an input and output parameter.
- *
- * Returns true if we should apply the freeze plans and freeze tuples on the
- * page, and false otherwise.
+ * Helper for heap_page_prune_and_freeze() to initialize the PruneState using
+ * the provided parameters.
*/
-static bool
-heap_page_will_freeze(Relation relation, Buffer buffer,
- bool did_tuple_hint_fpi,
- bool do_prune,
- bool do_hint_prune,
- PruneState *prstate)
-{
- bool do_freeze = false;
-
- /*
- * If the caller specified we should not attempt to freeze any tuples,
- * validate that everything is in the right state and return.
- */
- if (!prstate->attempt_freeze)
- {
- Assert(!prstate->all_frozen && prstate->nfrozen == 0);
- Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
- return false;
- }
-
- if (prstate->pagefrz.freeze_required)
- {
- /*
- * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
- * before FreezeLimit/MultiXactCutoff is present. Must freeze to
- * advance relfrozenxid/relminmxid.
- */
- do_freeze = true;
- }
- else
- {
- /*
- * Opportunistically freeze the page if we are generating an FPI
- * anyway and if doing so means that we can set the page all-frozen
- * afterwards (might not happen until VACUUM's final heap pass).
- *
- * XXX: Previously, we knew if pruning emitted an FPI by checking
- * pgWalUsage.wal_fpi before and after pruning. Once the freeze and
- * prune records were combined, this heuristic couldn't be used
- * anymore. The opportunistic freeze heuristic must be improved;
- * however, for now, try to approximate the old logic.
- */
- if (prstate->all_frozen && prstate->nfrozen > 0)
- {
- Assert(prstate->all_visible);
-
- /*
- * Freezing would make the page all-frozen. Have already emitted
- * an FPI or will do so anyway?
- */
- if (RelationNeedsWAL(relation))
- {
- if (did_tuple_hint_fpi)
- do_freeze = true;
- else if (do_prune)
- {
- if (XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- else if (do_hint_prune)
- {
- if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- }
- }
- }
-
- if (do_freeze)
- {
- /*
- * Validate the tuples we will be freezing before entering the
- * critical section.
- */
- heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
-
- /*
- * Calculate what the snapshot conflict horizon should be for a record
- * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
- * for conflicts when the whole page is eligible to become all-frozen
- * in the VM once we're done with it. Otherwise, we generate a
- * conservative cutoff by stepping back from OldestXmin.
- */
- if (prstate->all_frozen)
- prstate->frz_conflict_horizon = prstate->visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- prstate->frz_conflict_horizon = prstate->cutoffs->OldestXmin;
- TransactionIdRetreat(prstate->frz_conflict_horizon);
- }
- }
- else if (prstate->nfrozen > 0)
- {
- /*
- * The page contained some tuples that were not already frozen, and we
- * chose not to freeze them now. The page won't be all-frozen then.
- */
- Assert(!prstate->pagefrz.freeze_required);
-
- prstate->all_frozen = false;
- prstate->nfrozen = 0; /* avoid miscounts in instrumentation */
- }
- else
- {
- /*
- * We have no freeze plans to execute. The page might already be
- * all-frozen (perhaps only following pruning), though. Such pages
- * can be marked all-frozen in the VM by our caller, even though none
- * of its tuples were newly frozen here.
- */
- }
-
- return do_freeze;
-}
-
-
-/*
- * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
- *
- * Caller must have pin and buffer cleanup lock on the page. Note that we
- * don't update the FSM information for page on caller's behalf. Caller might
- * also need to account for a reduction in the length of the line pointer
- * array following array truncation by us.
- *
- * params contains the input parameters used to control freezing and pruning
- * behavior. See the definition of PruneFreezeParams for more on what each
- * parameter does.
- *
- * If the HEAP_PAGE_PRUNE_FREEZE option is set in params, we will freeze
- * tuples if it's required in order to advance relfrozenxid / relminmxid, or
- * if it's considered advantageous for overall system performance to do so
- * now. The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing. When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opportunistically freeze, to indicate if the VM bits can be set. They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
- *
- * presult contains output parameters needed by callers, such as the number of
- * tuples removed and the offsets of dead items on the page after pruning.
- * heap_page_prune_and_freeze() is responsible for initializing it. Required
- * by all callers.
- *
- * off_loc is the offset location required by the caller to use in error
- * callback.
- *
- * new_relfrozen_xid and new_relmin_mxid must provided by the caller if the
- * HEAP_PAGE_PRUNE_FREEZE option is set in params. On entry, they contain the
- * oldest XID and multi-XID seen on the relation so far. They will be updated
- * with oldest values present on the page after pruning. After processing the
- * whole relation, VACUUM can use these values as the new
- * relfrozenxid/relminmxid for the relation.
- */
-void
-heap_page_prune_and_freeze(PruneFreezeParams *params,
- PruneFreezeResult *presult,
- OffsetNumber *off_loc,
- TransactionId *new_relfrozen_xid,
- MultiXactId *new_relmin_mxid)
+static void
+prune_freeze_setup(PruneFreezeParams *params,
+ TransactionId new_relfrozen_xid,
+ MultiXactId new_relmin_mxid,
+ const PruneFreezeResult *presult,
+ PruneState *prstate)
{
- Buffer buffer = params->buffer;
- Page page = BufferGetPage(buffer);
- BlockNumber blockno = BufferGetBlockNumber(buffer);
- OffsetNumber offnum,
- maxoff;
- PruneState prstate;
- HeapTupleData tup;
- bool do_freeze;
- bool do_prune;
- bool do_hint_prune;
- bool did_tuple_hint_fpi;
- int64 fpi_before = pgWalUsage.wal_fpi;
-
/* Copy parameters to prstate */
- prstate.vistest = params->vistest;
- prstate.mark_unused_now =
+ prstate->vistest = params->vistest;
+ prstate->mark_unused_now =
(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
- prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
- prstate.cutoffs = params->cutoffs;
+ prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate->cutoffs = params->cutoffs;
/*
* Our strategy is to scan the page and make lists of items to change,
@@ -514,41 +344,42 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* prunable, we will save the lowest relevant XID in new_prune_xid. Also
* initialize the rest of our working state.
*/
- prstate.new_prune_xid = InvalidTransactionId;
- prstate.latest_xid_removed = InvalidTransactionId;
- prstate.nredirected = prstate.ndead = prstate.nunused = prstate.nfrozen = 0;
- prstate.nroot_items = 0;
- prstate.nheaponly_items = 0;
+ prstate->new_prune_xid = InvalidTransactionId;
+ prstate->latest_xid_removed = InvalidTransactionId;
+ prstate->nredirected = prstate->ndead = prstate->nunused = 0;
+ prstate->nfrozen = 0;
+ prstate->nroot_items = 0;
+ prstate->nheaponly_items = 0;
/* initialize page freezing working state */
- prstate.pagefrz.freeze_required = false;
- if (prstate.attempt_freeze)
+ prstate->pagefrz.freeze_required = false;
+ if (prstate->attempt_freeze)
{
- Assert(new_relfrozen_xid && new_relmin_mxid);
- prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
- prstate.pagefrz.NoFreezePageRelfrozenXid = *new_relfrozen_xid;
- prstate.pagefrz.FreezePageRelminMxid = *new_relmin_mxid;
- prstate.pagefrz.NoFreezePageRelminMxid = *new_relmin_mxid;
+ prstate->pagefrz.FreezePageRelfrozenXid = new_relfrozen_xid;
+ prstate->pagefrz.NoFreezePageRelfrozenXid = new_relfrozen_xid;
+ prstate->pagefrz.FreezePageRelminMxid = new_relmin_mxid;
+ prstate->pagefrz.NoFreezePageRelminMxid = new_relmin_mxid;
}
else
{
- Assert(new_relfrozen_xid == NULL && new_relmin_mxid == NULL);
- prstate.pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
- prstate.pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
- prstate.pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
- prstate.pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
+ Assert(new_relfrozen_xid == InvalidTransactionId &&
+ new_relmin_mxid == InvalidMultiXactId);
+ prstate->pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
+ prstate->pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
+ prstate->pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
+ prstate->pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
}
- prstate.ndeleted = 0;
- prstate.live_tuples = 0;
- prstate.recently_dead_tuples = 0;
- prstate.hastup = false;
- prstate.lpdead_items = 0;
- prstate.deadoffsets = presult->deadoffsets;
- prstate.frz_conflict_horizon = InvalidTransactionId;
+ prstate->ndeleted = 0;
+ prstate->live_tuples = 0;
+ prstate->recently_dead_tuples = 0;
+ prstate->hastup = false;
+ prstate->lpdead_items = 0;
+ prstate->deadoffsets = (OffsetNumber *) presult->deadoffsets;
+ prstate->frz_conflict_horizon = InvalidTransactionId;
/*
- * Caller may update the VM after we're done. We can keep track of
+ * Vacuum may update the VM after we're done. We can keep track of
* whether the page will be all-visible and all-frozen after pruning and
* freezing to help the caller to do that.
*
@@ -570,10 +401,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* that after scanning the line pointers, before we return the value to
* the caller, so that the caller doesn't set the VM bits incorrectly.
*/
- if (prstate.attempt_freeze)
+ if (prstate->attempt_freeze)
{
- prstate.all_visible = true;
- prstate.all_frozen = true;
+ prstate->all_visible = true;
+ prstate->all_frozen = true;
}
else
{
@@ -581,8 +412,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* Initializing to false allows skipping the work to update them in
* heap_prune_record_unchanged_lp_normal().
*/
- prstate.all_visible = false;
- prstate.all_frozen = false;
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
}
/*
@@ -593,10 +424,29 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* running transaction on the standby does not see tuples on the page as
* all-visible, so the conflict horizon remains InvalidTransactionId.
*/
- prstate.visibility_cutoff_xid = InvalidTransactionId;
+ prstate->visibility_cutoff_xid = InvalidTransactionId;
+}
- maxoff = PageGetMaxOffsetNumber(page);
- tup.t_tableOid = RelationGetRelid(params->relation);
+/*
+ * Helper for heap_page_prune_and_freeze(). Iterates over every tuple on the
+ * page, examines its visibility information, and determines the appropriate
+ * action for each tuple. All tuples are processed and classified during this
+ * phase, but no modifications are made to the page until the later execution
+ * stage.
+ *
+ * *off_loc is used for error callback and cleared before returning.
+ */
+static void
+prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
+ OffsetNumber *off_loc)
+{
+ Page page = BufferGetPage(buffer);
+ BlockNumber blockno = BufferGetBlockNumber(buffer);
+ OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
+ OffsetNumber offnum;
+ HeapTupleData tup;
+
+ tup.t_tableOid = reloid;
/*
* Determine HTSV for all tuples, and queue them up for processing as HOT
@@ -631,13 +481,13 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
*off_loc = offnum;
- prstate.processed[offnum] = false;
- prstate.htsv[offnum] = -1;
+ prstate->processed[offnum] = false;
+ prstate->htsv[offnum] = -1;
/* Nothing to do if slot doesn't contain a tuple */
if (!ItemIdIsUsed(itemid))
{
- heap_prune_record_unchanged_lp_unused(page, &prstate, offnum);
+ heap_prune_record_unchanged_lp_unused(page, prstate, offnum);
continue;
}
@@ -647,17 +497,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* If the caller set mark_unused_now true, we can set dead line
* pointers LP_UNUSED now.
*/
- if (unlikely(prstate.mark_unused_now))
- heap_prune_record_unused(&prstate, offnum, false);
+ if (unlikely(prstate->mark_unused_now))
+ heap_prune_record_unused(prstate, offnum, false);
else
- heap_prune_record_unchanged_lp_dead(page, &prstate, offnum);
+ heap_prune_record_unchanged_lp_dead(page, prstate, offnum);
continue;
}
if (ItemIdIsRedirected(itemid))
{
/* This is the start of a HOT chain */
- prstate.root_items[prstate.nroot_items++] = offnum;
+ prstate->root_items[prstate->nroot_items++] = offnum;
continue;
}
@@ -671,21 +521,15 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
tup.t_len = ItemIdGetLength(itemid);
ItemPointerSet(&tup.t_self, blockno, offnum);
- prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
- buffer);
+ prstate->htsv[offnum] = heap_prune_satisfies_vacuum(prstate, &tup,
+ buffer);
if (!HeapTupleHeaderIsHeapOnly(htup))
- prstate.root_items[prstate.nroot_items++] = offnum;
+ prstate->root_items[prstate->nroot_items++] = offnum;
else
- prstate.heaponly_items[prstate.nheaponly_items++] = offnum;
+ prstate->heaponly_items[prstate->nheaponly_items++] = offnum;
}
- /*
- * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
- * an FPI to be emitted.
- */
- did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
-
/*
* Process HOT chains.
*
@@ -697,30 +541,30 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* the page instead of using the root_items array, also did it in
* ascending offset number order.)
*/
- for (int i = prstate.nroot_items - 1; i >= 0; i--)
+ for (int i = prstate->nroot_items - 1; i >= 0; i--)
{
- offnum = prstate.root_items[i];
+ offnum = prstate->root_items[i];
/* Ignore items already processed as part of an earlier chain */
- if (prstate.processed[offnum])
+ if (prstate->processed[offnum])
continue;
/* see preceding loop */
*off_loc = offnum;
/* Process this item or chain of items */
- heap_prune_chain(page, blockno, maxoff, offnum, &prstate);
+ heap_prune_chain(page, blockno, maxoff, offnum, prstate);
}
/*
* Process any heap-only tuples that were not already processed as part of
* a HOT chain.
*/
- for (int i = prstate.nheaponly_items - 1; i >= 0; i--)
+ for (int i = prstate->nheaponly_items - 1; i >= 0; i--)
{
- offnum = prstate.heaponly_items[i];
+ offnum = prstate->heaponly_items[i];
- if (prstate.processed[offnum])
+ if (prstate->processed[offnum])
continue;
/* see preceding loop */
@@ -739,7 +583,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* return true for an XMIN_INVALID tuple, so this code will work even
* when there were sequential updates within the aborted transaction.)
*/
- if (prstate.htsv[offnum] == HEAPTUPLE_DEAD)
+ if (prstate->htsv[offnum] == HEAPTUPLE_DEAD)
{
ItemId itemid = PageGetItemId(page, offnum);
HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, itemid);
@@ -747,8 +591,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
if (likely(!HeapTupleHeaderIsHotUpdated(htup)))
{
HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate.latest_xid_removed);
- heap_prune_record_unused(&prstate, offnum, true);
+ &prstate->latest_xid_removed);
+ heap_prune_record_unused(prstate, offnum, true);
}
else
{
@@ -765,7 +609,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
}
}
else
- heap_prune_record_unchanged_lp_normal(page, &prstate, offnum);
+ heap_prune_record_unchanged_lp_normal(page, prstate, offnum);
}
/* We should now have processed every tuple exactly once */
@@ -776,12 +620,223 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
{
*off_loc = offnum;
- Assert(prstate.processed[offnum]);
+ Assert(prstate->processed[offnum]);
}
#endif
/* Clear the offset information once we have processed the given page. */
*off_loc = InvalidOffsetNumber;
+}
+
+/*
+ * Decide whether to proceed with freezing according to the freeze plans
+ * prepared for the given heap buffer. If freezing is chosen, this function
+ * performs several pre-freeze checks.
+ *
+ * The values of do_prune, do_hint_prune, and did_tuple_hint_fpi must be
+ * determined before calling this function.
+ *
+ * prstate is both an input and output parameter.
+ *
+ * Returns true if we should apply the freeze plans and freeze tuples on the
+ * page, and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+ bool did_tuple_hint_fpi,
+ bool do_prune,
+ bool do_hint_prune,
+ PruneState *prstate)
+{
+ bool do_freeze = false;
+
+ /*
+ * If the caller specified we should not attempt to freeze any tuples,
+ * validate that everything is in the right state and return.
+ */
+ if (!prstate->attempt_freeze)
+ {
+ Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+ Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+ return false;
+ }
+
+ if (prstate->pagefrz.freeze_required)
+ {
+ /*
+ * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+ * before FreezeLimit/MultiXactCutoff is present. Must freeze to
+ * advance relfrozenxid/relminmxid.
+ */
+ do_freeze = true;
+ }
+ else
+ {
+ /*
+ * Opportunistically freeze the page if we are generating an FPI
+ * anyway and if doing so means that we can set the page all-frozen
+ * afterwards (might not happen until VACUUM's final heap pass).
+ *
+ * XXX: Previously, we knew if pruning emitted an FPI by checking
+ * pgWalUsage.wal_fpi before and after pruning. Once the freeze and
+ * prune records were combined, this heuristic couldn't be used
+ * anymore. The opportunistic freeze heuristic must be improved;
+ * however, for now, try to approximate the old logic.
+ */
+ if (prstate->all_frozen && prstate->nfrozen > 0)
+ {
+ Assert(prstate->all_visible);
+
+ /*
+ * Freezing would make the page all-frozen. Have we already emitted
+ * an FPI, or will we do so anyway?
+ */
+ if (RelationNeedsWAL(relation))
+ {
+ if (did_tuple_hint_fpi)
+ do_freeze = true;
+ else if (do_prune)
+ {
+ if (XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ else if (do_hint_prune)
+ {
+ if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ }
+ }
+ }
+
+ if (do_freeze)
+ {
+ /*
+ * Validate the tuples we will be freezing before entering the
+ * critical section.
+ */
+ heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+
+ /*
+ * Calculate what the snapshot conflict horizon should be for a record
+ * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
+ * for conflicts when the whole page is eligible to become all-frozen
+ * in the VM once we're done with it. Otherwise, we generate a
+ * conservative cutoff by stepping back from OldestXmin.
+ */
+ if (prstate->all_frozen)
+ prstate->frz_conflict_horizon = prstate->visibility_cutoff_xid;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ prstate->frz_conflict_horizon = prstate->cutoffs->OldestXmin;
+ TransactionIdRetreat(prstate->frz_conflict_horizon);
+ }
+ }
+ else if (prstate->nfrozen > 0)
+ {
+ /*
+ * The page contained some tuples that were not already frozen, and we
+ * chose not to freeze them now. The page won't be all-frozen then.
+ */
+ Assert(!prstate->pagefrz.freeze_required);
+
+ prstate->all_frozen = false;
+ prstate->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+ else
+ {
+ /*
+ * We have no freeze plans to execute. The page might already be
+ * all-frozen (perhaps only following pruning), though. Such pages
+ * can be marked all-frozen in the VM by our caller, even though none
+ * of their tuples were newly frozen here.
+ */
+ }
+
+ return do_freeze;
+}
+
+
+/*
+ * Prune and repair fragmentation and potentially freeze tuples on the
+ * specified page.
+ *
+ * Caller must have pin and buffer cleanup lock on the page. Note that we
+ * don't update the FSM information for page on caller's behalf. Caller might
+ * also need to account for a reduction in the length of the line pointer
+ * array following array truncation by us.
+ *
+ * params contains the input parameters used to control freezing and pruning
+ * behavior. See the definition of PruneFreezeParams for more on what each
+ * parameter does.
+ *
+ * If the HEAP_PAGE_PRUNE_FREEZE option is set in params, we will freeze
+ * tuples if it's required in order to advance relfrozenxid / relminmxid, or
+ * if it's considered advantageous for overall system performance to do so
+ * now. The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
+ * 'new_relmin_mxid' arguments are required when freezing. When
+ * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
+ * and presult->all_frozen after determining whether or not to
+ * opportunistically freeze, to indicate if the VM bits can be set. They are
+ * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
+ * because at the moment only callers that also freeze need that information.
+ *
+ * presult contains output parameters needed by callers, such as the number of
+ * tuples removed and the offsets of dead items on the page after pruning.
+ * heap_page_prune_and_freeze() is responsible for initializing it. Required
+ * by all callers.
+ *
+ * off_loc is the offset location required by the caller to use in error
+ * callback.
+ *
+ * new_relfrozen_xid and new_relmin_mxid must be provided by the caller if the
+ * HEAP_PAGE_PRUNE_FREEZE option is set in params. On entry, they contain the
+ * oldest XID and multi-XID seen on the relation so far. They will be updated
+ * with the oldest values present on the page after pruning. After processing the
+ * whole relation, VACUUM can use these values as the new
+ * relfrozenxid/relminmxid for the relation.
+ */
+void
+heap_page_prune_and_freeze(PruneFreezeParams *params,
+ PruneFreezeResult *presult,
+ OffsetNumber *off_loc,
+ TransactionId *new_relfrozen_xid,
+ MultiXactId *new_relmin_mxid)
+{
+ Buffer buffer = params->buffer;
+ Page page = BufferGetPage(buffer);
+ PruneState prstate;
+ bool do_freeze;
+ bool do_prune;
+ bool do_hint_prune;
+ bool did_tuple_hint_fpi;
+ int64 fpi_before = pgWalUsage.wal_fpi;
+
+ /* Initialize prstate */
+ prune_freeze_setup(params,
+ new_relfrozen_xid ?
+ *new_relfrozen_xid : InvalidTransactionId,
+ new_relmin_mxid ?
+ *new_relmin_mxid : InvalidMultiXactId,
+ presult,
+ &prstate);
+
+ /*
+ * Examine all line pointers and tuple visibility information to determine
+ * which line pointers should change state and which tuples may be frozen.
+ * Prepare a queue of state changes to be executed later in a critical
+ * section.
+ */
+ prune_freeze_plan(RelationGetRelid(params->relation),
+ buffer, &prstate, off_loc);
+
+ /*
+ * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
+ * checking tuple visibility information in prune_freeze_plan() may have
+ * caused an FPI to be emitted.
+ */
+ did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
do_prune = prstate.nredirected > 0 ||
prstate.ndead > 0 ||
--
2.43.0
Attachment: v21-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (text/x-patch; charset=UTF-8)
From a9af84665ae761e9fba46f835a5efd849739da23 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 8 Oct 2025 15:39:01 -0400
Subject: [PATCH v21 05/12] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
prune/freeze
Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.
Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.
This change applies only to vacuum phase I, not to pruning performed
during normal page access.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam_xlog.c | 37 ++-
src/backend/access/heap/pruneheap.c | 461 +++++++++++++++++++++-----
src/backend/access/heap/vacuumlazy.c | 241 +-------------
src/include/access/heapam.h | 43 ++-
4 files changed, 447 insertions(+), 335 deletions(-)
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 11cb3f74da5..2af724451c3 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -104,6 +104,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
OffsetNumber *frz_offsets;
char *dataptr = XLogRecGetBlockData(record, 0, &datalen);
bool do_prune;
+ bool set_lsn = false;
+ bool mark_buffer_dirty = false;
heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
&nplans, &plans, &frz_offsets,
@@ -157,17 +159,36 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* There should be no more data */
Assert((char *) frz_offsets == dataptr + datalen);
- if (vmflags & VISIBILITYMAP_VALID_BITS)
- PageSetAllVisible(page);
-
- MarkBufferDirty(buffer);
+ if (do_prune || nplans > 0)
+ mark_buffer_dirty = set_lsn = true;
/*
- * See log_heap_prune_and_freeze() for commentary on when we set the
- * heap page LSN.
+ * The critical integrity requirement here is that we must never end
+ * up with with the visibility map bit set and the page-level
+ * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+ * modification would fail to clear the visibility map bit.
+ *
+ * vmflags may be nonzero with PD_ALL_VISIBLE already set (e.g. when
+ * marking an all-visible page all-frozen). If only the VM is updated,
+ * the heap page need not be dirtied.
*/
- if (do_prune || nplans > 0 ||
- ((vmflags & VISIBILITYMAP_VALID_BITS) && XLogHintBitIsNeeded()))
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+ {
+ PageSetAllVisible(page);
+ mark_buffer_dirty = true;
+
+ /*
+ * See log_heap_prune_and_freeze() for commentary on when we set
+ * the heap page LSN.
+ */
+ if (XLogHintBitIsNeeded())
+ set_lsn = true;
+ }
+
+ if (mark_buffer_dirty)
+ MarkBufferDirty(buffer);
+
+ if (set_lsn)
PageSetLSN(page, lsn);
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b10c5eb1163..ba578c1ce0f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
#include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool attempt_freeze;
+ /* whether or not to attempt updating the VM */
+ bool attempt_update_vm;
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -140,16 +142,17 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon when setting the VM or when
+ * freezing all the tuples on the page. It is only valid when all the live
+ * tuples on the page are all-visible.
*
* NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
* That's convenient for heap_page_prune_and_freeze() to use them to
- * decide whether to freeze the page or not. The all_visible and
- * all_frozen values returned to the caller are adjusted to include
- * LP_DEAD items after we determine whether to opportunistically freeze.
+ * decide whether to opportunistically freeze the page or not. The
+ * all_visible and all_frozen values ultimately used to set the VM are
+ * adjusted to include LP_DEAD items after we determine whether or not to
+ * opportunistically freeze.
*/
bool all_visible;
bool all_frozen;
@@ -188,10 +191,21 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
static void page_verify_redirects(Page page);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid, bool blk_already_av,
+ bool set_blk_all_frozen);
static bool heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
PruneState *prstate);
-
+static bool heap_page_will_set_vis(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buf,
+ Buffer vmbuffer,
+ bool blk_known_av,
+ const PruneState *prstate,
+ uint8 *vmflags,
+ bool *do_set_pd_vis);
/*
* Optionally prune and repair fragmentation in the specified page.
@@ -278,6 +292,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* current implementation.
*/
PruneFreezeParams params = {.relation = relation,.buffer = buffer,
+ .vmbuffer = InvalidBuffer,.blk_known_av = false,
.reason = PRUNE_ON_ACCESS,.options = 0,
.vistest = vistest,.cutoffs = NULL
};
@@ -331,6 +346,8 @@ prune_freeze_setup(PruneFreezeParams *params,
prstate->mark_unused_now =
(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate->attempt_update_vm =
+ (params->options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0;
prstate->cutoffs = params->cutoffs;
/*
@@ -379,50 +396,54 @@ prune_freeze_setup(PruneFreezeParams *params,
prstate->frz_conflict_horizon = InvalidTransactionId;
/*
- * Vacuum may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Track whether the page could be marked all-visible and/or all-frozen.
+ * This information is used for opportunistic freezing and for updating
+ * the visibility map (VM) if requested by the caller.
+ *
+ * Currently, only VACUUM performs freezing, but other callers may in the
+ * future. Visibility bookkeeping is required not just for setting the VM
+ * bits, but also for opportunistic freezing: we only consider freezing if
+ * the page would become all-frozen, or if it would be all-frozen except
+ * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+ * we will not set the VM bit even if the page is found to be all-visible.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is passed without HEAP_PAGE_PRUNE_FREEZE,
+ * prstate.all_frozen must be initialized to false, since we will not call
+ * heap_prepare_freeze_tuple() for each tuple.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not immediately
- * clear all_visible and all_frozen when we see LP_DEAD items. We fix
- * that after scanning the line pointers, before we return the value to
- * the caller, so that the caller doesn't set the VM bits incorrectly.
+ * Dead tuples that will be removed by the end of vacuum should not
+ * prevent opportunistic freezing. Therefore, we do not clear all_visible
+ * and all_frozen when we encounter LP_DEAD items. Instead, we correct
+ * them after deciding whether to freeze, but before updating the VM, to
+ * avoid setting the VM bits incorrectly.
+ *
+ * If neither freezing nor VM updates are requested, we skip the extra
+ * bookkeeping. In this case, initializing all_visible to false allows
+ * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
*/
if (prstate->attempt_freeze)
{
prstate->all_visible = true;
prstate->all_frozen = true;
}
+ else if (prstate->attempt_update_vm)
+ {
+ prstate->all_visible = true;
+ prstate->all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate->all_visible = false;
prstate->all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is unmaintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon
+ * when updating the VM and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -757,10 +778,133 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Decide whether to set the visibility map bits for heap_blk, using
+ * information from PruneState and blk_known_av. Some callers may already
+ * have examined this page’s VM bits (e.g., VACUUM in the previous
+ * heap_vac_scan_next_block() call) and can pass that along.
+ *
+ * Returns true if one or both VM bits should be set, along with the desired
+ * flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
+ * should be set on the heap page.
+ */
+static bool
+heap_page_will_set_vis(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buf,
+ Buffer vmbuffer,
+ bool blk_known_av,
+ const PruneState *prstate,
+ uint8 *vmflags,
+ bool *do_set_pd_vis)
+{
+ Page heap_page = BufferGetPage(heap_buf);
+ bool do_set_vm = false;
+
+ *do_set_pd_vis = false;
+
+ if (!prstate->attempt_update_vm)
+ {
+ Assert(!prstate->all_visible && !prstate->all_frozen);
+ Assert(*vmflags == 0);
+ return false;
+ }
+
+ if (prstate->all_visible && !PageIsAllVisible(heap_page))
+ *do_set_pd_vis = true;
+
+ if ((prstate->all_visible && !blk_known_av) ||
+ (prstate->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+ {
+ *vmflags = VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate->all_frozen)
+ *vmflags |= VISIBILITYMAP_ALL_FROZEN;
+
+ do_set_vm = true;
+ }
+
+ /*
+ * Now handle two potential corruption cases:
+ *
+ * These do not need to happen in a critical section and are not
+ * WAL-logged.
+ *
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that in vacuum the bit
+ * got cleared after heap_vac_scan_next_block() was called, so we must
+ * recheck with buffer lock before concluding that the VM is corrupt.
+ */
+ else if (blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (prstate->lpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buf);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ return do_set_vm;
+}
+
+
+#ifdef USE_ASSERT_CHECKING
+
+/*
+ * Wrapper for heap_page_would_be_all_visible() which can be used for callers
+ * that expect no LP_DEAD on the page. Currently assert-only, but there is no
+ * reason not to use it outside of asserts.
+ */
+static bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
+{
+
+ return heap_page_would_be_all_visible(rel, buf,
+ OldestXmin,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid,
+ logging_offnum);
+}
+#endif
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -775,12 +919,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* tuples if it's required in order to advance relfrozenxid / relminmxid, or
* if it's considered advantageous for overall system performance to do so
* now. The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing. When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opporunistically freeze, to indicate if the VM bits can be set. They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
@@ -805,13 +950,20 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
MultiXactId *new_relmin_mxid)
{
Buffer buffer = params->buffer;
+ Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
+ BlockNumber blockno = BufferGetBlockNumber(buffer);
PruneState prstate;
bool do_freeze;
bool do_prune;
bool do_hint_prune;
+ bool do_set_vm;
+ bool do_set_pd_vis;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ TransactionId conflict_xid = InvalidTransactionId;
+ uint8 new_vmbits = 0;
+ uint8 old_vmbits = 0;
/* Initialize prstate */
prune_freeze_setup(params,
@@ -877,6 +1029,34 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.all_visible = prstate.all_frozen = false;
Assert(!prstate.all_frozen || prstate.all_visible);
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ /*
+ * Decide whether to set the page-level PD_ALL_VISIBLE bit and the VM bits
+ * based on information from the VM and the all_visible/all_frozen flags.
+ *
+ * While it is valid for PD_ALL_VISIBLE to be set when the corresponding
+ * VM bit is clear, we strongly prefer to keep them in sync.
+ *
+ * Accordingly, we also allow updating only the VM when PD_ALL_VISIBLE has
+ * already been set. Setting only the VM is most common when setting an
+ * already all-visible page all-frozen.
+ */
+ do_set_vm = heap_page_will_set_vis(params->relation,
+ blockno, buffer, vmbuffer, params->blk_known_av,
+ &prstate, &new_vmbits, &do_set_pd_vis);
+
+ /* We should only set the VM if PD_ALL_VISIBLE is set or will be */
+ Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
+
+ conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+ prstate.latest_xid_removed, prstate.frz_conflict_horizon,
+ prstate.visibility_cutoff_xid, params->blk_known_av,
+ (do_set_vm && (new_vmbits & VISIBILITYMAP_ALL_FROZEN)));
+
+ /* Lock vmbuffer before entering a critical section */
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -898,14 +1078,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
/*
* If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * hint. If we are going to freeze or prune the page or set
+ * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+ *
+ * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+ * for the VM to be set and PD_ALL_VISIBLE to be clear.
*/
- if (!do_freeze && !do_prune)
+ if (!do_freeze && !do_prune && !do_set_pd_vis)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -919,36 +1102,43 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
- MarkBufferDirty(buffer);
+ if (do_set_pd_vis)
+ PageSetAllVisible(page);
- /*
- * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
- */
- if (RelationNeedsWAL(params->relation))
+ if (do_prune || do_freeze || do_set_pd_vis)
+ MarkBufferDirty(buffer);
+
+ if (do_set_vm)
{
- /*
- * The snapshotConflictHorizon for the whole record should be the
- * most conservative of all the horizons calculated for any of the
- * possible modifications. If this record will prune tuples, any
- * transactions on the standby older than the youngest xmax of the
- * most recently removed tuple this record will prune will
- * conflict. If this record will freeze tuples, any transactions
- * on the standby with xids older than the youngest tuple this
- * record will freeze will conflict.
- */
- TransactionId conflict_xid;
+ Assert(PageIsAllVisible(page));
- if (TransactionIdFollows(prstate.frz_conflict_horizon,
- prstate.latest_xid_removed))
- conflict_xid = prstate.frz_conflict_horizon;
- else
- conflict_xid = prstate.latest_xid_removed;
+ old_vmbits = visibilitymap_set_vmbits(blockno,
+ vmbuffer, new_vmbits,
+ params->relation->rd_locator);
+ if (old_vmbits == new_vmbits)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+ /* Unset so we don't emit WAL since no change occurred */
+ do_set_vm = false;
+ }
+ }
+ /*
+ * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
+ * only updating the VM and it turns out it was already set, we will
+ * have unset do_set_vm earlier. As such, check it again before
+ * emitting the record.
+ */
+ if (RelationNeedsWAL(params->relation) &&
+ (do_prune || do_freeze || do_set_vm))
+ {
log_heap_prune_and_freeze(params->relation, buffer,
- InvalidBuffer, /* vmbuffer */
- 0, /* vmflags */
+ do_set_vm ? vmbuffer : InvalidBuffer,
+ do_set_vm ? new_vmbits : 0,
conflict_xid,
- true, params->reason,
+ true, /* cleanup lock */
+ do_set_pd_vis,
+ params->reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
@@ -958,28 +1148,47 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
END_CRIT_SECTION();
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+ /*
+ * During its second pass over the heap, VACUUM calls
+ * heap_page_would_be_all_visible() to determine whether a page is
+ * all-visible and all-frozen. The logic here is similar. After completing
+ * pruning and freezing, use an assertion to verify that our results
+ * remain consistent with heap_page_would_be_all_visible().
+ */
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
+ {
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
+
+ Assert(prstate.lpdead_items == 0);
+ Assert(prstate.cutoffs);
+
+ if (!heap_page_is_all_visible(params->relation, buffer,
+ prstate.cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc))
+ Assert(false);
+
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == prstate.visibility_cutoff_xid);
+ }
+#endif
+
/* Copy information back for caller */
presult->ndeleted = prstate.ndeleted;
presult->nnewlpdead = prstate.ndead;
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
-
- /*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ presult->new_vmbits = new_vmbits;
+ presult->old_vmbits = old_vmbits;
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
@@ -1471,6 +1680,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
{
TransactionId xmin;
+ Assert(prstate->attempt_update_vm);
+
if (!HeapTupleHeaderXminCommitted(htup))
{
prstate->all_visible = false;
@@ -2121,6 +2332,65 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
return nplans;
}
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid, bool blk_already_av,
+ bool set_blk_all_frozen)
+{
+ TransactionId conflict_xid;
+
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning or
+ * freezing any tuples and are setting an already all-visible page
+ * all-frozen in the VM. In this case, all of the tuples on the page must
+ * already be visible to all MVCC snapshots on the standby.
+ */
+ if (!do_prune && !do_freeze &&
+ do_set_vm && blk_already_av && set_blk_all_frozen)
+ return InvalidTransactionId;
+
+ /*
+ * The snapshotConflictHorizon for the whole record should be the most
+ * conservative of all the horizons calculated for any of the possible
+ * modifications. If this record will prune tuples, any transactions on
+ * the standby older than the youngest xmax of the most recently removed
+ * tuple this record will prune will conflict. If this record will freeze
+ * tuples, any transactions on the standby with xids older than the
+ * youngest tuple this record will freeze will conflict.
+ */
+ conflict_xid = InvalidTransactionId;
+
+ /*
+ * If we are updating the VM, the conflict horizon is almost always the
+ * visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization, we can
+ * use the visibility_cutoff_xid as the conflict horizon if the page will
+ * be all-frozen. This is true even if there are LP_DEAD line pointers
+ * because we ignored those when maintaining the visibility_cutoff_xid.
+ * This will have been calculated earlier as the frz_conflict_horizon when
+ * we determined we would freeze.
+ */
+ if (do_set_vm)
+ conflict_xid = visibility_cutoff_xid;
+ else if (do_freeze)
+ conflict_xid = frz_conflict_horizon;
+
+ /*
+ * If we are removing tuples with an xmax younger than the conflict_xid
+ * calculated so far, we must use that as the horizon.
+ */
+ if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+ conflict_xid = latest_xid_removed;
+
+ return conflict_xid;
+}
+
/*
* Write an XLOG_HEAP2_PRUNE* WAL record
*
@@ -2145,6 +2415,15 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* case, vmbuffer should already have been updated and marked dirty and should
* still be pinned and locked.
*
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
+ * In some cases, such as when heap_page_prune_and_freeze() is setting an
+ * already all-visible page all-frozen, PD_ALL_VISIBLE may already be
+ * set. So, it is possible for vmflags to be non-zero and set_pd_all_vis to be
+ * false.
+ *
* Note: This function scribbles on the 'frozen' array.
*
* Note: This is called in a critical section, so careful what you do here.
@@ -2154,6 +2433,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
Buffer vmbuffer, uint8 vmflags,
TransactionId conflict_xid,
bool cleanup_lock,
+ bool set_pd_all_vis,
PruneReason reason,
HeapTupleFreeze *frozen, int nfrozen,
OffsetNumber *redirected, int nredirected,
@@ -2190,7 +2470,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (!do_prune &&
nfrozen == 0 &&
- (!do_set_vm || !XLogHintBitIsNeeded()))
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
regbuf_flags_heap |= REGBUF_NO_IMAGE;
/*
@@ -2308,7 +2588,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* See comment at the top of the function about regbuf_flags_heap for
* details on when we can advance the page LSN.
*/
- if (do_prune || nfrozen > 0 || (do_set_vm && XLogHintBitIsNeeded()))
+ if (do_prune || nfrozen > 0 ||
+ (set_pd_all_vis && XLogHintBitIsNeeded()))
{
Assert(BufferIsDirty(buffer));
PageSetLSN(BufferGetPage(buffer), recptr);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e1b7456823d..a7a974b6639 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,20 +464,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
-#endif
-static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- OffsetNumber *deadoffsets,
- int ndeadoffsets,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -1966,7 +1952,9 @@ lazy_scan_prune(LVRelState *vacrel,
Relation rel = vacrel->rel;
PruneFreezeResult presult;
PruneFreezeParams params = {.relation = rel,.buffer = buf,
- .reason = PRUNE_VACUUM_SCAN,.options = HEAP_PAGE_PRUNE_FREEZE,
+ .vmbuffer = vmbuffer,.blk_known_av = all_visible_according_to_vm,
+ .reason = PRUNE_VACUUM_SCAN,
+ .options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VIS,
.cutoffs = &vacrel->cutoffs,.vistest = vacrel->vistest
};
@@ -2009,33 +1997,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- if (!heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum))
- Assert(false);
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -2069,168 +2030,26 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
- Assert(!presult.all_frozen || presult.all_visible);
-
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
+ * For the purposes of logging, count whether or not the page was newly
+ * set all-visible and, potentially, all-frozen.
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ vacrel->vm_new_visible_pages++;
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- vacrel->vm_new_frozen_pages++;
+ vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
-
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen.
- */
- else if (all_visible_according_to_vm && presult.all_frozen &&
- !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- uint8 old_vmbits;
-
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
}
return presult.ndeleted;
@@ -2952,6 +2771,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
vmflags,
conflict_xid,
false, /* no cleanup lock required */
+ (vmflags & VISIBILITYMAP_VALID_BITS) != 0,
PRUNE_VACUUM_CLEANUP,
NULL, 0, /* frozen */
NULL, 0, /* redirected */
@@ -3632,30 +3452,6 @@ dead_items_cleanup(LVRelState *vacrel)
vacrel->pvs = NULL;
}
-#ifdef USE_ASSERT_CHECKING
-
-/*
- * Wrapper for heap_page_would_be_all_visible() which can be used for callers
- * that expect no LP_DEAD on the page. Currently assert-only, but there is no
- * reason not to use it outside of asserts.
- */
-static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum)
-{
-
- return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
- NULL, 0,
- all_frozen,
- visibility_cutoff_xid,
- logging_offnum);
-}
-#endif
-
/*
* Check whether the heap page in buf is all-visible except for the dead
* tuples referenced in the deadoffsets array.
@@ -3678,15 +3474,12 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
* - *logging_offnum: OffsetNumber of current tuple being processed;
* used by vacuum's error callback system.
*
- * Callers looking to verify that the page is already all-visible can call
- * heap_page_is_all_visible().
- *
* This logic is closely related to heap_prune_record_unchanged_lp_normal().
* If you modify this function, ensure consistency with that code. An
* assertion cross-checks that both remain in agreement. Do not introduce new
* side-effects.
*/
-static bool
+bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
TransactionId OldestXmin,
OffsetNumber *deadoffsets,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 632c4332a8c..937b46a77db 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VIS (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,16 @@ typedef struct PruneFreezeParams
Relation relation; /* relation containing buffer to be pruned */
Buffer buffer; /* buffer to be pruned */
+ /*
+ *
+ * vmbuffer is the buffer that must already contain the required
+ * block of the visibility map if we are to update it. blk_known_av is the
+ * visibility status of the heap block as of the last call to
+ * find_next_unskippable_block().
+ */
+ Buffer vmbuffer;
+ bool blk_known_av;
+
/*
* The reason pruning was performed. It is used to set the WAL record
* opcode which is used for debugging and analysis purposes.
@@ -250,8 +261,10 @@ typedef struct PruneFreezeParams
* HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
* LP_UNUSED during pruning.
*
- * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
- * will return 'all_visible', 'all_frozen' flags to the caller.
+ * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
+ *
+ * HEAP_PAGE_PRUNE_UPDATE_VIS indicates that we will set the page's status
+ * in the VM.
*/
int options;
@@ -285,19 +298,15 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
- *
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
*
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set and
+ * we have attempted to update the VM.
*/
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
+ uint8 new_vmbits;
+ uint8 old_vmbits;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
@@ -424,6 +433,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
Buffer vmbuffer, uint8 vmflags,
TransactionId conflict_xid,
bool cleanup_lock,
+ bool set_pd_all_vis,
PruneReason reason,
HeapTupleFreeze *frozen, int nfrozen,
OffsetNumber *redirected, int nredirected,
@@ -433,6 +443,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
/* in heap/vacuumlazy.c */
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
--
2.43.0
Attachment: v21-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (text/x-patch; charset=US-ASCII)
From e117e20aebcbc4b3bfe5b077d9f122e171a8c6fe Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v21 06/12] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum
As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
---
src/backend/access/heap/vacuumlazy.c | 36 +++++++++++++++++++++++-----
1 file changed, 30 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a7a974b6639..fa7be0f857f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1872,9 +1872,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ /* Lock vmbuffer before entering critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ /* Mark buffer dirty before writing any WAL records */
MarkBufferDirty(buf);
/*
@@ -1891,13 +1894,34 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set_vmbits(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
+
+ /*
+ * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+ * setting the VM.
+ */
+ if (RelationNeedsWAL(vacrel->rel))
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ InvalidTransactionId, /* conflict xid */
+ false, /* cleanup lock */
+ true, /* set_pd_all_vis */
+ PRUNE_VACUUM_SCAN, /* reason */
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+
END_CRIT_SECTION();
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
/* Count the newly all-frozen pages for logging */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
--
2.43.0
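For reviewers who want to confirm the effect locally, here is a minimal verification sketch (not part of the patches; it assumes the pg_walinspect extension, available in PostgreSQL 15+, psql's \gset variables, and the "foo" table from the vacuum example earlier in this email). It summarizes the Heap2 record types emitted by a single vacuum; on master the output includes VISIBLE records, while with the patch set applied only the PRUNE_* records should remain:
-- capture the WAL range covered by the vacuum
CREATE EXTENSION IF NOT EXISTS pg_walinspect;
SELECT pg_current_wal_lsn() AS start_lsn \gset
VACUUM (VERBOSE, PROCESS_TOAST FALSE) foo;
SELECT pg_current_wal_lsn() AS end_lsn \gset
-- summarize Heap2 record types and their WAL volume in that range
SELECT record_type, count(*) AS n, sum(record_length) AS bytes
FROM pg_get_wal_records_info(:'start_lsn', :'end_lsn')
WHERE resource_manager = 'Heap2'
GROUP BY record_type
ORDER BY bytes DESC;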
Attachment: v21-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (text/x-patch; charset=US-ASCII)
From 1d3f9e8f397508808c01bcc827294014eac5b19b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v21 07/12] Remove XLOG_HEAP2_VISIBLE entirely
No remaining code emits XLOG_HEAP2_VISIBLE records, so remove the record type entirely.
This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
---
src/backend/access/common/bufmask.c | 4 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 155 ++---------------------
src/backend/access/heap/pruneheap.c | 18 ++-
src/backend/access/heap/vacuumlazy.c | 16 +--
src/backend/access/heap/visibilitymap.c | 112 +---------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +-
src/include/access/heapam_xlog.h | 28 +---
src/include/access/visibilitymap.h | 13 +-
src/include/access/visibilitymapdefs.h | 9 --
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 54 insertions(+), 379 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible. See heap_xlog_prune_freeze()
+ * for more details.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4b0c49f4bb0..2bff37e03b5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2524,11 +2524,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
else if (all_frozen_set)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- relation->rd_locator);
+ visibilitymap_set(BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ relation->rd_locator);
}
/*
@@ -8797,50 +8797,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
-
/*
* Perform XLogInsert for a heap-update operation. Caller must already
* have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 2af724451c3..5ab46e8bf8f 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -251,7 +251,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+ visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -264,142 +264,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -777,8 +641,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
*
* During recovery, however, no concurrent writers exist. Therefore,
* updating the VM without holding the heap page lock is safe enough. This
- * same approach is taken when replaying xl_heap_visible records (see
- * heap_xlog_visible()).
+ * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+ * heap_xlog_prune_freeze()).
*/
if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -790,11 +654,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- rlocator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -1375,9 +1239,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ba578c1ce0f..80037d690e3 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1112,9 +1112,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
{
Assert(PageIsAllVisible(page));
- old_vmbits = visibilitymap_set_vmbits(blockno,
- vmbuffer, new_vmbits,
- params->relation->rd_locator);
+ old_vmbits = visibilitymap_set(blockno,
+ vmbuffer, new_vmbits,
+ params->relation->rd_locator);
if (old_vmbits == new_vmbits)
{
LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
@@ -2396,14 +2396,18 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
*
* This is used for several different page maintenance operations:
*
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
* redirected, some marked dead, and some removed altogether.
*
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'.
*
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ * marked as unused.
*
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
+ * all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
* all.
*
* If replaying the record requires a cleanup lock, pass cleanup_lock = true.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index fa7be0f857f..fd68dfcfce2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1894,11 +1894,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
/*
* Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2776,9 +2776,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* set PD_ALL_VISIBLE.
*/
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer, vmflags,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer, vmflags,
+ vacrel->rel->rd_locator);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index d14588e92ae..7997e926872 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set bit(s) in a previously pinned page and log
- * visibilitymap_set_vmbits - set bit(s) in a pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,109 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
- flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || !XLogRecPtrIsValid(recptr));
- Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- Assert(!BufferIsValid(heapBuf) ||
- BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (!XLogRecPtrIsValid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
-}
-
/*
* Set VM (visibility map) flags in the VM block in vmBuf.
*
@@ -344,9 +240,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* rlocator is used only for debugging messages.
*/
uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ca26d1f0ed1..08461fdf593 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..e9e77bd678b 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -294,7 +293,13 @@ typedef struct xl_heap_prune
#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
#define XLHP_IS_CATALOG_REL (1 << 1)
/*
@@ -443,20 +448,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -500,11 +491,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index c6fa37be968..05ba6786b47 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator);
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index c751c25a04d..2a9951b7188 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4292,7 +4292,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
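
For reference, this is roughly what a VM-setting call site looks like once
xl_heap_visible is gone. It just mirrors the heap_multi_insert() hunk above,
nothing new beyond the patch: the heap operation sets PD_ALL_VISIBLE and the
VM bits itself, and its existing WAL record carries the VM change.

    /* sketch only: the VM update rides along in the operation's own WAL record */
    PageSetAllVisible(page);
    visibilitymap_set(BufferGetBlockNumber(buffer),
                      vmbuffer,
                      VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN,
                      relation->rd_locator);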
Attachment: v21-0008-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (text/x-patch; charset=UTF-8)
From 2029bdec49e880e8d3453cd7a2246a93e69b867d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v21 08/12] Rename GlobalVisTestIsRemovableXid() to
GlobalVisXidVisibleToAll()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The function is currently only used to check whether a tuple’s xmax is
visible to all transactions (and thus removable). Upcoming changes will
also use it to test whether a tuple’s xmin is visible to all to
decide if a page can be marked all-visible in the visibility map.
The new name, GlobalVisXidVisibleToAll(), better reflects this broader
purpose.
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
---
src/backend/access/heap/heapam_visibility.c | 6 +++---
src/backend/access/heap/pruneheap.c | 12 ++++++------
src/backend/access/spgist/spgvacuum.c | 2 +-
src/backend/storage/ipc/procarray.c | 17 ++++++++---------
src/include/utils/snapmgr.h | 4 ++--
5 files changed, 20 insertions(+), 21 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
{
Assert(TransactionIdIsValid(dead_after));
- if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
res = HEAPTUPLE_DEAD;
}
else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
return false;
/* Deleter committed, so tuple is dead if the XID is old enough. */
- return GlobalVisTestIsRemovableXid(vistest,
- HeapTupleHeaderGetRawXmax(tuple));
+ return GlobalVisXidVisibleToAll(vistest,
+ HeapTupleHeaderGetRawXmax(tuple));
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 80037d690e3..989af765702 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -250,7 +250,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
vistest = GlobalVisTestFor(relation);
- if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+ if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
return;
/*
@@ -476,7 +476,7 @@ prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
* Determining HTSV only once for each tuple is required for correctness,
* to deal with cases where running HTSV twice could result in different
* results. For example, RECENTLY_DEAD can turn to DEAD if another
- * checked item causes GlobalVisTestIsRemovableFullXid() to update the
+ * checked item causes GlobalVisFullXidVisibleToAll() to update the
* horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
* transaction aborts.
*
@@ -1238,11 +1238,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* Determine whether or not the tuple is considered dead when compared
* with the provided GlobalVisState. On-access pruning does not provide
* VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
- * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
- * if the GlobalVisState has been updated since the beginning of vacuuming
+ * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+ * the GlobalVisState has been updated since the beginning of vacuuming
* the relation.
*/
- if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
return HEAPTUPLE_DEAD;
return res;
@@ -1701,7 +1701,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
/*
* For now always use prstate->cutoffs for this test, because
* we only update 'all_visible' and 'all_frozen' when freezing
- * is requested. We could use GlobalVisTestIsRemovableXid
+ * is requested. We could use GlobalVisXidVisibleToAll()
* instead, if a non-freezing caller wanted to set the VM bit.
*/
Assert(prstate->cutoffs);
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 71ef2e5036f..1c0eb425ee9 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
*/
if (dt->tupstate == SPGIST_REDIRECT &&
(!TransactionIdIsValid(dt->xid) ||
- GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+ GlobalVisXidVisibleToAll(vistest, dt->xid)))
{
dt->tupstate = SPGIST_PLACEHOLDER;
Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..235c3b584f6 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
* See comment for GlobalVisState for details.
*/
bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
- FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
{
/*
* If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4216,14 +4215,14 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
}
/*
- * Wrapper around GlobalVisTestIsRemovableFullXid() for 32bit xids.
+ * Wrapper around GlobalVisFullXidVisibleToAll() for 32bit xids.
*
* It is crucial that this only gets called for xids from a source that
* protects against xid wraparounds (e.g. from a table and thus protected by
* relfrozenxid).
*/
bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
{
FullTransactionId fxid;
@@ -4237,12 +4236,12 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
*/
fxid = FullXidRelativeTo(state->definitely_needed, xid);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableFullXid(), see their comments.
+ * GlobalVisFullXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableXid(state, xid);
+ return GlobalVisXidVisibleToAll(state, xid);
}
/*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
*/
typedef struct GlobalVisState GlobalVisState;
extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
--
2.43.0
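
To illustrate the intent of the rename (the helper functions below are made up
for illustration; only GlobalVisXidVisibleToAll() itself comes from the patch):
the same predicate now reads naturally whether it is applied to an xmax, for
removability, or to an xmin, for page-level all-visibility.

    /* hypothetical helpers, just to show how the renamed test reads */
    static bool
    xmax_makes_tuple_removable(GlobalVisState *vistest, HeapTupleHeader tuple)
    {
        /* deleter committed: the tuple is dead once its xmax is visible to all */
        return GlobalVisXidVisibleToAll(vistest, HeapTupleHeaderGetRawXmax(tuple));
    }

    static bool
    xmin_allows_all_visible(GlobalVisState *vistest, TransactionId xmin)
    {
        /* the inserter's xmin must be visible to all before setting the VM bit */
        return GlobalVisXidVisibleToAll(vistest, xmin);
    }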
Attachment: v21-0009-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (text/x-patch; charset=UTF-8)
From 2234e4bc98c173d740c27aa55347e92baec3e6d3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v21 09/12] Use GlobalVisState in vacuum to determine page
level visibility
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.
Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.
This also benefits vacuum directly: in some cases, GlobalVisState may
advance during a vacuum, allowing more pages to be considered
all-visible. And, in the future, we could easily add a heuristic to
update GlobalVisState more frequently during vacuums of large tables. In
the rare case that the GlobalVisState moves backward, vacuum falls back
to OldestXmin to ensure we don’t attempt to freeze a dead tuple that
wasn’t yet prunable according to the GlobalVisState.
Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.
This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.
---
src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++++
src/backend/access/heap/pruneheap.c | 43 +++++++++------------
src/backend/access/heap/vacuumlazy.c | 10 ++---
src/include/access/heapam.h | 11 +++---
4 files changed, 58 insertions(+), 34 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+ Buffer buffer)
+{
+ TransactionId dead_after = InvalidTransactionId;
+ HTSV_Result res;
+
+ res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+ if (res == HEAPTUPLE_RECENTLY_DEAD)
+ {
+ Assert(TransactionIdIsValid(dead_after));
+
+ if (GlobalVisXidVisibleToAll(vistest, dead_after))
+ res = HEAPTUPLE_DEAD;
+ }
+ else
+ Assert(!TransactionIdIsValid(dead_after));
+
+ return res;
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 989af765702..040efe80f2e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -439,11 +439,12 @@ prune_freeze_setup(PruneFreezeParams *params,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon
- * when updating the VM and/or freezing all the tuples on the page.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState. This field is only kept up-to-date if the page is
+ * all-visible. As soon as a tuple is encountered that is not visible to
+ * all, this field is unmaintained. As long as it is maintained, it can be
+ * used to calculate the snapshot conflict horizon when updating the VM
+ * and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -886,14 +887,13 @@ heap_page_will_set_vis(Relation relation,
*/
static bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
- return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
+ return heap_page_would_be_all_visible(rel, buf, vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -994,6 +994,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them is not visible to everyone, the page cannot be
+ * all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update the hint
@@ -1165,10 +1175,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool debug_all_frozen;
Assert(prstate.lpdead_items == 0);
- Assert(prstate.cutoffs);
if (!heap_page_is_all_visible(params->relation, buffer,
- prstate.cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc))
Assert(false);
@@ -1698,20 +1707,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' and 'all_frozen' when freezing
- * is requested. We could use GlobalVisXidVisibleToAll()
- * instead, if a non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = false;
- prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index fd68dfcfce2..fdf37625cd0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2730,7 +2730,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* done outside the critical section.
*/
if (heap_page_would_be_all_visible(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3489,7 +3489,7 @@ dead_items_cleanup(LVRelState *vacrel)
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
*
* Output parameters:
*
@@ -3505,7 +3505,7 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -3579,7 +3579,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
/* Visibility checks may do IO or allocate memory */
Assert(CritSectionCount == 0);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+ switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest, buf))
{
case HEAPTUPLE_LIVE:
{
@@ -3598,7 +3598,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin, OldestXmin))
+ if (!GlobalVisXidVisibleToAll(vistest, xmin))
{
all_visible = false;
*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 937b46a77db..2b6a521e4ea 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -276,10 +276,9 @@ typedef struct PruneFreezeParams
/*
* Contains the cutoffs used for freezing. They are required if the
- * HEAP_PAGE_PRUNE_FREEZE option is set. cutoffs->OldestXmin is also used
- * to determine if dead tuples are HEAPTUPLE_RECENTLY_DEAD or
- * HEAPTUPLE_DEAD. Currently only vacuum passes in cutoffs. Vacuum
- * calculates them once, at the beginning of vacuuming the relation.
+ * HEAP_PAGE_PRUNE_FREEZE option is set. Currently only vacuum passes in
+ * cutoffs. Vacuum calculates them once, at the beginning of vacuuming the
+ * relation.
*/
struct VacuumCutoffs *cutoffs;
} PruneFreezeParams;
@@ -444,7 +443,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -458,6 +457,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+ GlobalVisState *vistest, Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
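
The deferred, once-per-page check described in the commit message boils down
to the condition added in heap_page_prune_and_freeze(), shown here in
isolation as a sketch:

    /* sketch: test only the newest xmin on the page, once, after the scan */
    if (prstate.all_visible &&
        TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
        !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
    {
        /* some tuple's inserter is not yet visible to everyone */
        prstate.all_visible = false;
        prstate.all_frozen = false;
    }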
Attachment: v21-0010-Unset-all_visible-sooner-if-not-freezing.patch (text/x-patch; charset=UTF-8)
From 58242c95a3e737f3659913f24b22219dbafe1951 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v21 10/12] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.
However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.
---
src/backend/access/heap/pruneheap.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 040efe80f2e..90270081acd 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1564,8 +1564,13 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible and all_frozen until later
* during pruning. Removable dead tuples shouldn't preclude freezing the
- * page.
+ * page. If we won't attempt freezing, just unset all-visible now, though.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1824,8 +1829,14 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible and all_frozen until later, at the
* end of heap_page_prune_and_freeze(). This will allow us to attempt to
* freeze the page after pruning. As long as we unset it before updating
- * the visibility map, this will be correct.
+ * the visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all-visible now.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
--
2.43.0
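
The 0010 change amounts to the condition below (the same in substance as the
two hunks above): a removable dead item clears the page-level flags
immediately unless freezing may still be attempted later.

    /* sketch: only defer clearing the flags when a freeze attempt may follow */
    if (!prstate->attempt_freeze)
    {
        prstate->all_visible = false;
        prstate->all_frozen = false;
    }
    prstate->deadoffsets[prstate->lpdead_items++] = offnum;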
Attachment: v21-0011-Allow-on-access-pruning-to-set-pages-all-visible.patch (text/x-patch; charset=UTF-8)
From f962eee2760f7f0927a318ac05b55e48eea3cec0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v21 11/12] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.
Supporting this requires passing information from the executor down to the
scan descriptor about whether the query modifies the relation.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
contrib/pgrowlocks/pgrowlocks.c | 2 +-
src/backend/access/brin/brin.c | 3 +-
src/backend/access/gin/gininsert.c | 3 +-
src/backend/access/heap/heapam.c | 15 +++-
src/backend/access/heap/heapam_handler.c | 22 ++++--
src/backend/access/heap/pruneheap.c | 69 +++++++++++++++----
src/backend/access/index/genam.c | 4 +-
src/backend/access/index/indexam.c | 6 +-
src/backend/access/nbtree/nbtsort.c | 2 +-
src/backend/access/table/tableam.c | 8 ++-
src/backend/commands/constraint.c | 2 +-
src/backend/commands/copyto.c | 2 +-
src/backend/commands/tablecmds.c | 4 +-
src/backend/commands/typecmds.c | 4 +-
src/backend/executor/execIndexing.c | 2 +-
src/backend/executor/execMain.c | 4 ++
src/backend/executor/execReplication.c | 8 +--
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeBitmapHeapscan.c | 9 ++-
src/backend/executor/nodeIndexonlyscan.c | 2 +-
src/backend/executor/nodeIndexscan.c | 11 ++-
src/backend/executor/nodeSeqscan.c | 26 ++++++-
src/backend/partitioning/partbounds.c | 2 +-
src/backend/utils/adt/selfuncs.c | 2 +-
src/include/access/genam.h | 3 +-
src/include/access/heapam.h | 30 +++++++-
src/include/access/tableam.h | 19 ++---
src/include/nodes/execnodes.h | 6 ++
.../t/035_standby_logical_decoding.pl | 3 +-
29 files changed, 210 insertions(+), 65 deletions(-)
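
As a usage sketch (this assumes the flags argument added by this patch is
carried into the scan's rs_flags, as table_beginscan_parallel() does; nothing
here is beyond the patch): a caller that knows the query will not modify the
relation passes the read-only hint when starting the scan, which is what later
lets heap_page_prune_opt() be handed a VM buffer.

    /* sketch: a read-only caller opting in to on-access VM setting */
    TableScanDesc scan = table_beginscan(rel, snapshot, 0, NULL,
                                         SO_HINT_REL_READ_ONLY);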
diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
RelationGetRelationName(rel));
/* Scan the relation */
- scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
hscan = (HeapScanDesc) scan;
attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index cb3331921cb..b9613787b85 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2842,7 +2842,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
indexInfo->ii_Concurrent = brinshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromBrinShared(brinshared));
+ ParallelTableScanFromBrinShared(brinshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index c2b879b2bf6..147844690a1 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2058,7 +2058,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
indexInfo->ii_Concurrent = ginshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromGinBuildShared(ginshared));
+ ParallelTableScanFromGinBuildShared(ginshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 2bff37e03b5..ae53e311ce1 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -555,6 +555,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -569,7 +570,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1246,6 +1249,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1284,6 +1288,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1316,6 +1326,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..27e3498f5f4 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,12 +79,14 @@ heapam_slot_callbacks(Relation relation)
*/
static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
{
IndexFetchHeapData *hscan = palloc0(sizeof(IndexFetchHeapData));
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
+ hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
return &hscan->xs_base;
}
@@ -99,6 +101,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -138,7 +146,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ hscan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -753,7 +762,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
tableScan = NULL;
heapScan = NULL;
- indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+ indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
index_rescan(indexScan, NULL, 0, NULL, 0);
}
else
@@ -762,7 +771,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
- tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+ tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
heapScan = (HeapScanDesc) tableScan;
indexScan = NULL;
@@ -2471,6 +2480,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2517,7 +2527,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 90270081acd..124722f1778 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -203,7 +203,9 @@ static bool heap_page_will_set_vis(Relation relation,
Buffer heap_buf,
Buffer vmbuffer,
bool blk_known_av,
- const PruneState *prstate,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ PruneState *prstate,
uint8 *vmflags,
bool *do_set_pd_vis);
@@ -218,9 +220,13 @@ static bool heap_page_will_set_vis(Relation relation,
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -297,6 +303,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
.vistest = vistest,.cutoffs = NULL
};
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ params.options = HEAP_PAGE_PRUNE_UPDATE_VIS;
+ params.vmbuffer = *vmbuffer;
+ }
+
heap_page_prune_and_freeze(¶ms, &presult, &dummy_off_loc,
NULL, NULL);
@@ -785,6 +798,9 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* have examined this page’s VM bits (e.g., VACUUM in the previous
* heap_vac_scan_next_block() call) and can pass that along.
*
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
* Returns true if one or both VM bits should be set, along with the desired
* flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
* should be set on the heap page.
@@ -795,7 +811,9 @@ heap_page_will_set_vis(Relation relation,
Buffer heap_buf,
Buffer vmbuffer,
bool blk_known_av,
- const PruneState *prstate,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ PruneState *prstate,
uint8 *vmflags,
bool *do_set_pd_vis)
{
@@ -811,6 +829,23 @@ heap_page_will_set_vis(Relation relation,
return false;
}
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(heap_buf) || XLogCheckBufferNeedsBackup(heap_buf)))
+ {
+ prstate->all_visible = prstate->all_frozen = false;
+ return false;
+ }
+
if (prstate->all_visible && !PageIsAllVisible(heap_page))
*do_set_pd_vis = true;
@@ -834,6 +869,11 @@ heap_page_will_set_vis(Relation relation,
* page-level bit is clear. However, it's possible that in vacuum the bit
* got cleared after heap_vac_scan_next_block() was called, so we must
* recheck with buffer lock before concluding that the VM is corrupt.
+ *
+ * This will never trigger for on-access pruning because it couldn't have
+ * done a previous visibility map lookup and thus will always pass
+ * blk_known_av as false. A future vacuum will have to take care of fixing
+ * the corruption.
*/
else if (blk_known_av && !PageIsAllVisible(heap_page) &&
visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
@@ -994,6 +1034,14 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /*
+ * Even if we don't prune anything, if we found a new value for the
+ * pd_prune_xid field or the page was marked full, we will update the hint
+ * bit.
+ */
+ do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ PageIsFull(page);
+
/*
* After processing all the live tuples on the page, if the newest xmin
* amongst them is not visible to everyone, the page cannot be
@@ -1004,14 +1052,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
prstate.all_visible = prstate.all_frozen = false;
- /*
- * Even if we don't prune anything, if we found a new value for the
- * pd_prune_xid field or the page was marked full, we will update the hint
- * bit.
- */
- do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
- PageIsFull(page);
-
/*
* Decide if we want to go ahead with freezing according to the freeze
* plans we prepared, or not.
@@ -1054,6 +1094,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
do_set_vm = heap_page_will_set_vis(params->relation,
blockno, buffer, vmbuffer, params->blk_known_av,
+ params->reason, do_prune, do_freeze,
&prstate, &new_vmbits, &do_set_pd_vis);
/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
@@ -2340,7 +2381,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
/*
* Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
- * record.
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
*/
static TransactionId
get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
@@ -2410,8 +2451,8 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
* - Reaping: During vacuum phase III, items that are already LP_DEAD are
* marked as unused.
*
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- * all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III and on-access, the heap page
+ * may be marked all-visible and all-frozen.
*
* These changes all happen together, so we use a single WAL record for them
* all.
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 0cb27af1310..1e7992dbeb3 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -447,7 +447,7 @@ systable_beginscan(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, irel,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
@@ -708,7 +708,7 @@ systable_beginscan_ordered(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, indexRelation,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..b5523cf2ab1 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys)
+ int nkeys, int norderbys, uint32 flags)
{
IndexScanDesc scan;
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+ scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
return scan;
}
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+ scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
return scan;
}
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 454adaee7dc..02ab0233e59 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1925,7 +1925,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
indexInfo = BuildIndexInfo(btspool->index);
indexInfo->ii_Concurrent = btshared->isconcurrent;
scan = table_beginscan_parallel(btspool->heap,
- ParallelTableScanFromBTShared(btshared));
+ ParallelTableScanFromBTShared(btshared), 0);
reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
true, progress, _bt_build_callback,
&buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 5e41404937e..558c4497993 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -50,6 +50,7 @@ char *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
bool synchronize_seqscans = true;
+
/* ----------------------------------------------------------------------------
* Slot functions.
* ----------------------------------------------------------------------------
@@ -163,10 +164,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
+
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -217,7 +219,7 @@ table_index_fetch_tuple_check(Relation rel,
bool found;
slot = table_slot_create(rel, NULL);
- scan = table_index_fetch_begin(rel);
+ scan = table_index_fetch_begin(rel, 0);
found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
all_dead);
table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index 3497a8221f2..97c8278e36d 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
*/
tmptid = checktid;
{
- IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+ IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
bool call_again = false;
if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index cef452584e5..22b453dc617 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1156,7 +1156,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
AttrMap *map = NULL;
TupleTableSlot *root_slot = NULL;
- scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
slot = table_slot_create(rel, NULL);
/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 23ebaa3f230..66c418059fe 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6345,7 +6345,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
* checking all the constraints.
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(oldrel, snapshot, 0, NULL);
+ scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -13730,7 +13730,7 @@ validateForeignKeyConstraint(char *conname,
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
slot = table_slot_create(rel, NULL);
- scan = table_beginscan(rel, snapshot, 0, NULL);
+ scan = table_beginscan(rel, snapshot, 0, NULL, 0);
perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
"validateForeignKeyConstraint",
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 5979580139f..35560ac60d9 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3154,7 +3154,7 @@ validateDomainNotNullConstraint(Oid domainoid)
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
@@ -3235,7 +3235,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 401606f840a..4e39ac00f30 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
retry:
conflict = false;
found_self = false;
- index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+ index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 27c9eec697b..0630a5af79e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation is modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index def32774c90..473d236e551 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -204,7 +204,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
/* Start an index scan. */
- scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
retry:
found = false;
@@ -382,7 +382,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
/* Start a heap scan. */
InitDirtySnapshot(snap);
- scan = table_beginscan(rel, &snap, 0, NULL);
+ scan = table_beginscan(rel, &snap, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
retry:
@@ -601,7 +601,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+ scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
table_rescan(scan, NULL);
@@ -665,7 +665,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
index_rescan(scan, skey, skey_attoff, NULL, 0);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..0d854db51a1 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ flags);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index f464cca9507..87b04b1b88e 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
estate->es_snapshot,
&node->ioss_Instrument,
node->ioss_NumScanKeys,
- node->ioss_NumOrderByKeys);
+ node->ioss_NumOrderByKeys, 0);
node->ioss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index f36929deec3..90f929ce741 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys,
+ flags);
node->iss_ScanDesc = scandesc;
@@ -207,7 +214,7 @@ IndexNextWithReorder(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys, 0);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..4d0cbb9dee4 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
scandesc = table_beginscan(node->ss.ss_currentRelation,
estate->es_snapshot,
- 0, NULL);
+ 0, NULL, flags);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
{
EState *estate = node->ss.ps.state;
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
table_parallelscan_initialize(node->ss.ss_currentRelation,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+ flags);
}
/* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation,
+ pscan,
+ flags);
}
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 8ba038c5ef4..d3b340ee2a7 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3370,7 +3370,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
econtext = GetPerTupleExprContext(estate);
snapshot = RegisterSnapshot(GetLatestSnapshot());
tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
- scan = table_beginscan(part_rel, snapshot, 0, NULL);
+ scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 540aa9628d7..28434146eba 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7100,7 +7100,7 @@ get_actual_variable_endpoint(Relation heapRel,
index_scan = index_beginscan(heapRel, indexRel,
&SnapshotNonVacuumable, NULL,
- 1, 0);
+ 1, 0, 0);
/* Set it up for index-only scan */
index_scan->xs_want_itup = true;
index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..2f9e9ea6318 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -177,7 +177,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys);
+ int nkeys, int norderbys, uint32 flags);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
@@ -204,6 +204,7 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys,
ParallelIndexScanDesc pscan);
+
extern ItemPointer index_getnext_tid(IndexScanDesc scan,
ScanDirection direction);
extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2b6a521e4ea..1e3df54628b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read in the current heap page's
+ * corresponding VM block to this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -117,8 +124,24 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read in the current heap page's corresponding VM block to
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
@@ -417,7 +440,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
TM_IndexDeleteOp *delstate);
/* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
OffsetNumber *off_loc,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e16bf025692..0042636463f 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* set if the query doesn't modify the rel */
+ SO_HINT_REL_READ_ONLY = 1 << 10,
} ScanOptions;
/*
@@ -420,7 +422,7 @@ typedef struct TableAmRoutine
*
* Tuples for an index scan can then be fetched via index_fetch_tuple.
*/
- struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+ struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
/*
* Reset index fetch. Typically this will release cross index fetch
@@ -874,9 +876,9 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
*/
static inline TableScanDesc
table_beginscan(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_SEQSCAN |
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
@@ -919,9 +921,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
@@ -1128,7 +1130,8 @@ extern void table_parallelscan_initialize(Relation rel,
* Caller must hold a suitable lock on the relation.
*/
extern TableScanDesc table_beginscan_parallel(Relation relation,
- ParallelTableScanDesc pscan);
+ ParallelTableScanDesc pscan,
+ uint32 flags);
/*
* Restart a parallel scan. Call this in the leader process. Caller is
@@ -1154,9 +1157,9 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
* Tuples for an index scan can then be fetched via table_index_fetch_tuple().
*/
static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
{
- return rel->rd_tableam->index_fetch_begin(rel);
+ return rel->rd_tableam->index_fetch_begin(rel, flags);
}
/*
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 18ae8f0d4bb..0c3b0d60168 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query either through
+ * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index ebe2fae1789..bdd9f0a62cd 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
--
2.43.0
v21-0012-Set-pd_prune_xid-on-insert.patchtext/x-patch; charset=UTF-8; name=v21-0012-Set-pd_prune_xid-on-insert.patchDownload
From 4e0febe03cd305e81cb73235d750901e9ef379f0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v21 12/12] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run the first time a page
filled with newly inserted tuples is read.
This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.
Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear whether this is a bug in the way hits are tracked, a faulty test
expectation, or whether simply updating the test's expected output is
sufficient remediation.
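For anyone who wants to poke at this by hand, here is a rough sketch
using the pg_visibility extension (table name and row count are
illustrative, and it assumes a build with the full patch set applied,
since the VM-on-access behavior comes from the other patches):
create extension if not exists pg_visibility;
create table prune_demo(a int) with (autovacuum_enabled = false);
insert into prune_demo select generate_series(1, 1000);
-- a later read-only scan may prune the full pages and set them
-- all-visible in the VM once the inserting transaction is visible to all
select count(*) from prune_demo;
select all_visible, all_frozen from pg_visibility_map('prune_demo', 0);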
---
src/backend/access/heap/heapam.c | 25 +++++++++++++------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++-
.../isolation/expected/index-killtuples.out | 6 ++---
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ae53e311ce1..f329f497480 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2104,6 +2104,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2163,15 +2164,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2181,7 +2186,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2545,8 +2549,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 5ab46e8bf8f..dac640f5c9d 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -462,6 +462,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -611,9 +617,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
--
2.43.0
On Wed, Nov 19, 2025 at 6:13 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> Since it is passed into one of the helpers, I think I agree. Attached
> v21 has this change.
I've committed the first three patches. The attached v22 contains the
remaining patches, which set the VM in heap_page_prune_and_freeze() for
vacuum and then allow on-access pruning to also set the VM.
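As a quick sanity check of the record elimination, something like the
following shows which Heap2 record types a vacuum generates (a rough
psql sketch using the pg_walinspect extension; the table name is a
placeholder):
create extension if not exists pg_walinspect;
select pg_current_wal_insert_lsn() as start_lsn \gset
vacuum (verbose) some_table;
select record_type, count(*)
from pg_get_wal_records_info(:'start_lsn', pg_current_wal_insert_lsn())
where resource_manager = 'Heap2'
group by record_type order by 2 desc;
With the patches applied, VISIBLE records should no longer show up for
pages handled by the combined prune/freeze records.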
- Melanie
Attachments:
v22-0001-Split-heap_page_prune_and_freeze-into-helpers.patchtext/x-patch; charset=US-ASCII; name=v22-0001-Split-heap_page_prune_and_freeze-into-helpers.patchDownload
From 363f0e4ac9ac7699a6d9c2a267a2ad60825407c8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 17 Nov 2025 15:11:27 -0500
Subject: [PATCH v22 1/9] Split heap_page_prune_and_freeze() into helpers
Refactor the setup and planning phases of pruning and freezing into
helpers. This streamlines heap_page_prune_and_freeze() and makes it
clearer when the examination of tuples ends and page modifications begin.
---
src/backend/access/heap/pruneheap.c | 559 +++++++++++++++-------------
1 file changed, 307 insertions(+), 252 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 1850476dcd8..1460193b920 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -157,6 +157,14 @@ typedef struct
} PruneState;
/* Local functions */
+static void prune_freeze_setup(PruneFreezeParams *params,
+ TransactionId new_relfrozen_xid,
+ MultiXactId new_relmin_mxid,
+ const PruneFreezeResult *presult,
+ PruneState *prstate);
+static void prune_freeze_plan(Oid reloid, Buffer buffer,
+ PruneState *prstate,
+ OffsetNumber *off_loc);
static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
HeapTuple tup,
Buffer buffer);
@@ -308,200 +316,22 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
}
/*
- * Decide whether to proceed with freezing according to the freeze plans
- * prepared for the given heap buffer. If freezing is chosen, this function
- * performs several pre-freeze checks.
- *
- * The values of do_prune, do_hint_prune, and did_tuple_hint_fpi must be
- * determined before calling this function.
- *
- * prstate is both an input and output parameter.
- *
- * Returns true if we should apply the freeze plans and freeze tuples on the
- * page, and false otherwise.
+ * Helper for heap_page_prune_and_freeze() to initialize the PruneState using
+ * the provided parameters.
*/
-static bool
-heap_page_will_freeze(Relation relation, Buffer buffer,
- bool did_tuple_hint_fpi,
- bool do_prune,
- bool do_hint_prune,
- PruneState *prstate)
-{
- bool do_freeze = false;
-
- /*
- * If the caller specified we should not attempt to freeze any tuples,
- * validate that everything is in the right state and return.
- */
- if (!prstate->attempt_freeze)
- {
- Assert(!prstate->all_frozen && prstate->nfrozen == 0);
- Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
- return false;
- }
-
- if (prstate->pagefrz.freeze_required)
- {
- /*
- * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
- * before FreezeLimit/MultiXactCutoff is present. Must freeze to
- * advance relfrozenxid/relminmxid.
- */
- do_freeze = true;
- }
- else
- {
- /*
- * Opportunistically freeze the page if we are generating an FPI
- * anyway and if doing so means that we can set the page all-frozen
- * afterwards (might not happen until VACUUM's final heap pass).
- *
- * XXX: Previously, we knew if pruning emitted an FPI by checking
- * pgWalUsage.wal_fpi before and after pruning. Once the freeze and
- * prune records were combined, this heuristic couldn't be used
- * anymore. The opportunistic freeze heuristic must be improved;
- * however, for now, try to approximate the old logic.
- */
- if (prstate->all_frozen && prstate->nfrozen > 0)
- {
- Assert(prstate->all_visible);
-
- /*
- * Freezing would make the page all-frozen. Have already emitted
- * an FPI or will do so anyway?
- */
- if (RelationNeedsWAL(relation))
- {
- if (did_tuple_hint_fpi)
- do_freeze = true;
- else if (do_prune)
- {
- if (XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- else if (do_hint_prune)
- {
- if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- }
- }
- }
-
- if (do_freeze)
- {
- /*
- * Validate the tuples we will be freezing before entering the
- * critical section.
- */
- heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
-
- /*
- * Calculate what the snapshot conflict horizon should be for a record
- * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
- * for conflicts when the whole page is eligible to become all-frozen
- * in the VM once we're done with it. Otherwise, we generate a
- * conservative cutoff by stepping back from OldestXmin.
- */
- if (prstate->all_frozen)
- prstate->frz_conflict_horizon = prstate->visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- prstate->frz_conflict_horizon = prstate->cutoffs->OldestXmin;
- TransactionIdRetreat(prstate->frz_conflict_horizon);
- }
- }
- else if (prstate->nfrozen > 0)
- {
- /*
- * The page contained some tuples that were not already frozen, and we
- * chose not to freeze them now. The page won't be all-frozen then.
- */
- Assert(!prstate->pagefrz.freeze_required);
-
- prstate->all_frozen = false;
- prstate->nfrozen = 0; /* avoid miscounts in instrumentation */
- }
- else
- {
- /*
- * We have no freeze plans to execute. The page might already be
- * all-frozen (perhaps only following pruning), though. Such pages
- * can be marked all-frozen in the VM by our caller, even though none
- * of its tuples were newly frozen here.
- */
- }
-
- return do_freeze;
-}
-
-
-/*
- * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
- *
- * Caller must have pin and buffer cleanup lock on the page. Note that we
- * don't update the FSM information for page on caller's behalf. Caller might
- * also need to account for a reduction in the length of the line pointer
- * array following array truncation by us.
- *
- * params contains the input parameters used to control freezing and pruning
- * behavior. See the definition of PruneFreezeParams for more on what each
- * parameter does.
- *
- * If the HEAP_PAGE_PRUNE_FREEZE option is set in params, we will freeze
- * tuples if it's required in order to advance relfrozenxid / relminmxid, or
- * if it's considered advantageous for overall system performance to do so
- * now. The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing. When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opporunistically freeze, to indicate if the VM bits can be set. They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
- *
- * presult contains output parameters needed by callers, such as the number of
- * tuples removed and the offsets of dead items on the page after pruning.
- * heap_page_prune_and_freeze() is responsible for initializing it. Required
- * by all callers.
- *
- * off_loc is the offset location required by the caller to use in error
- * callback.
- *
- * new_relfrozen_xid and new_relmin_mxid must provided by the caller if the
- * HEAP_PAGE_PRUNE_FREEZE option is set in params. On entry, they contain the
- * oldest XID and multi-XID seen on the relation so far. They will be updated
- * with oldest values present on the page after pruning. After processing the
- * whole relation, VACUUM can use these values as the new
- * relfrozenxid/relminmxid for the relation.
- */
-void
-heap_page_prune_and_freeze(PruneFreezeParams *params,
- PruneFreezeResult *presult,
- OffsetNumber *off_loc,
- TransactionId *new_relfrozen_xid,
- MultiXactId *new_relmin_mxid)
+static void
+prune_freeze_setup(PruneFreezeParams *params,
+ TransactionId new_relfrozen_xid,
+ MultiXactId new_relmin_mxid,
+ const PruneFreezeResult *presult,
+ PruneState *prstate)
{
- Buffer buffer = params->buffer;
- Page page = BufferGetPage(buffer);
- BlockNumber blockno = BufferGetBlockNumber(buffer);
- OffsetNumber offnum,
- maxoff;
- PruneState prstate;
- HeapTupleData tup;
- bool do_freeze;
- bool do_prune;
- bool do_hint_prune;
- bool did_tuple_hint_fpi;
- int64 fpi_before = pgWalUsage.wal_fpi;
-
/* Copy parameters to prstate */
- prstate.vistest = params->vistest;
- prstate.mark_unused_now =
+ prstate->vistest = params->vistest;
+ prstate->mark_unused_now =
(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
- prstate.attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
- prstate.cutoffs = params->cutoffs;
+ prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate->cutoffs = params->cutoffs;
/*
* Our strategy is to scan the page and make lists of items to change,
@@ -514,41 +344,42 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* prunable, we will save the lowest relevant XID in new_prune_xid. Also
* initialize the rest of our working state.
*/
- prstate.new_prune_xid = InvalidTransactionId;
- prstate.latest_xid_removed = InvalidTransactionId;
- prstate.nredirected = prstate.ndead = prstate.nunused = prstate.nfrozen = 0;
- prstate.nroot_items = 0;
- prstate.nheaponly_items = 0;
+ prstate->new_prune_xid = InvalidTransactionId;
+ prstate->latest_xid_removed = InvalidTransactionId;
+ prstate->nredirected = prstate->ndead = prstate->nunused = 0;
+ prstate->nfrozen = 0;
+ prstate->nroot_items = 0;
+ prstate->nheaponly_items = 0;
/* initialize page freezing working state */
- prstate.pagefrz.freeze_required = false;
- if (prstate.attempt_freeze)
+ prstate->pagefrz.freeze_required = false;
+ if (prstate->attempt_freeze)
{
- Assert(new_relfrozen_xid && new_relmin_mxid);
- prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
- prstate.pagefrz.NoFreezePageRelfrozenXid = *new_relfrozen_xid;
- prstate.pagefrz.FreezePageRelminMxid = *new_relmin_mxid;
- prstate.pagefrz.NoFreezePageRelminMxid = *new_relmin_mxid;
+ prstate->pagefrz.FreezePageRelfrozenXid = new_relfrozen_xid;
+ prstate->pagefrz.NoFreezePageRelfrozenXid = new_relfrozen_xid;
+ prstate->pagefrz.FreezePageRelminMxid = new_relmin_mxid;
+ prstate->pagefrz.NoFreezePageRelminMxid = new_relmin_mxid;
}
else
{
- Assert(new_relfrozen_xid == NULL && new_relmin_mxid == NULL);
- prstate.pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
- prstate.pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
- prstate.pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
- prstate.pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
+ Assert(new_relfrozen_xid == InvalidTransactionId &&
+ new_relmin_mxid == InvalidMultiXactId);
+ prstate->pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
+ prstate->pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
+ prstate->pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
+ prstate->pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
}
- prstate.ndeleted = 0;
- prstate.live_tuples = 0;
- prstate.recently_dead_tuples = 0;
- prstate.hastup = false;
- prstate.lpdead_items = 0;
- prstate.deadoffsets = presult->deadoffsets;
- prstate.frz_conflict_horizon = InvalidTransactionId;
+ prstate->ndeleted = 0;
+ prstate->live_tuples = 0;
+ prstate->recently_dead_tuples = 0;
+ prstate->hastup = false;
+ prstate->lpdead_items = 0;
+ prstate->deadoffsets = (OffsetNumber *) presult->deadoffsets;
+ prstate->frz_conflict_horizon = InvalidTransactionId;
/*
- * Caller may update the VM after we're done. We can keep track of
+ * Vacuum may update the VM after we're done. We can keep track of
* whether the page will be all-visible and all-frozen after pruning and
* freezing to help the caller to do that.
*
@@ -571,10 +402,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* all_frozen before we return them to the caller, so that the caller
* doesn't set the VM bits incorrectly.
*/
- if (prstate.attempt_freeze)
+ if (prstate->attempt_freeze)
{
- prstate.all_visible = true;
- prstate.all_frozen = true;
+ prstate->all_visible = true;
+ prstate->all_frozen = true;
}
else
{
@@ -582,8 +413,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* Initializing to false allows skipping the work to update them in
* heap_prune_record_unchanged_lp_normal().
*/
- prstate.all_visible = false;
- prstate.all_frozen = false;
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
}
/*
@@ -594,10 +425,29 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* running transaction on the standby does not see tuples on the page as
* all-visible, so the conflict horizon remains InvalidTransactionId.
*/
- prstate.visibility_cutoff_xid = InvalidTransactionId;
+ prstate->visibility_cutoff_xid = InvalidTransactionId;
+}
- maxoff = PageGetMaxOffsetNumber(page);
- tup.t_tableOid = RelationGetRelid(params->relation);
+/*
+ * Helper for heap_page_prune_and_freeze(). Iterates over every tuple on the
+ * page, examines its visibility information, and determines the appropriate
+ * action for each tuple. All tuples are processed and classified during this
+ * phase, but no modifications are made to the page until the later execution
+ * stage.
+ *
+ * *off_loc is used for error callback and cleared before returning.
+ */
+static void
+prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
+ OffsetNumber *off_loc)
+{
+ Page page = BufferGetPage(buffer);
+ BlockNumber blockno = BufferGetBlockNumber(buffer);
+ OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
+ OffsetNumber offnum;
+ HeapTupleData tup;
+
+ tup.t_tableOid = reloid;
/*
* Determine HTSV for all tuples, and queue them up for processing as HOT
@@ -632,13 +482,13 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
*off_loc = offnum;
- prstate.processed[offnum] = false;
- prstate.htsv[offnum] = -1;
+ prstate->processed[offnum] = false;
+ prstate->htsv[offnum] = -1;
/* Nothing to do if slot doesn't contain a tuple */
if (!ItemIdIsUsed(itemid))
{
- heap_prune_record_unchanged_lp_unused(page, &prstate, offnum);
+ heap_prune_record_unchanged_lp_unused(page, prstate, offnum);
continue;
}
@@ -648,17 +498,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* If the caller set mark_unused_now true, we can set dead line
* pointers LP_UNUSED now.
*/
- if (unlikely(prstate.mark_unused_now))
- heap_prune_record_unused(&prstate, offnum, false);
+ if (unlikely(prstate->mark_unused_now))
+ heap_prune_record_unused(prstate, offnum, false);
else
- heap_prune_record_unchanged_lp_dead(page, &prstate, offnum);
+ heap_prune_record_unchanged_lp_dead(page, prstate, offnum);
continue;
}
if (ItemIdIsRedirected(itemid))
{
/* This is the start of a HOT chain */
- prstate.root_items[prstate.nroot_items++] = offnum;
+ prstate->root_items[prstate->nroot_items++] = offnum;
continue;
}
@@ -672,21 +522,15 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
tup.t_len = ItemIdGetLength(itemid);
ItemPointerSet(&tup.t_self, blockno, offnum);
- prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
- buffer);
+ prstate->htsv[offnum] = heap_prune_satisfies_vacuum(prstate, &tup,
+ buffer);
if (!HeapTupleHeaderIsHeapOnly(htup))
- prstate.root_items[prstate.nroot_items++] = offnum;
+ prstate->root_items[prstate->nroot_items++] = offnum;
else
- prstate.heaponly_items[prstate.nheaponly_items++] = offnum;
+ prstate->heaponly_items[prstate->nheaponly_items++] = offnum;
}
- /*
- * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
- * an FPI to be emitted.
- */
- did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
-
/*
* Process HOT chains.
*
@@ -698,30 +542,30 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* the page instead of using the root_items array, also did it in
* ascending offset number order.)
*/
- for (int i = prstate.nroot_items - 1; i >= 0; i--)
+ for (int i = prstate->nroot_items - 1; i >= 0; i--)
{
- offnum = prstate.root_items[i];
+ offnum = prstate->root_items[i];
/* Ignore items already processed as part of an earlier chain */
- if (prstate.processed[offnum])
+ if (prstate->processed[offnum])
continue;
/* see preceding loop */
*off_loc = offnum;
/* Process this item or chain of items */
- heap_prune_chain(page, blockno, maxoff, offnum, &prstate);
+ heap_prune_chain(page, blockno, maxoff, offnum, prstate);
}
/*
* Process any heap-only tuples that were not already processed as part of
* a HOT chain.
*/
- for (int i = prstate.nheaponly_items - 1; i >= 0; i--)
+ for (int i = prstate->nheaponly_items - 1; i >= 0; i--)
{
- offnum = prstate.heaponly_items[i];
+ offnum = prstate->heaponly_items[i];
- if (prstate.processed[offnum])
+ if (prstate->processed[offnum])
continue;
/* see preceding loop */
@@ -740,7 +584,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* return true for an XMIN_INVALID tuple, so this code will work even
* when there were sequential updates within the aborted transaction.)
*/
- if (prstate.htsv[offnum] == HEAPTUPLE_DEAD)
+ if (prstate->htsv[offnum] == HEAPTUPLE_DEAD)
{
ItemId itemid = PageGetItemId(page, offnum);
HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, itemid);
@@ -748,8 +592,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
if (likely(!HeapTupleHeaderIsHotUpdated(htup)))
{
HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate.latest_xid_removed);
- heap_prune_record_unused(&prstate, offnum, true);
+ &prstate->latest_xid_removed);
+ heap_prune_record_unused(prstate, offnum, true);
}
else
{
@@ -766,7 +610,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
}
}
else
- heap_prune_record_unchanged_lp_normal(page, &prstate, offnum);
+ heap_prune_record_unchanged_lp_normal(page, prstate, offnum);
}
/* We should now have processed every tuple exactly once */
@@ -777,12 +621,223 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
{
*off_loc = offnum;
- Assert(prstate.processed[offnum]);
+ Assert(prstate->processed[offnum]);
}
#endif
/* Clear the offset information once we have processed the given page. */
*off_loc = InvalidOffsetNumber;
+}
+
+/*
+ * Decide whether to proceed with freezing according to the freeze plans
+ * prepared for the given heap buffer. If freezing is chosen, this function
+ * performs several pre-freeze checks.
+ *
+ * The values of do_prune, do_hint_prune, and did_tuple_hint_fpi must be
+ * determined before calling this function.
+ *
+ * prstate is both an input and output parameter.
+ *
+ * Returns true if we should apply the freeze plans and freeze tuples on the
+ * page, and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+ bool did_tuple_hint_fpi,
+ bool do_prune,
+ bool do_hint_prune,
+ PruneState *prstate)
+{
+ bool do_freeze = false;
+
+ /*
+ * If the caller specified we should not attempt to freeze any tuples,
+ * validate that everything is in the right state and return.
+ */
+ if (!prstate->attempt_freeze)
+ {
+ Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+ Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+ return false;
+ }
+
+ if (prstate->pagefrz.freeze_required)
+ {
+ /*
+ * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+ * before FreezeLimit/MultiXactCutoff is present. Must freeze to
+ * advance relfrozenxid/relminmxid.
+ */
+ do_freeze = true;
+ }
+ else
+ {
+ /*
+ * Opportunistically freeze the page if we are generating an FPI
+ * anyway and if doing so means that we can set the page all-frozen
+ * afterwards (might not happen until VACUUM's final heap pass).
+ *
+ * XXX: Previously, we knew if pruning emitted an FPI by checking
+ * pgWalUsage.wal_fpi before and after pruning. Once the freeze and
+ * prune records were combined, this heuristic couldn't be used
+ * anymore. The opportunistic freeze heuristic must be improved;
+ * however, for now, try to approximate the old logic.
+ */
+ if (prstate->all_frozen && prstate->nfrozen > 0)
+ {
+ Assert(prstate->all_visible);
+
+ /*
+ * Freezing would make the page all-frozen. Have already emitted
+ * an FPI or will do so anyway?
+ */
+ if (RelationNeedsWAL(relation))
+ {
+ if (did_tuple_hint_fpi)
+ do_freeze = true;
+ else if (do_prune)
+ {
+ if (XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ else if (do_hint_prune)
+ {
+ if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ }
+ }
+ }
+
+ if (do_freeze)
+ {
+ /*
+ * Validate the tuples we will be freezing before entering the
+ * critical section.
+ */
+ heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+
+ /*
+ * Calculate what the snapshot conflict horizon should be for a record
+ * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
+ * for conflicts when the whole page is eligible to become all-frozen
+ * in the VM once we're done with it. Otherwise, we generate a
+ * conservative cutoff by stepping back from OldestXmin.
+ */
+ if (prstate->all_frozen)
+ prstate->frz_conflict_horizon = prstate->visibility_cutoff_xid;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ prstate->frz_conflict_horizon = prstate->cutoffs->OldestXmin;
+ TransactionIdRetreat(prstate->frz_conflict_horizon);
+ }
+ }
+ else if (prstate->nfrozen > 0)
+ {
+ /*
+ * The page contained some tuples that were not already frozen, and we
+ * chose not to freeze them now. The page won't be all-frozen then.
+ */
+ Assert(!prstate->pagefrz.freeze_required);
+
+ prstate->all_frozen = false;
+ prstate->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+ else
+ {
+ /*
+ * We have no freeze plans to execute. The page might already be
+ * all-frozen (perhaps only following pruning), though. Such pages
+ * can be marked all-frozen in the VM by our caller, even though none
+ * of its tuples were newly frozen here.
+ */
+ }
+
+ return do_freeze;
+}
+
+
+/*
+ * Prune and repair fragmentation and potentially freeze tuples on the
+ * specified page.
+ *
+ * Caller must have pin and buffer cleanup lock on the page. Note that we
+ * don't update the FSM information for page on caller's behalf. Caller might
+ * also need to account for a reduction in the length of the line pointer
+ * array following array truncation by us.
+ *
+ * params contains the input parameters used to control freezing and pruning
+ * behavior. See the definition of PruneFreezeParams for more on what each
+ * parameter does.
+ *
+ * If the HEAP_PAGE_PRUNE_FREEZE option is set in params, we will freeze
+ * tuples if it's required in order to advance relfrozenxid / relminmxid, or
+ * if it's considered advantageous for overall system performance to do so
+ * now. The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
+ * 'new_relmin_mxid' arguments are required when freezing. When
+ * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
+ * and presult->all_frozen after determining whether or not to
+ * opportunistically freeze, to indicate if the VM bits can be set. They are
+ * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
+ * because at the moment only callers that also freeze need that information.
+ *
+ * presult contains output parameters needed by callers, such as the number of
+ * tuples removed and the offsets of dead items on the page after pruning.
+ * heap_page_prune_and_freeze() is responsible for initializing it. Required
+ * by all callers.
+ *
+ * off_loc is the offset location required by the caller to use in error
+ * callback.
+ *
+ * new_relfrozen_xid and new_relmin_mxid must be provided by the caller if the
+ * HEAP_PAGE_PRUNE_FREEZE option is set in params. On entry, they contain the
+ * oldest XID and multi-XID seen on the relation so far. They will be updated
+ * with oldest values present on the page after pruning. After processing the
+ * whole relation, VACUUM can use these values as the new
+ * relfrozenxid/relminmxid for the relation.
+ */
+void
+heap_page_prune_and_freeze(PruneFreezeParams *params,
+ PruneFreezeResult *presult,
+ OffsetNumber *off_loc,
+ TransactionId *new_relfrozen_xid,
+ MultiXactId *new_relmin_mxid)
+{
+ Buffer buffer = params->buffer;
+ Page page = BufferGetPage(buffer);
+ PruneState prstate;
+ bool do_freeze;
+ bool do_prune;
+ bool do_hint_prune;
+ bool did_tuple_hint_fpi;
+ int64 fpi_before = pgWalUsage.wal_fpi;
+
+ /* Initialize prstate */
+ prune_freeze_setup(params,
+ new_relfrozen_xid ?
+ *new_relfrozen_xid : InvalidTransactionId,
+ new_relmin_mxid ?
+ *new_relmin_mxid : InvalidMultiXactId,
+ presult,
+ &prstate);
+
+ /*
+ * Examine all line pointers and tuple visibility information to determine
+ * which line pointers should change state and which tuples may be frozen.
+ * Prepare queue of state changes to later be executed in a critical
+ * section.
+ */
+ prune_freeze_plan(RelationGetRelid(params->relation),
+ buffer, &prstate, off_loc);
+
+ /*
+ * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
+ * checking tuple visibility information in prune_freeze_plan() may have
+ * caused an FPI to be emitted.
+ */
+ did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
do_prune = prstate.nredirected > 0 ||
prstate.ndead > 0 ||
--
2.43.0
v22-0002-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patchtext/x-patch; charset=UTF-8; name=v22-0002-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patchDownload
From 8ebaf434af5afaebcf71550116c59355b3bf15c1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 8 Oct 2025 15:39:01 -0400
Subject: [PATCH v22 2/9] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
prune/freeze
Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.
Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.
This change applies only to vacuum phase I, not to pruning performed
during normal page access.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
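A rough way to compare the WAL volume of vacuum's first pass between a
patched and an unpatched build (psql sketch; the table name is a
placeholder and the numbers will of course vary):
select pg_current_wal_insert_lsn() as lsn_before \gset
vacuum (verbose) some_table;
select pg_wal_lsn_diff(pg_current_wal_insert_lsn(), :'lsn_before') as wal_bytes;
VACUUM (VERBOSE) also prints a "WAL usage: ... records, ... bytes" line
that can be compared directly between the two builds.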
---
src/backend/access/heap/heapam_xlog.c | 37 ++-
src/backend/access/heap/pruneheap.c | 462 +++++++++++++++++++++-----
src/backend/access/heap/vacuumlazy.c | 241 +-------------
src/include/access/heapam.h | 43 ++-
4 files changed, 447 insertions(+), 336 deletions(-)
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 11cb3f74da5..2af724451c3 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -104,6 +104,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
OffsetNumber *frz_offsets;
char *dataptr = XLogRecGetBlockData(record, 0, &datalen);
bool do_prune;
+ bool set_lsn = false;
+ bool mark_buffer_dirty = false;
heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
&nplans, &plans, &frz_offsets,
@@ -157,17 +159,36 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* There should be no more data */
Assert((char *) frz_offsets == dataptr + datalen);
- if (vmflags & VISIBILITYMAP_VALID_BITS)
- PageSetAllVisible(page);
-
- MarkBufferDirty(buffer);
+ if (do_prune || nplans > 0)
+ mark_buffer_dirty = set_lsn = true;
/*
- * See log_heap_prune_and_freeze() for commentary on when we set the
- * heap page LSN.
+ * The critical integrity requirement here is that we must never end
+ * up with the visibility map bit set and the page-level
+ * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+ * modification would fail to clear the visibility map bit.
+ *
+ * vmflags may be nonzero with PD_ALL_VISIBLE already set (e.g. when
+ * marking an all-visible page all-frozen). If only the VM is updated,
+ * the heap page need not be dirtied.
*/
- if (do_prune || nplans > 0 ||
- ((vmflags & VISIBILITYMAP_VALID_BITS) && XLogHintBitIsNeeded()))
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+ {
+ PageSetAllVisible(page);
+ mark_buffer_dirty = true;
+
+ /*
+ * See log_heap_prune_and_freeze() for commentary on when we set
+ * the heap page LSN.
+ */
+ if (XLogHintBitIsNeeded())
+ set_lsn = true;
+ }
+
+ if (mark_buffer_dirty)
+ MarkBufferDirty(buffer);
+
+ if (set_lsn)
PageSetLSN(page, lsn);
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 1460193b920..ba578c1ce0f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
#include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool attempt_freeze;
+ /* whether or not to attempt updating the VM */
+ bool attempt_update_vm;
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -140,16 +142,17 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon when setting the VM or when
+ * freezing all the tuples on the page. It is only valid when all the live
+ * tuples on the page are all-visible.
*
* NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
* That's convenient for heap_page_prune_and_freeze() to use them to
- * decide whether to freeze the page or not. The all_visible and
- * all_frozen values returned to the caller are adjusted to include
- * LP_DEAD items after we determine whether to opportunistically freeze.
+ * decide whether to opportunistically freeze the page or not. The
+ * all_visible and all_frozen values ultimately used to set the VM are
+ * adjusted to include LP_DEAD items after we determine whether or not to
+ * opportunistically freeze.
*/
bool all_visible;
bool all_frozen;
@@ -188,10 +191,21 @@ static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetN
static void page_verify_redirects(Page page);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid, bool blk_already_av,
+ bool set_blk_all_frozen);
static bool heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
PruneState *prstate);
-
+static bool heap_page_will_set_vis(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buf,
+ Buffer vmbuffer,
+ bool blk_known_av,
+ const PruneState *prstate,
+ uint8 *vmflags,
+ bool *do_set_pd_vis);
/*
* Optionally prune and repair fragmentation in the specified page.
@@ -278,6 +292,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* current implementation.
*/
PruneFreezeParams params = {.relation = relation,.buffer = buffer,
+ .vmbuffer = InvalidBuffer,.blk_known_av = false,
.reason = PRUNE_ON_ACCESS,.options = 0,
.vistest = vistest,.cutoffs = NULL
};
@@ -331,6 +346,8 @@ prune_freeze_setup(PruneFreezeParams *params,
prstate->mark_unused_now =
(params->options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate->attempt_update_vm =
+ (params->options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0;
prstate->cutoffs = params->cutoffs;
/*
@@ -379,51 +396,54 @@ prune_freeze_setup(PruneFreezeParams *params,
prstate->frz_conflict_horizon = InvalidTransactionId;
/*
- * Vacuum may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Track whether the page could be marked all-visible and/or all-frozen.
+ * This information is used for opportunistic freezing and for updating
+ * the visibility map (VM) if requested by the caller.
+ *
+ * Currently, only VACUUM performs freezing, but other callers may in the
+ * future. Visibility bookkeeping is required not just for setting the VM
+ * bits, but also for opportunistic freezing: we only consider freezing if
+ * the page would become all-frozen, or if it would be all-frozen except
+ * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+ * we will not set the VM bit even if the page is found to be all-visible.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is passed without HEAP_PAGE_PRUNE_FREEZE,
+ * prstate.all_frozen must be initialized to false, since we will not call
+ * heap_prepare_freeze_tuple() for each tuple.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not immediately
- * clear all_visible and all_frozen when we see LP_DEAD items. We fix
- * that after scanning the line pointers. We must correct all_visible and
- * all_frozen before we return them to the caller, so that the caller
- * doesn't set the VM bits incorrectly.
+ * Dead tuples that will be removed by the end of vacuum should not
+ * prevent opportunistic freezing. Therefore, we do not clear all_visible
+ * and all_frozen when we encounter LP_DEAD items. Instead, we correct
+ * them after deciding whether to freeze, but before updating the VM, to
+ * avoid setting the VM bits incorrectly.
+ *
+ * If neither freezing nor VM updates are requested, we skip the extra
+ * bookkeeping. In this case, initializing all_visible to false allows
+ * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
*/
if (prstate->attempt_freeze)
{
prstate->all_visible = true;
prstate->all_frozen = true;
}
+ else if (prstate->attempt_update_vm)
+ {
+ prstate->all_visible = true;
+ prstate->all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate->all_visible = false;
prstate->all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is unmaintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon
+ * when updating the VM and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -758,10 +778,133 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Decide whether to set the visibility map bits for heap_blk, using
+ * information from PruneState and blk_known_av. Some callers may already
+ * have examined this page's VM bits (e.g., VACUUM in the previous
+ * heap_vac_scan_next_block() call) and can pass that along.
+ *
+ * Returns true if one or both VM bits should be set, along with the desired
+ * flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
+ * should be set on the heap page.
+ */
+static bool
+heap_page_will_set_vis(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buf,
+ Buffer vmbuffer,
+ bool blk_known_av,
+ const PruneState *prstate,
+ uint8 *vmflags,
+ bool *do_set_pd_vis)
+{
+ Page heap_page = BufferGetPage(heap_buf);
+ bool do_set_vm = false;
+
+ *do_set_pd_vis = false;
+
+ if (!prstate->attempt_update_vm)
+ {
+ Assert(!prstate->all_visible && !prstate->all_frozen);
+ Assert(*vmflags == 0);
+ return false;
+ }
+
+ if (prstate->all_visible && !PageIsAllVisible(heap_page))
+ *do_set_pd_vis = true;
+
+ if ((prstate->all_visible && !blk_known_av) ||
+ (prstate->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+ {
+ *vmflags = VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate->all_frozen)
+ *vmflags |= VISIBILITYMAP_ALL_FROZEN;
+
+ do_set_vm = true;
+ }
+
+ /*
+ * Now handle two potential corruption cases:
+ *
+ * These do not need to happen in a critical section and are not
+ * WAL-logged.
+ *
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that in vacuum the bit
+ * got cleared after heap_vac_scan_next_block() was called, so we must
+ * recheck with buffer lock before concluding that the VM is corrupt.
+ */
+ else if (blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (prstate->lpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buf);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ return do_set_vm;
+}
+
+
+#ifdef USE_ASSERT_CHECKING
+
+/*
+ * Wrapper for heap_page_would_be_all_visible() which can be used for callers
+ * that expect no LP_DEAD on the page. Currently assert-only, but there is no
+ * reason not to use it outside of asserts.
+ */
+static bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
+{
+
+ return heap_page_would_be_all_visible(rel, buf,
+ OldestXmin,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid,
+ logging_offnum);
+}
+#endif
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -776,12 +919,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* tuples if it's required in order to advance relfrozenxid / relminmxid, or
* if it's considered advantageous for overall system performance to do so
* now. The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing. When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opporunistically freeze, to indicate if the VM bits can be set. They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
@@ -806,13 +950,20 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
MultiXactId *new_relmin_mxid)
{
Buffer buffer = params->buffer;
+ Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
+ BlockNumber blockno = BufferGetBlockNumber(buffer);
PruneState prstate;
bool do_freeze;
bool do_prune;
bool do_hint_prune;
+ bool do_set_vm;
+ bool do_set_pd_vis;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ TransactionId conflict_xid = InvalidTransactionId;
+ uint8 new_vmbits = 0;
+ uint8 old_vmbits = 0;
/* Initialize prstate */
prune_freeze_setup(params,
@@ -878,6 +1029,34 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.all_visible = prstate.all_frozen = false;
Assert(!prstate.all_frozen || prstate.all_visible);
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ /*
+ * Decide whether to set the page-level PD_ALL_VISIBLE bit and the VM bits
+ * based on information from the VM and the all_visible/all_frozen flags.
+ *
+ * While it is valid for PD_ALL_VISIBLE to be set when the corresponding
+ * VM bit is clear, we strongly prefer to keep them in sync.
+ *
+ * Accordingly, we also allow updating only the VM when PD_ALL_VISIBLE has
+ * already been set. Setting only the VM is most common when setting an
+ * already all-visible page all-frozen.
+ */
+ do_set_vm = heap_page_will_set_vis(params->relation,
+ blockno, buffer, vmbuffer, params->blk_known_av,
+ &prstate, &new_vmbits, &do_set_pd_vis);
+
+ /* We should only set the VM if PD_ALL_VISIBLE is set or will be */
+ Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
+
+ conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+ prstate.latest_xid_removed, prstate.frz_conflict_horizon,
+ prstate.visibility_cutoff_xid, params->blk_known_av,
+ (do_set_vm && (new_vmbits & VISIBILITYMAP_ALL_FROZEN)));
+
+ /* Lock vmbuffer before entering a critical section */
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -899,14 +1078,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
/*
* If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * hint. If we are going to freeze or prune the page or set
+ * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+ *
+ * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+ * for the VM to be set and PD_ALL_VISIBLE to be clear.
*/
- if (!do_freeze && !do_prune)
+ if (!do_freeze && !do_prune && !do_set_pd_vis)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -920,36 +1102,43 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
- MarkBufferDirty(buffer);
+ if (do_set_pd_vis)
+ PageSetAllVisible(page);
- /*
- * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
- */
- if (RelationNeedsWAL(params->relation))
+ if (do_prune || do_freeze || do_set_pd_vis)
+ MarkBufferDirty(buffer);
+
+ if (do_set_vm)
{
- /*
- * The snapshotConflictHorizon for the whole record should be the
- * most conservative of all the horizons calculated for any of the
- * possible modifications. If this record will prune tuples, any
- * transactions on the standby older than the youngest xmax of the
- * most recently removed tuple this record will prune will
- * conflict. If this record will freeze tuples, any transactions
- * on the standby with xids older than the youngest tuple this
- * record will freeze will conflict.
- */
- TransactionId conflict_xid;
+ Assert(PageIsAllVisible(page));
- if (TransactionIdFollows(prstate.frz_conflict_horizon,
- prstate.latest_xid_removed))
- conflict_xid = prstate.frz_conflict_horizon;
- else
- conflict_xid = prstate.latest_xid_removed;
+ old_vmbits = visibilitymap_set_vmbits(blockno,
+ vmbuffer, new_vmbits,
+ params->relation->rd_locator);
+ if (old_vmbits == new_vmbits)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+ /* Unset so we don't emit WAL since no change occurred */
+ do_set_vm = false;
+ }
+ }
+ /*
+ * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
+ * only updating the VM and it turns out it was already set, we will
+ * have unset do_set_vm earlier. As such, check it again before
+ * emitting the record.
+ */
+ if (RelationNeedsWAL(params->relation) &&
+ (do_prune || do_freeze || do_set_vm))
+ {
log_heap_prune_and_freeze(params->relation, buffer,
- InvalidBuffer, /* vmbuffer */
- 0, /* vmflags */
+ do_set_vm ? vmbuffer : InvalidBuffer,
+ do_set_vm ? new_vmbits : 0,
conflict_xid,
- true, params->reason,
+ true, /* cleanup lock */
+ do_set_pd_vis,
+ params->reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
@@ -959,28 +1148,47 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
END_CRIT_SECTION();
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+ /*
+ * During its second pass over the heap, VACUUM calls
+ * heap_page_would_be_all_visible() to determine whether a page is
+ * all-visible and all-frozen. The logic here is similar. After completing
+ * pruning and freezing, use an assertion to verify that our results
+ * remain consistent with heap_page_would_be_all_visible().
+ */
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
+ {
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
+
+ Assert(prstate.lpdead_items == 0);
+ Assert(prstate.cutoffs);
+
+ if (!heap_page_is_all_visible(params->relation, buffer,
+ prstate.cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc))
+ Assert(false);
+
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == prstate.visibility_cutoff_xid);
+ }
+#endif
+
/* Copy information back for caller */
presult->ndeleted = prstate.ndeleted;
presult->nnewlpdead = prstate.ndead;
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
-
- /*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ presult->new_vmbits = new_vmbits;
+ presult->old_vmbits = old_vmbits;
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
@@ -1472,6 +1680,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
{
TransactionId xmin;
+ Assert(prstate->attempt_update_vm);
+
if (!HeapTupleHeaderXminCommitted(htup))
{
prstate->all_visible = false;
@@ -2122,6 +2332,65 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
return nplans;
}
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid, bool blk_already_av,
+ bool set_blk_all_frozen)
+{
+ TransactionId conflict_xid;
+
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning or
+ * freezing any tuples and are setting an already all-visible page
+ * all-frozen in the VM. In this case, all of the tuples on the page must
+ * already be visible to all MVCC snapshots on the standby.
+ */
+ if (!do_prune && !do_freeze &&
+ do_set_vm && blk_already_av && set_blk_all_frozen)
+ return InvalidTransactionId;
+
+ /*
+ * The snapshotConflictHorizon for the whole record should be the most
+ * conservative of all the horizons calculated for any of the possible
+ * modifications. If this record will prune tuples, any transactions on
+ * the standby older than the youngest xmax of the most recently removed
+ * tuple this record will prune will conflict. If this record will freeze
+ * tuples, any transactions on the standby with xids older than the
+ * youngest tuple this record will freeze will conflict.
+ */
+ conflict_xid = InvalidTransactionId;
+
+ /*
+ * If we are updating the VM, the conflict horizon is almost always the
+ * visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization, we can
+ * use the visibility_cutoff_xid as the conflict horizon if the page will
+ * be all-frozen. This is true even if there are LP_DEAD line pointers
+ * because we ignored those when maintaining the visibility_cutoff_xid.
+ * This will have been calculated earlier as the frz_conflict_horizon when
+ * we determined we would freeze.
+ */
+ if (do_set_vm)
+ conflict_xid = visibility_cutoff_xid;
+ else if (do_freeze)
+ conflict_xid = frz_conflict_horizon;
+
+ /*
+ * If we are removing tuples with a younger xmax than the conflict_xid
+ * calculated so far, we must use that as our horizon.
+ */
+ if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+ conflict_xid = latest_xid_removed;
+
+ return conflict_xid;
+}
+
/*
* Write an XLOG_HEAP2_PRUNE* WAL record
*
@@ -2146,6 +2415,15 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* case, vmbuffer should already have been updated and marked dirty and should
* still be pinned and locked.
*
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
+ * In some cases, such as when heap_page_prune_and_freeze() is setting an
+ * already marked all-visible page all-frozen, PD_ALL_VISIBLE may already be
+ * set. So, it is possible for vmflags to be non-zero and set_pd_all_vis to be
+ * false.
+ *
* Note: This function scribbles on the 'frozen' array.
*
* Note: This is called in a critical section, so careful what you do here.
@@ -2155,6 +2433,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
Buffer vmbuffer, uint8 vmflags,
TransactionId conflict_xid,
bool cleanup_lock,
+ bool set_pd_all_vis,
PruneReason reason,
HeapTupleFreeze *frozen, int nfrozen,
OffsetNumber *redirected, int nredirected,
@@ -2191,7 +2470,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (!do_prune &&
nfrozen == 0 &&
- (!do_set_vm || !XLogHintBitIsNeeded()))
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
regbuf_flags_heap |= REGBUF_NO_IMAGE;
/*
@@ -2309,7 +2588,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* See comment at the top of the function about regbuf_flags_heap for
* details on when we can advance the page LSN.
*/
- if (do_prune || nfrozen > 0 || (do_set_vm && XLogHintBitIsNeeded()))
+ if (do_prune || nfrozen > 0 ||
+ (set_pd_all_vis && XLogHintBitIsNeeded()))
{
Assert(BufferIsDirty(buffer));
PageSetLSN(BufferGetPage(buffer), recptr);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 7a6d6f42634..ef73eafb4f6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,20 +464,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
-#endif
-static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- OffsetNumber *deadoffsets,
- int ndeadoffsets,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -1966,7 +1952,9 @@ lazy_scan_prune(LVRelState *vacrel,
Relation rel = vacrel->rel;
PruneFreezeResult presult;
PruneFreezeParams params = {.relation = rel,.buffer = buf,
- .reason = PRUNE_VACUUM_SCAN,.options = HEAP_PAGE_PRUNE_FREEZE,
+ .vmbuffer = vmbuffer,.blk_known_av = all_visible_according_to_vm,
+ .reason = PRUNE_VACUUM_SCAN,
+ .options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VIS,
.vistest = vacrel->vistest,.cutoffs = &vacrel->cutoffs
};
@@ -2009,33 +1997,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- if (!heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum))
- Assert(false);
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -2069,168 +2030,26 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
- Assert(!presult.all_frozen || presult.all_visible);
-
/*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
+ * For the purposes of logging, count whether or not the page was newly
+ * set all-visible and, potentially, all-frozen.
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ vacrel->vm_new_visible_pages++;
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- vacrel->vm_new_frozen_pages++;
+ vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
-
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen.
- */
- else if (all_visible_according_to_vm && presult.all_frozen &&
- !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- uint8 old_vmbits;
-
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
}
return presult.ndeleted;
@@ -2952,6 +2771,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
vmflags,
conflict_xid,
false, /* no cleanup lock required */
+ (vmflags & VISIBILITYMAP_VALID_BITS) != 0,
PRUNE_VACUUM_CLEANUP,
NULL, 0, /* frozen */
NULL, 0, /* redirected */
@@ -3632,30 +3452,6 @@ dead_items_cleanup(LVRelState *vacrel)
vacrel->pvs = NULL;
}
-#ifdef USE_ASSERT_CHECKING
-
-/*
- * Wrapper for heap_page_would_be_all_visible() which can be used for callers
- * that expect no LP_DEAD on the page. Currently assert-only, but there is no
- * reason not to use it outside of asserts.
- */
-static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum)
-{
-
- return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
- NULL, 0,
- all_frozen,
- visibility_cutoff_xid,
- logging_offnum);
-}
-#endif
-
/*
* Check whether the heap page in buf is all-visible except for the dead
* tuples referenced in the deadoffsets array.
@@ -3678,15 +3474,12 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
* - *logging_offnum: OffsetNumber of current tuple being processed;
* used by vacuum's error callback system.
*
- * Callers looking to verify that the page is already all-visible can call
- * heap_page_is_all_visible().
- *
* This logic is closely related to heap_prune_record_unchanged_lp_normal().
* If you modify this function, ensure consistency with that code. An
* assertion cross-checks that both remain in agreement. Do not introduce new
* side-effects.
*/
-static bool
+bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
TransactionId OldestXmin,
OffsetNumber *deadoffsets,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 632c4332a8c..937b46a77db 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VIS (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,16 @@ typedef struct PruneFreezeParams
Relation relation; /* relation containing buffer to be pruned */
Buffer buffer; /* buffer to be pruned */
+ /*
+ *
+ * vmbuffer is the buffer that must already contain the required
+ * block of the visibility map if we are to update it. blk_known_av is the
+ * visibility status of the heap block as of the last call to
+ * find_next_unskippable_block().
+ */
+ Buffer vmbuffer;
+ bool blk_known_av;
+
/*
* The reason pruning was performed. It is used to set the WAL record
* opcode which is used for debugging and analysis purposes.
@@ -250,8 +261,10 @@ typedef struct PruneFreezeParams
* HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
* LP_UNUSED during pruning.
*
- * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
- * will return 'all_visible', 'all_frozen' flags to the caller.
+ * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples
+ *
+ * HEAP_PAGE_PRUNE_UPDATE_VIS indicates that we will set the page's status
+ * in the VM.
*/
int options;
@@ -285,19 +298,15 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
- *
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
+ * old_vmbits is the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits is the state of those bits after phase I of vacuuming.
*
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set and
+ * we have attempted to update the VM.
*/
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
+ uint8 new_vmbits;
+ uint8 old_vmbits;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
@@ -424,6 +433,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
Buffer vmbuffer, uint8 vmflags,
TransactionId conflict_xid,
bool cleanup_lock,
+ bool set_pd_all_vis,
PruneReason reason,
HeapTupleFreeze *frozen, int nfrozen,
OffsetNumber *redirected, int nredirected,
@@ -433,6 +443,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
/* in heap/vacuumlazy.c */
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
--
2.43.0
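To make 0002 easier to review, the ordering it establishes in heap_page_prune_and_freeze() can be summarized as the sketch below. This is a condensed paraphrase of the diff above, not the literal code: the bookkeeping and hint-only paths are elided and argument lists are abbreviated, but the function names and the lock/critical-section ordering are the ones the patch uses.

    /* Sketch only: condensed from v22-0002, not the actual function body. */
    if (do_set_vm)
        LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE); /* lock VM page outside the critical section */

    START_CRIT_SECTION();

    if (do_prune)
        heap_page_prune_execute(buffer, ...);        /* apply redirect/dead/unused item changes */
    if (do_freeze)
        heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
    if (do_set_pd_vis)
        PageSetAllVisible(page);                     /* setting PD_ALL_VISIBLE is fully WAL-logged */
    if (do_prune || do_freeze || do_set_pd_vis)
        MarkBufferDirty(buffer);

    if (do_set_vm)
    {
        old_vmbits = visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
                                              params->relation->rd_locator);
        if (old_vmbits == new_vmbits)
        {
            /* VM already in the desired state: release it and skip WAL for it */
            LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
            do_set_vm = false;
        }
    }

    if (RelationNeedsWAL(params->relation) && (do_prune || do_freeze || do_set_vm))
        log_heap_prune_and_freeze(params->relation, buffer,
                                  do_set_vm ? vmbuffer : InvalidBuffer,
                                  do_set_vm ? new_vmbits : 0,
                                  conflict_xid,
                                  true,             /* cleanup lock */
                                  do_set_pd_vis,
                                  params->reason,
                                  prstate.frozen, prstate.nfrozen,
                                  prstate.redirected, prstate.nredirected,
                                  prstate.nowdead, prstate.ndead,
                                  prstate.nowunused, prstate.nunused);

    END_CRIT_SECTION();

    if (do_set_vm)
        LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);

The point is that the heap item changes, PD_ALL_VISIBLE, and the VM bits are all covered by the single XLOG_HEAP2_PRUNE_VACUUM_SCAN record, so no separate xl_heap_visible record is needed.
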
v22-0003-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (text/x-patch)
From 34f0009570e117d7d48b560cd097ee25c6cdcc7c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v22 3/9] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum
As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
---
src/backend/access/heap/vacuumlazy.c | 36 +++++++++++++++++++++++-----
1 file changed, 30 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ef73eafb4f6..6a87fc371a0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1872,9 +1872,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ /* Lock vmbuffer before entering critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ /* Mark buffer dirty before writing any WAL records */
MarkBufferDirty(buf);
/*
@@ -1891,13 +1894,34 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set_vmbits(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
+
+ /*
+ * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+ * setting the VM.
+ */
+ if (RelationNeedsWAL(vacrel->rel))
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ InvalidTransactionId, /* conflict xid */
+ false, /* cleanup lock */
+ true, /* set_pd_all_vis */
+ PRUNE_VACUUM_SCAN, /* reason */
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+
END_CRIT_SECTION();
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
/* Count the newly all-frozen pages for logging */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
--
2.43.0
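The empty-page case in 0003 follows the same pattern. Roughly, again as a condensed sketch of the diff with the surrounding empty-page handling elided:

    /* Sketch only: condensed from v22-0003. */
    LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);     /* lock the VM page before the critical section */
    START_CRIT_SECTION();

    MarkBufferDirty(buf);
    /* ... existing empty-page handling, including log_newpage_buffer() when required ... */

    PageSetAllVisible(page);
    visibilitymap_set_vmbits(blkno, vmbuffer,
                             VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN,
                             vacrel->rel->rd_locator);

    /* One prune record carrying vmflags, with no prune/freeze payload, replaces xl_heap_visible */
    if (RelationNeedsWAL(vacrel->rel))
        log_heap_prune_and_freeze(vacrel->rel, buf, vmbuffer,
                                  VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN,
                                  InvalidTransactionId,    /* conflict xid */
                                  false,                   /* cleanup lock */
                                  true,                    /* set_pd_all_vis */
                                  PRUNE_VACUUM_SCAN,
                                  NULL, 0, NULL, 0, NULL, 0, NULL, 0);

    END_CRIT_SECTION();
    LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);

Note that 0004 then renames visibilitymap_set_vmbits() to visibilitymap_set() once the old WAL-logging variant of visibilitymap_set() is gone, so this call reads as visibilitymap_set() at the end of the series.
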
v22-0004-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (text/x-patch)
From 0d6a06d4533cfe153440d301c3d20915ba07892f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v22 4/9] Remove XLOG_HEAP2_VISIBLE entirely
No code remains that emits XLOG_HEAP2_VISIBLE records, so remove the record type entirely.
This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
---
src/backend/access/common/bufmask.c | 4 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 155 ++---------------------
src/backend/access/heap/pruneheap.c | 18 ++-
src/backend/access/heap/vacuumlazy.c | 16 +--
src/backend/access/heap/visibilitymap.c | 112 +---------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +-
src/include/access/heapam_xlog.h | 28 +---
src/include/access/visibilitymap.h | 13 +-
src/include/access/visibilitymapdefs.h | 9 --
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 54 insertions(+), 379 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_freeze()
+ * for more details.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4b0c49f4bb0..2bff37e03b5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2524,11 +2524,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
else if (all_frozen_set)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- relation->rd_locator);
+ visibilitymap_set(BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ relation->rd_locator);
}
/*
@@ -8797,50 +8797,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
-
/*
* Perform XLogInsert for a heap-update operation. Caller must already
* have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 2af724451c3..5ab46e8bf8f 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -251,7 +251,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+ visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -264,142 +264,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -777,8 +641,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
*
* During recovery, however, no concurrent writers exist. Therefore,
* updating the VM without holding the heap page lock is safe enough. This
- * same approach is taken when replaying xl_heap_visible records (see
- * heap_xlog_visible()).
+ * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
*/
if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -790,11 +654,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- rlocator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -1375,9 +1239,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ba578c1ce0f..80037d690e3 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1112,9 +1112,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
{
Assert(PageIsAllVisible(page));
- old_vmbits = visibilitymap_set_vmbits(blockno,
- vmbuffer, new_vmbits,
- params->relation->rd_locator);
+ old_vmbits = visibilitymap_set(blockno,
+ vmbuffer, new_vmbits,
+ params->relation->rd_locator);
if (old_vmbits == new_vmbits)
{
LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
@@ -2396,14 +2396,18 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
*
* This is used for several different page maintenance operations:
*
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
* redirected, some marked dead, and some removed altogether.
*
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'
*
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ * marked as unused.
*
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
+ * all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
* all.
*
* If replaying the record requires a cleanup lock, pass cleanup_lock = true.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6a87fc371a0..5beb410aacc 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1894,11 +1894,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
/*
* Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2776,9 +2776,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* set PD_ALL_VISIBLE.
*/
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer, vmflags,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer, vmflags,
+ vacrel->rel->rd_locator);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index d14588e92ae..7997e926872 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set bit(s) in a previously pinned page and log
- * visibilitymap_set_vmbits - set bit(s) in a pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,109 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
- flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || !XLogRecPtrIsValid(recptr));
- Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- Assert(!BufferIsValid(heapBuf) ||
- BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (!XLogRecPtrIsValid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
-}
-
/*
* Set VM (visibility map) flags in the VM block in vmBuf.
*
@@ -344,9 +240,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* rlocator is used only for debugging messages.
*/
uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ca26d1f0ed1..08461fdf593 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..e9e77bd678b 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -294,7 +293,13 @@ typedef struct xl_heap_prune
#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
#define XLHP_IS_CATALOG_REL (1 << 1)
/*
@@ -443,20 +448,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -500,11 +491,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index c6fa37be968..05ba6786b47 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator);
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index c751c25a04d..2a9951b7188 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4292,7 +4292,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
Attachment: v22-0005-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (text/x-patch; charset=UTF-8)
From fd0455230968fd919999a5c035f3830d310f0e49 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v22 5/9] Rename GlobalVisTestIsRemovableXid() to
GlobalVisXidVisibleToAll()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The function is currently only used to check whether a tuple’s xmax is
visible to all transactions (and thus removable). Upcoming changes will
also use it to test whether a tuple’s xmin is visible to all transactions,
in order to decide whether a page can be marked all-visible in the
visibility map.
The new name, GlobalVisXidVisibleToAll(), better reflects this broader
purpose.
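To illustrate the two readings (the xmax call is the existing
HeapTupleIsSurelyDead() caller changed below; the xmin call shows how
later patches in this series use it):

    /* removal check: the deleter's xmax must be visible to everyone */
    return GlobalVisXidVisibleToAll(vistest,
                                    HeapTupleHeaderGetRawXmax(tuple));

    /* all-visible check (later patches): the inserter's xmin must be
     * visible to everyone, else the page cannot be marked all-visible */
    if (!GlobalVisXidVisibleToAll(vistest, xmin))
        all_visible = false;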
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
---
src/backend/access/heap/heapam_visibility.c | 6 +++---
src/backend/access/heap/pruneheap.c | 12 ++++++------
src/backend/access/spgist/spgvacuum.c | 2 +-
src/backend/storage/ipc/procarray.c | 17 ++++++++---------
src/include/utils/snapmgr.h | 4 ++--
5 files changed, 20 insertions(+), 21 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
{
Assert(TransactionIdIsValid(dead_after));
- if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
res = HEAPTUPLE_DEAD;
}
else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
return false;
/* Deleter committed, so tuple is dead if the XID is old enough. */
- return GlobalVisTestIsRemovableXid(vistest,
- HeapTupleHeaderGetRawXmax(tuple));
+ return GlobalVisXidVisibleToAll(vistest,
+ HeapTupleHeaderGetRawXmax(tuple));
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 80037d690e3..989af765702 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -250,7 +250,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
vistest = GlobalVisTestFor(relation);
- if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+ if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
return;
/*
@@ -476,7 +476,7 @@ prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
* Determining HTSV only once for each tuple is required for correctness,
* to deal with cases where running HTSV twice could result in different
* results. For example, RECENTLY_DEAD can turn to DEAD if another
- * checked item causes GlobalVisTestIsRemovableFullXid() to update the
+ * checked item causes GlobalVisFullXidVisibleToAll() to update the
* horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
* transaction aborts.
*
@@ -1238,11 +1238,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* Determine whether or not the tuple is considered dead when compared
* with the provided GlobalVisState. On-access pruning does not provide
* VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
- * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
- * if the GlobalVisState has been updated since the beginning of vacuuming
+ * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+ * the GlobalVisState has been updated since the beginning of vacuuming
* the relation.
*/
- if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
return HEAPTUPLE_DEAD;
return res;
@@ -1701,7 +1701,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
/*
* For now always use prstate->cutoffs for this test, because
* we only update 'all_visible' and 'all_frozen' when freezing
- * is requested. We could use GlobalVisTestIsRemovableXid
+ * is requested. We could use GlobalVisXidVisibleToAll()
* instead, if a non-freezing caller wanted to set the VM bit.
*/
Assert(prstate->cutoffs);
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 71ef2e5036f..1c0eb425ee9 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
*/
if (dt->tupstate == SPGIST_REDIRECT &&
(!TransactionIdIsValid(dt->xid) ||
- GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+ GlobalVisXidVisibleToAll(vistest, dt->xid)))
{
dt->tupstate = SPGIST_PLACEHOLDER;
Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..235c3b584f6 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
* See comment for GlobalVisState for details.
*/
bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
- FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
{
/*
* If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4216,14 +4215,14 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
}
/*
- * Wrapper around GlobalVisTestIsRemovableFullXid() for 32bit xids.
+ * Wrapper around GlobalVisFullXidVisibleToAll() for 32bit xids.
*
* It is crucial that this only gets called for xids from a source that
* protects against xid wraparounds (e.g. from a table and thus protected by
* relfrozenxid).
*/
bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
{
FullTransactionId fxid;
@@ -4237,12 +4236,12 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
*/
fxid = FullXidRelativeTo(state->definitely_needed, xid);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableFullXid(), see their comments.
+ * GlobalVisFullXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableXid(state, xid);
+ return GlobalVisXidVisibleToAll(state, xid);
}
/*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
*/
typedef struct GlobalVisState GlobalVisState;
extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
--
2.43.0
Attachment: v22-0006-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (text/x-patch; charset=UTF-8)
From 565014e31aa117fb43993ee2e64da38ffb74f372 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v22 6/9] Use GlobalVisState in vacuum to determine page level
visibility
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.
Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.
This also benefits vacuum directly: in some cases, GlobalVisState may
advance during a vacuum, allowing more pages to be considered
all-visible. And, in the future, we could easily add a heuristic to
update GlobalVisState more frequently during vacuums of large tables. In
the rare case that the GlobalVisState moves backward, vacuum falls back
to OldestXmin to ensure we don’t attempt to freeze a dead tuple that
wasn’t yet prunable according to the GlobalVisState.
Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.
This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.
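Condensed, the deferred page-level check in heap_page_prune_and_freeze()
(see the pruneheap.c hunk below) amounts to:

    /*
     * visibility_cutoff_xid tracks the newest xmin among the page's live
     * tuples, so a single GlobalVisState lookup decides whether the whole
     * page can still be considered all-visible.
     */
    if (prstate.all_visible &&
        TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
        !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
        prstate.all_visible = prstate.all_frozen = false;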
---
src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++++
src/backend/access/heap/pruneheap.c | 43 +++++++++------------
src/backend/access/heap/vacuumlazy.c | 10 ++---
src/include/access/heapam.h | 11 +++---
4 files changed, 58 insertions(+), 34 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+ Buffer buffer)
+{
+ TransactionId dead_after = InvalidTransactionId;
+ HTSV_Result res;
+
+ res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+ if (res == HEAPTUPLE_RECENTLY_DEAD)
+ {
+ Assert(TransactionIdIsValid(dead_after));
+
+ if (GlobalVisXidVisibleToAll(vistest, dead_after))
+ res = HEAPTUPLE_DEAD;
+ }
+ else
+ Assert(!TransactionIdIsValid(dead_after));
+
+ return res;
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 989af765702..040efe80f2e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -439,11 +439,12 @@ prune_freeze_setup(PruneFreezeParams *params,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon
- * when updating the VM and/or freezing all the tuples on the page.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState. This field is only kept up-to-date if the page is
+ * all-visible. As soon as a tuple is encountered that is not visible to
+ * all, this field is unmaintained. As long as it is maintained, it can be
+ * used to calculate the snapshot conflict horizon when updating the VM
+ * and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -886,14 +887,13 @@ heap_page_will_set_vis(Relation relation,
*/
static bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
- return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
+ return heap_page_would_be_all_visible(rel, buf, vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -994,6 +994,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them is not visible to everyone, the page cannot be
+ * all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update the hint
@@ -1165,10 +1175,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool debug_all_frozen;
Assert(prstate.lpdead_items == 0);
- Assert(prstate.cutoffs);
if (!heap_page_is_all_visible(params->relation, buffer,
- prstate.cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc))
Assert(false);
@@ -1698,20 +1707,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' and 'all_frozen' when freezing
- * is requested. We could use GlobalVisXidVisibleToAll()
- * instead, if a non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = false;
- prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5beb410aacc..7c3bb25cc04 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2730,7 +2730,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* done outside the critical section.
*/
if (heap_page_would_be_all_visible(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3489,7 +3489,7 @@ dead_items_cleanup(LVRelState *vacrel)
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
*
* Output parameters:
*
@@ -3505,7 +3505,7 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -3579,7 +3579,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
/* Visibility checks may do IO or allocate memory */
Assert(CritSectionCount == 0);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+ switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest, buf))
{
case HEAPTUPLE_LIVE:
{
@@ -3598,7 +3598,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin, OldestXmin))
+ if (!GlobalVisXidVisibleToAll(vistest, xmin))
{
all_visible = false;
*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 937b46a77db..2b6a521e4ea 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -276,10 +276,9 @@ typedef struct PruneFreezeParams
/*
* Contains the cutoffs used for freezing. They are required if the
- * HEAP_PAGE_PRUNE_FREEZE option is set. cutoffs->OldestXmin is also used
- * to determine if dead tuples are HEAPTUPLE_RECENTLY_DEAD or
- * HEAPTUPLE_DEAD. Currently only vacuum passes in cutoffs. Vacuum
- * calculates them once, at the beginning of vacuuming the relation.
+ * HEAP_PAGE_PRUNE_FREEZE option is set. Currently only vacuum passes in
+ * cutoffs. Vacuum calculates them once, at the beginning of vacuuming the
+ * relation.
*/
struct VacuumCutoffs *cutoffs;
} PruneFreezeParams;
@@ -444,7 +443,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -458,6 +457,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+ GlobalVisState *vistest, Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
Attachment: v22-0007-Unset-all_visible-sooner-if-not-freezing.patch (text/x-patch; charset=UTF-8)
From 44ba53840d52ca255ddb09acb6fd0cda8559a4db Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v22 7/9] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.
However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.
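The change itself is small; in heap_prune_record_dead() (and its LP_DEAD
sibling, see the hunks below) it is roughly:

    /*
     * A dead item was found. If no freezing will be attempted, the page
     * cannot end up all-visible, so stop tracking that state immediately
     * instead of carrying it until the end of pruning.
     */
    if (!prstate->attempt_freeze)
    {
        prstate->all_visible = false;
        prstate->all_frozen = false;
    }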
---
src/backend/access/heap/pruneheap.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 040efe80f2e..90270081acd 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1564,8 +1564,13 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible and all_frozen until later
* during pruning. Removable dead tuples shouldn't preclude freezing the
- * page.
+ * page. If we won't attempt freezing, just unset all-visible now, though.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1824,8 +1829,14 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible and all_frozen until later, at the
* end of heap_page_prune_and_freeze(). This will allow us to attempt to
* freeze the page after pruning. As long as we unset it before updating
- * the visibility map, this will be correct.
+ * the visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all-visible now.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
--
2.43.0
Attachment: v22-0008-Allow-on-access-pruning-to-set-pages-all-visible.patch (text/x-patch; charset=UTF-8)
From bced81f6df3d303679fac2a1414d42f0db401232 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v22 8/9] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.
Supporting this requires passing down, from the executor to the scan
descriptor, whether the query modifies the relation being scanned.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
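As a sketch of the plumbing (condensed from the nodeSeqscan.c and
heapam.c hunks below), the executor computes a read-only hint per scan
and the heap code only considers a VM update when that hint is set:

    /* in SeqNext(): the relation is read-only for this query if it is
     * not in es_modified_relids (no result relation, no row marks) */
    uint32      flags = 0;

    if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
                       estate->es_modified_relids))
        flags = SO_HINT_REL_READ_ONLY;

    scandesc = table_beginscan(node->ss.ss_currentRelation,
                               estate->es_snapshot, 0, NULL, flags);

    /* in heap_prepare_pagescan(): only hand a VM buffer to on-access
     * pruning when the scan carries the read-only hint */
    if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
        vmbuffer = &scan->rs_vmbuffer;
    heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);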
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
contrib/pgrowlocks/pgrowlocks.c | 2 +-
src/backend/access/brin/brin.c | 3 +-
src/backend/access/gin/gininsert.c | 3 +-
src/backend/access/heap/heapam.c | 15 +++-
src/backend/access/heap/heapam_handler.c | 22 ++++--
src/backend/access/heap/pruneheap.c | 69 +++++++++++++++----
src/backend/access/index/genam.c | 4 +-
src/backend/access/index/indexam.c | 6 +-
src/backend/access/nbtree/nbtsort.c | 2 +-
src/backend/access/table/tableam.c | 8 ++-
src/backend/commands/constraint.c | 2 +-
src/backend/commands/copyto.c | 2 +-
src/backend/commands/tablecmds.c | 4 +-
src/backend/commands/typecmds.c | 4 +-
src/backend/executor/execIndexing.c | 2 +-
src/backend/executor/execMain.c | 4 ++
src/backend/executor/execReplication.c | 8 +--
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeBitmapHeapscan.c | 9 ++-
src/backend/executor/nodeIndexonlyscan.c | 2 +-
src/backend/executor/nodeIndexscan.c | 11 ++-
src/backend/executor/nodeSeqscan.c | 26 ++++++-
src/backend/partitioning/partbounds.c | 2 +-
src/backend/utils/adt/selfuncs.c | 2 +-
src/include/access/genam.h | 3 +-
src/include/access/heapam.h | 30 +++++++-
src/include/access/tableam.h | 19 ++---
src/include/nodes/execnodes.h | 6 ++
.../t/035_standby_logical_decoding.pl | 3 +-
29 files changed, 210 insertions(+), 65 deletions(-)
diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
RelationGetRelationName(rel));
/* Scan the relation */
- scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
hscan = (HeapScanDesc) scan;
attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index cb3331921cb..b9613787b85 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2842,7 +2842,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
indexInfo->ii_Concurrent = brinshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromBrinShared(brinshared));
+ ParallelTableScanFromBrinShared(brinshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index c2b879b2bf6..147844690a1 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2058,7 +2058,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
indexInfo->ii_Concurrent = ginshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromGinBuildShared(ginshared));
+ ParallelTableScanFromGinBuildShared(ginshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 2bff37e03b5..ae53e311ce1 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -555,6 +555,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -569,7 +570,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1246,6 +1249,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1284,6 +1288,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1316,6 +1326,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..27e3498f5f4 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,12 +79,14 @@ heapam_slot_callbacks(Relation relation)
*/
static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
{
IndexFetchHeapData *hscan = palloc0(sizeof(IndexFetchHeapData));
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
+ hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
return &hscan->xs_base;
}
@@ -99,6 +101,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -138,7 +146,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ hscan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -753,7 +762,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
tableScan = NULL;
heapScan = NULL;
- indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+ indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
index_rescan(indexScan, NULL, 0, NULL, 0);
}
else
@@ -762,7 +771,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
- tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+ tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
heapScan = (HeapScanDesc) tableScan;
indexScan = NULL;
@@ -2471,6 +2480,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2517,7 +2527,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 90270081acd..124722f1778 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -203,7 +203,9 @@ static bool heap_page_will_set_vis(Relation relation,
Buffer heap_buf,
Buffer vmbuffer,
bool blk_known_av,
- const PruneState *prstate,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ PruneState *prstate,
uint8 *vmflags,
bool *do_set_pd_vis);
@@ -218,9 +220,13 @@ static bool heap_page_will_set_vis(Relation relation,
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -297,6 +303,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
.vistest = vistest,.cutoffs = NULL
};
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ params.options = HEAP_PAGE_PRUNE_UPDATE_VIS;
+ params.vmbuffer = *vmbuffer;
+ }
+
heap_page_prune_and_freeze(¶ms, &presult, &dummy_off_loc,
NULL, NULL);
@@ -785,6 +798,9 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* have examined this page’s VM bits (e.g., VACUUM in the previous
* heap_vac_scan_next_block() call) and can pass that along.
*
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
* Returns true if one or both VM bits should be set, along with the desired
* flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
* should be set on the heap page.
@@ -795,7 +811,9 @@ heap_page_will_set_vis(Relation relation,
Buffer heap_buf,
Buffer vmbuffer,
bool blk_known_av,
- const PruneState *prstate,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ PruneState *prstate,
uint8 *vmflags,
bool *do_set_pd_vis)
{
@@ -811,6 +829,23 @@ heap_page_will_set_vis(Relation relation,
return false;
}
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(heap_buf) || XLogCheckBufferNeedsBackup(heap_buf)))
+ {
+ prstate->all_visible = prstate->all_frozen = false;
+ return false;
+ }
+
if (prstate->all_visible && !PageIsAllVisible(heap_page))
*do_set_pd_vis = true;
@@ -834,6 +869,11 @@ heap_page_will_set_vis(Relation relation,
* page-level bit is clear. However, it's possible that in vacuum the bit
* got cleared after heap_vac_scan_next_block() was called, so we must
* recheck with buffer lock before concluding that the VM is corrupt.
+ *
+ * This will never trigger for on-access pruning because it couldn't have
+ * done a previous visibility map lookup and thus will always pass
+ * blk_known_av as false. A future vacuum will have to take care of fixing
+ * the corruption.
*/
else if (blk_known_av && !PageIsAllVisible(heap_page) &&
visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
@@ -994,6 +1034,14 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /*
+ * Even if we don't prune anything, if we found a new value for the
+ * pd_prune_xid field or the page was marked full, we will update the hint
+ * bit.
+ */
+ do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ PageIsFull(page);
+
/*
* After processing all the live tuples on the page, if the newest xmin
* amongst them is not visible to everyone, the page cannot be
@@ -1004,14 +1052,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
!GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
prstate.all_visible = prstate.all_frozen = false;
- /*
- * Even if we don't prune anything, if we found a new value for the
- * pd_prune_xid field or the page was marked full, we will update the hint
- * bit.
- */
- do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
- PageIsFull(page);
-
/*
* Decide if we want to go ahead with freezing according to the freeze
* plans we prepared, or not.
@@ -1054,6 +1094,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
do_set_vm = heap_page_will_set_vis(params->relation,
blockno, buffer, vmbuffer, params->blk_known_av,
+ params->reason, do_prune, do_freeze,
&prstate, &new_vmbits, &do_set_pd_vis);
/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
@@ -2340,7 +2381,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
/*
* Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
- * record.
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
*/
static TransactionId
get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
@@ -2410,8 +2451,8 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
* - Reaping: During vacuum phase III, items that are already LP_DEAD are
* marked as unused.
*
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- * all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III and on-access, the heap page
+ * may be marked all-visible and all-frozen.
*
* These changes all happen together, so we use a single WAL record for them
* all.
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 0cb27af1310..1e7992dbeb3 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -447,7 +447,7 @@ systable_beginscan(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, irel,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
@@ -708,7 +708,7 @@ systable_beginscan_ordered(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, indexRelation,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..b5523cf2ab1 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys)
+ int nkeys, int norderbys, uint32 flags)
{
IndexScanDesc scan;
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+ scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
return scan;
}
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+ scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
return scan;
}
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 454adaee7dc..02ab0233e59 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1925,7 +1925,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
indexInfo = BuildIndexInfo(btspool->index);
indexInfo->ii_Concurrent = btshared->isconcurrent;
scan = table_beginscan_parallel(btspool->heap,
- ParallelTableScanFromBTShared(btshared));
+ ParallelTableScanFromBTShared(btshared), 0);
reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
true, progress, _bt_build_callback,
&buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 5e41404937e..558c4497993 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -50,6 +50,7 @@ char *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
bool synchronize_seqscans = true;
+
/* ----------------------------------------------------------------------------
* Slot functions.
* ----------------------------------------------------------------------------
@@ -163,10 +164,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
+
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -217,7 +219,7 @@ table_index_fetch_tuple_check(Relation rel,
bool found;
slot = table_slot_create(rel, NULL);
- scan = table_index_fetch_begin(rel);
+ scan = table_index_fetch_begin(rel, 0);
found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
all_dead);
table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index 3497a8221f2..97c8278e36d 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
*/
tmptid = checktid;
{
- IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+ IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
bool call_again = false;
if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index cef452584e5..22b453dc617 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1156,7 +1156,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
AttrMap *map = NULL;
TupleTableSlot *root_slot = NULL;
- scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
slot = table_slot_create(rel, NULL);
/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 23ebaa3f230..66c418059fe 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6345,7 +6345,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
* checking all the constraints.
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(oldrel, snapshot, 0, NULL);
+ scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -13730,7 +13730,7 @@ validateForeignKeyConstraint(char *conname,
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
slot = table_slot_create(rel, NULL);
- scan = table_beginscan(rel, snapshot, 0, NULL);
+ scan = table_beginscan(rel, snapshot, 0, NULL, 0);
perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
"validateForeignKeyConstraint",
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 5979580139f..35560ac60d9 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3154,7 +3154,7 @@ validateDomainNotNullConstraint(Oid domainoid)
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
@@ -3235,7 +3235,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 401606f840a..4e39ac00f30 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
retry:
conflict = false;
found_self = false;
- index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+ index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 27c9eec697b..0630a5af79e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation is modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index def32774c90..473d236e551 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -204,7 +204,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
/* Start an index scan. */
- scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
retry:
found = false;
@@ -382,7 +382,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
/* Start a heap scan. */
InitDirtySnapshot(snap);
- scan = table_beginscan(rel, &snap, 0, NULL);
+ scan = table_beginscan(rel, &snap, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
retry:
@@ -601,7 +601,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+ scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
table_rescan(scan, NULL);
@@ -665,7 +665,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
index_rescan(scan, skey, skey_attoff, NULL, 0);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..0d854db51a1 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ flags);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index f464cca9507..87b04b1b88e 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
estate->es_snapshot,
&node->ioss_Instrument,
node->ioss_NumScanKeys,
- node->ioss_NumOrderByKeys);
+ node->ioss_NumOrderByKeys, 0);
node->ioss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index f36929deec3..90f929ce741 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys,
+ flags);
node->iss_ScanDesc = scandesc;
@@ -207,7 +214,7 @@ IndexNextWithReorder(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys, 0);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..4d0cbb9dee4 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
scandesc = table_beginscan(node->ss.ss_currentRelation,
estate->es_snapshot,
- 0, NULL);
+ 0, NULL, flags);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
{
EState *estate = node->ss.ps.state;
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
table_parallelscan_initialize(node->ss.ss_currentRelation,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+ flags);
}
/* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation,
+ pscan,
+ flags);
}
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 8ba038c5ef4..d3b340ee2a7 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3370,7 +3370,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
econtext = GetPerTupleExprContext(estate);
snapshot = RegisterSnapshot(GetLatestSnapshot());
tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
- scan = table_beginscan(part_rel, snapshot, 0, NULL);
+ scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 540aa9628d7..28434146eba 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7100,7 +7100,7 @@ get_actual_variable_endpoint(Relation heapRel,
index_scan = index_beginscan(heapRel, indexRel,
&SnapshotNonVacuumable, NULL,
- 1, 0);
+ 1, 0, 0);
/* Set it up for index-only scan */
index_scan->xs_want_itup = true;
index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..2f9e9ea6318 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -177,7 +177,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys);
+ int nkeys, int norderbys, uint32 flags);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
@@ -204,6 +204,7 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys,
ParallelIndexScanDesc pscan);
+
extern ItemPointer index_getnext_tid(IndexScanDesc scan,
ScanDirection direction);
extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2b6a521e4ea..1e3df54628b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read in the current heap page's
+ * corresponding VM block to this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -117,8 +124,24 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read in the current heap page's corresponding VM block to
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
@@ -417,7 +440,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
TM_IndexDeleteOp *delstate);
/* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
OffsetNumber *off_loc,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e16bf025692..0042636463f 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* set if the query doesn't modify the rel */
+ SO_HINT_REL_READ_ONLY = 1 << 10,
} ScanOptions;
/*
@@ -420,7 +422,7 @@ typedef struct TableAmRoutine
*
* Tuples for an index scan can then be fetched via index_fetch_tuple.
*/
- struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+ struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
/*
* Reset index fetch. Typically this will release cross index fetch
@@ -874,9 +876,9 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
*/
static inline TableScanDesc
table_beginscan(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_SEQSCAN |
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
@@ -919,9 +921,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
@@ -1128,7 +1130,8 @@ extern void table_parallelscan_initialize(Relation rel,
* Caller must hold a suitable lock on the relation.
*/
extern TableScanDesc table_beginscan_parallel(Relation relation,
- ParallelTableScanDesc pscan);
+ ParallelTableScanDesc pscan,
+ uint32 flags);
/*
* Restart a parallel scan. Call this in the leader process. Caller is
@@ -1154,9 +1157,9 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
* Tuples for an index scan can then be fetched via table_index_fetch_tuple().
*/
static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
{
- return rel->rd_tableam->index_fetch_begin(rel);
+ return rel->rd_tableam->index_fetch_begin(rel, flags);
}
/*
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 18ae8f0d4bb..0c3b0d60168 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query either through
+ * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index ebe2fae1789..bdd9f0a62cd 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
--
2.43.0
v22-0009-Set-pd_prune_xid-on-insert.patchtext/x-patch; charset=UTF-8; name=v22-0009-Set-pd_prune_xid-on-insert.patchDownload
From 76cb5109137fc1cceb62b4e5091115eee23fc6e9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v22 9/9] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.
This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.
Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.
---
src/backend/access/heap/heapam.c | 25 +++++++++++++------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++-
.../isolation/expected/index-killtuples.out | 6 ++---
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ae53e311ce1..f329f497480 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2104,6 +2104,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2163,15 +2164,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2181,7 +2186,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2545,8 +2549,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 5ab46e8bf8f..dac640f5c9d 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -462,6 +462,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -611,9 +617,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
--
2.43.0
Melanie Plageman <melanieplageman@gmail.com> writes:
+	PruneFreezeParams params = {.relation = relation,.buffer = buffer,
+	.reason = PRUNE_ON_ACCESS,.options = 0,
+	.vistest = vistest,.cutoffs = NULL
+	};
I didn't pay much attention to this thread, so I didn't notice this
until it got committed, but I'd like to lodge an objection to this
formatting, especially the lack of spaces before the field names. This
would be much more readable with one struct field per line, i.e.
PruneFreezeParams params = {
.relation = rel,
.buffer = buf,
.reason = PRUNE_VACUUM_SCAN,
.options = HEAP_PAGE_PRUNE_FREEZE,
.vistest = vacrel->vistest,
.cutoffs = &vacrel->cutoffs,
};
or at a pinch, if we're really being stingy with the vertical space:
PruneFreezeParams params = {
.relation = rel, .buffer = buf,
.reason = PRUNE_VACUUM_SCAN, .options = HEAP_PAGE_PRUNE_FREEZE,
.vistest = vacrel->vistest, .cutoffs = &vacrel->cutoffs,
};
I had a quick grep, and every other designated struct initialiser I
could find uses the one-field-per-line form, but they're not consistent
about the comma after the last field. I personally prefer having it, so
that one can add more fields later without having to modify the
unrelated line.
- ilmari
Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> writes:
Melanie Plageman <melanieplageman@gmail.com> writes:
+	PruneFreezeParams params = {.relation = relation,.buffer = buffer,
+	.reason = PRUNE_ON_ACCESS,.options = 0,
+	.vistest = vistest,.cutoffs = NULL
+	};
I didn't pay much attention to this thread, so I didn't notice this
until it got committed, but I'd like to lodge an objection to this
formatting, especially the lack of spaces before the field names. This
would be much more readable with one struct field per line, i.e.
PruneFreezeParams params = {
.relation = rel,
.buffer = buf,
.reason = PRUNE_VACUUM_SCAN,
.options = HEAP_PAGE_PRUNE_FREEZE,
.vistest = vacrel->vistest,
.cutoffs = &vacrel->cutoffs,
};
D'oh, my mail client untabified the .buffer line while I was editing it,
that should of course be:
PruneFreezeParams params = {
.relation = rel,
.buffer = buf,
.reason = PRUNE_VACUUM_SCAN,
.options = HEAP_PAGE_PRUNE_FREEZE,
.vistest = vacrel->vistest,
.cutoffs = &vacrel->cutoffs,
};
- ilmari
On Thu, Nov 20, 2025 at 12:55 PM Dagfinn Ilmari Mannsåker
<ilmari@ilmari.org> wrote:
I didn't pay much attention to this thread, so I didn't notice this
until it got committed, but I'd like to lodge an objection to this
formatting, especially the lack of spaces before the field names. This
would be much more readable with one struct field per line, i.e.
PruneFreezeParams params = {
.relation = rel,
.buffer = buf,
.reason = PRUNE_VACUUM_SCAN,
.options = HEAP_PAGE_PRUNE_FREEZE,
.vistest = vacrel->vistest,
.cutoffs = &vacrel->cutoffs,
};
or at a pinch, if we're really being stingy with the vertical space:
PruneFreezeParams params = {
.relation = rel, .buffer = buf,
.reason = PRUNE_VACUUM_SCAN, .options = HEAP_PAGE_PRUNE_FREEZE,
.vistest = vacrel->vistest, .cutoffs = &vacrel->cutoffs,
};
I had a quick grep, and every other designated struct initialiser I
could find uses the one-field-per-line form, but they're not consistent
about the comma after the last field. I personally prefer having it, so
that one can add more fields later without having to modify the
unrelated line.
pgindent doesn't allow for a space after the comma before the period.
One reason I used struct initialization was to save space, so I'm a
bit loath to put every member on its own line. However, I don't want
to make the code less readable to others. So, I will commit an update
as you request.
- Melanie
On Nov 21, 2025, at 01:19, Melanie Plageman <melanieplageman@gmail.com> wrote:
On Wed, Nov 19, 2025 at 6:13 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
Since it is passed into one of the helpers, I think I agree. Attached
v21 has this change.
I've committed the first three patches. Attached v22 is the remaining
patches which set the VM in heap_page_prune_and_freeze() for vacuum
and then allow on-access pruning to also set the VM.
I just started reviewing 0001 yesterday and have a few comments. However, it was late and I didn’t have enough time to wrap up, so I decided to review a few more today and send the comments together. Since you have already pushed 0001-0003, I’ll still raise my comments on them now, and I will review the rest of the commits next week.
1 - pushed 0001
```
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -419,60 +425,44 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* also need to account for a reduction in the length of the line pointer
* array following array truncation by us.
*
- * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
- * required in order to advance relfrozenxid / relminmxid, or if it's
- * considered advantageous for overall system performance to do so now. The
- * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing. When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set. They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
- *
- * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
- *
- * options:
- * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- * pruning.
+ * params contains the input parameters used to control freezing and pruning
+ * behavior. See the definition of PruneFreezeParams for more on what each
+ * parameter does.
*
- * FREEZE indicates that we will also freeze tuples, and will return
- * 'all_visible', 'all_frozen' flags to the caller.
- *
- * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
- * of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
- * cutoffs->OldestXmin is also used to determine if dead tuples are
- * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
+ * If the HEAP_PAGE_PRUNE_FREEZE option is set in params, we will freeze
+ * tuples if it's required in order to advance relfrozenxid / relminmxid, or
+ * if it's considered advantageous for overall system performance to do so
+ * now. The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
+ * 'new_relmin_mxid' arguments are required when freezing. When
+ * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
+ * and presult->all_frozen on exit, to indicate if the VM bits can be set.
+ * They are always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not
+ * passed, because at the moment only callers that also freeze need that
+ * information.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
* heap_page_prune_and_freeze() is responsible for initializing it. Required
* by all callers.
*
- * reason indicates why the pruning is performed. It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
* off_loc is the offset location required by the caller to use in error
* callback.
*
* new_relfrozen_xid and new_relmin_mxid must provided by the caller if the
- * HEAP_PRUNE_FREEZE option is set. On entry, they contain the oldest XID and
- * multi-XID seen on the relation so far. They will be updated with oldest
- * values present on the page after pruning. After processing the whole
- * relation, VACUUM can use these values as the new relfrozenxid/relminmxid
- * for the relation.
+ * HEAP_PAGE_PRUNE_FREEZE option is set in params. On entry, they contain the
+ * oldest XID and multi-XID seen on the relation so far. They will be updated
+ * with oldest values present on the page after pruning. After processing the
+ * whole relation, VACUUM can use these values as the new
+ * relfrozenxid/relminmxid for the relation.
*/
void
-heap_page_prune_and_freeze(Relation relation, Buffer buffer,
- GlobalVisState *vistest,
- int options,
- struct VacuumCutoffs *cutoffs,
+heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
- PruneReason reason,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid)
{
```
For this function interface change, I got a concern. The old function comment says "cutoffs contains the freeze cutoffs …. Required if HEAP_PRUNE_FREEZE option is set.”, meaning that cutoffs is only useful and must be set when HEAP_PRUNE_FREEZE is set. But the new comment seems to have lost this indication.
And in the old function interface, cutoffs sat right next to options, readers are easy to notice:
* when options is 0, cutoffs is null
```
heap_page_prune_and_freeze(relation, buffer, vistest, 0,
NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
```
* when options has HEAP_PAGE_PRUNE_FREEZE, cutoffs is passed in
```
prune_options = HEAP_PAGE_PRUNE_FREEZE;
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
&vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
```
So, the change doesn’t break anything, but makes code a little bit harder to read. So, my suggestion is to add an assert in heap_page_prune_and_freeze, something like:
```
Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs != NULL);
```
2 - pushed 0001
```
+ PruneFreezeParams params = {.relation = rel,.buffer = buf,
+ .reason = PRUNE_VACUUM_SCAN,.options = HEAP_PAGE_PRUNE_FREEZE,
+ .cutoffs = &vacrel->cutoffs,.vistest = vacrel->vistest
+ };
```
Using a designated initializer is not wrong, but it makes future maintenance harder, because when a new field is added, this initializer will leave the new field uninitialized. From my impression, I don’t remember ever seeing a designated initializer in PG code. I only remember three approaches I have seen:
* use an initialize function to set every fields individually
* palloc0 to set all 0, then set non-zero fields individually
* {0} to set all 0, then set non-zero fields individually
3 - pushed 0002
```
prstate->all_visible = false;
+ prstate->all_frozen = false;
```
Nit: Setting both fields to false is now repeated in 6 places. Maybe adding a static inline function, say PruneClearVisibilityFlags(), may improve maintainability.
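A minimal sketch of such a helper, assuming the PruneState fields quoted above; the name follows the suggestion and is illustrative only:
```
/* Hypothetical helper as suggested above; not part of the posted patches */
static inline void
PruneClearVisibilityFlags(PruneState *prstate)
{
	prstate->all_visible = false;
	prstate->all_frozen = false;
}
```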
4 - pushed 0003
```
+ * opporunistically freeze, to indicate if the VM bits can be set. They are
```
Typo: opporunistically, missed a “t”.
I’d stop here today, and continue reviewing rest commits in next week.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Nov 21, 2025, at 09:09, Chao Li <li.evan.chao@gmail.com> wrote:
I’d stop here today, and continue reviewing rest commits in next week.
I continued reviewing today.
0004: This is a pure refactoring. It splits heap_page_prune_and_freeze() into multiple small functions. LGTM, no comments.
0005: Overall good; a few nit comments below.
0006, 0007 look good, no comment.
5 - 0005 - heapam.h
```
+ /*
+ *
+ * vmbuffer is the buffer that must already contain contain the required
+ * block of the visibility map if we are to update it. blk_known_av is the
```
Nit:
* an unnecessary empty comment line.
* “contain contain” => “contain"
6 - 0005 heapam_xlog.c
```
+ * The critical integrity requirement here is that we must never end
+ * up with with the visibility map bit set and the page-level
```
Nit: “with with” => “with”
I will continue reviewing 0008 and rest tomorrow.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Nov 24, 2025, at 16:07, Chao Li <li.evan.chao@gmail.com> wrote:
0006, 0007 look good, no comment.
I missed a nit comment in 0007:
7 - 0007
```
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBLITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILIYTMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
```
VISIBLITYMAP_XLOG_CATALOG_REL missed “I” after “B”.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
Hi,
On 2025-11-20 12:19:58 -0500, Melanie Plageman wrote:
From 363f0e4ac9ac7699a6d9c2a267a2ad60825407c8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 17 Nov 2025 15:11:27 -0500
Subject: [PATCH v22 1/9] Split heap_page_prune_and_freeze() into helpers
Refactor the setup and planning phases of pruning and freezing into
helpers. This streamlines heap_page_prune_and_freeze() and makes it more
clear when the examination of tuples ends and page modifications begin.
I think this is a considerable improvement.
I didn't review this with a lot of detail, given that it's mostly moving
code.
One minor thing: It's slightly odd that prune_freeze_plan() gets an oid
argument, prune_freeze_setup() gets the entire prstate,
heap_page_will_freeze() gets the Relation. It's what they need, but still a
bit odd.
FWIW, I found the diff generated by
git show --diff-algorithm=minimal --color-moved-ws=allow-indentation-change
useful for viewing this diff, showed much more clearly how little the code
actually changed.
From 8ebaf434af5afaebcf71550116c59355b3bf15c1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 8 Oct 2025 15:39:01 -0400
Subject: [PATCH v22 2/9] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
prune/freeze
Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.
Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.
This change applies only to vacuum phase I, not to pruning performed
during normal page access.
Hm. This change makes sense, but unfortunately I find it somewhat hard to
review. There are a lot of changes that don't obviously work towards one
goal in this commit.
@@ -238,6 +239,16 @@ typedef struct PruneFreezeParams
Relation relation; /* relation containing buffer to be pruned */
Buffer buffer; /* buffer to be pruned */
+ /*
+ *
+ * vmbuffer is the buffer that must already contain contain the required
+ * block of the visibility map if we are to update it. blk_known_av is the
+ * visibility status of the heap block as of the last call to
+ * find_next_unskippable_block().
+ */
+ Buffer vmbuffer;
+ bool blk_known_av;
+
/*
* The reason pruning was performed. It is used to set the WAL record
* opcode which is used for debugging and analysis purposes.
What is blk_known_av set to if the block is known to not be all visible?
Compared to the case where we did not yet determine the visibility status of
the block?
@@ -250,8 +261,10 @@ typedef struct PruneFreezeParams
* HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
* LP_UNUSED during pruning.
*
- * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
- * will return 'all_visible', 'all_frozen' flags to the caller.
+ * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples
+ *
+ * HEAP_PAGE_PRUNE_UPDATE_VIS indicates that we will set the page's status
+ * in the VM.
*/
int options;
nit^2: The previous version and the other paragraphs end in a .
@@ -157,17 +159,36 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* There should be no more data */
Assert((char *) frz_offsets == dataptr + datalen);
- if (vmflags & VISIBILITYMAP_VALID_BITS)
- PageSetAllVisible(page);
-
- MarkBufferDirty(buffer);
+ if (do_prune || nplans > 0)
+ mark_buffer_dirty = set_lsn = true;
/*
- * See log_heap_prune_and_freeze() for commentary on when we set the
- * heap page LSN.
+ * The critical integrity requirement here is that we must never end
+ * up with with the visibility map bit set and the page-level
+ * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
s/clear/unset/ would be a tad clearer.
+ * modification would fail to clear the visibility map bit.
+ *
+ * vmflags may be nonzero with PD_ALL_VISIBLE already set (e.g. when
+ * marking an all-visible page all-frozen). If only the VM is updated,
+ * the heap page need not be dirtied.
*/
- if (do_prune || nplans > 0 ||
- ((vmflags & VISIBILITYMAP_VALID_BITS) && XLogHintBitIsNeeded()))
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+ {
+ PageSetAllVisible(page);
+ mark_buffer_dirty = true;
+
+ /*
+ * See log_heap_prune_and_freeze() for commentary on when we set
+ * the heap page LSN.
+ */
+ if (XLogHintBitIsNeeded())
+ set_lsn = true;
+ }
Maybe worth adding something like Assert(!set_lsn || mark_buffer_dirty)?
+/*
+ * Decide whether to set the visibility map bits for heap_blk, using
+ * information from PruneState and blk_known_av. Some callers may already
+ * have examined this page’s VM bits (e.g., VACUUM in the previous
+ * heap_vac_scan_next_block() call) and can pass that along.
That's not entirely trivial to follow, tbh. As mentioned above, it's not clear
to me how the state of a block where did determine that the block is *not*
all-visible is represented.
+ * Returns true if one or both VM bits should be set, along with the desired
+ * flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
+ * should be set on the heap page.
+ */
+static bool
+heap_page_will_set_vis(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buf,
+ Buffer vmbuffer,
+ bool blk_known_av,
+ const PruneState *prstate,
+ uint8 *vmflags,
+ bool *do_set_pd_vis)
+{
+ Page heap_page = BufferGetPage(heap_buf);
+ bool do_set_vm = false;
+
+ *do_set_pd_vis = false;
+
+
+ /*
+ * Now handle two potential corruption cases:
+ *
+ * These do not need to happen in a critical section and are not
+ * WAL-logged.
+ *
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that in vacuum the bit
+ * got cleared after heap_vac_scan_next_block() was called, so we must
+ * recheck with buffer lock before concluding that the VM is corrupt.
+ */
+ else if (blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
Wait, why is it ok to perform this check iff blk_known_av is set?
+ old_vmbits = visibilitymap_set_vmbits(blockno,
+ vmbuffer, new_vmbits,
+ params->relation->rd_locator);
+ if (old_vmbits == new_vmbits)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+ /* Unset so we don't emit WAL since no change occurred */
+ do_set_vm = false;
+ }
+ }
What can lead to this path being reached? Doesn't this mean that something
changed the state of the VM while we were holding an exclusive lock on the
heap buffer?
+ /*
+ * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
+ * only updating the VM and it turns out it was already set, we will
+ * have unset do_set_vm earlier. As such, check it again before
+ * emitting the record.
+ */
+ if (RelationNeedsWAL(params->relation) &&
+ (do_prune || do_freeze || do_set_vm))
+ {
log_heap_prune_and_freeze(params->relation, buffer,
- InvalidBuffer, /* vmbuffer */
- 0, /* vmflags */
+ do_set_vm ? vmbuffer : InvalidBuffer,
+ do_set_vm ? new_vmbits : 0,
conflict_xid,
- true, params->reason,
+ true, /* cleanup lock */
+ do_set_pd_vis,
+ params->reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
This function is now taking 16 parameters :/
@@ -959,28 +1148,47 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
END_CRIT_SECTION();
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
+ /*
+ * During its second pass over the heap, VACUUM calls
+ * heap_page_would_be_all_visible() to determine whether a page is
+ * all-visible and all-frozen. The logic here is similar. After completing
+ * pruning and freezing, use an assertion to verify that our results
+ * remain consistent with heap_page_would_be_all_visible().
+ */
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
+ {
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
+
+ Assert(prstate.lpdead_items == 0);
+ Assert(prstate.cutoffs);
+
+ if (!heap_page_is_all_visible(params->relation, buffer,
+ prstate.cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc))
+ Assert(false);
I don't love Assert(false), because the message for the assert failure is
pretty much meaningless. Sometimes it's hard to avoid, but here you have an if
() that has no body other than Assert(false)? Just Assert the expression
directly.
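For illustration, a sketch of the quoted hunk with the expression asserted directly, as suggested (a sketch only, using the names from the quoted code):
```
#ifdef USE_ASSERT_CHECKING
	if (prstate.all_visible)
	{
		TransactionId debug_cutoff;
		bool		debug_all_frozen;

		Assert(prstate.lpdead_items == 0);
		Assert(prstate.cutoffs);

		/* assert the result directly instead of an empty if + Assert(false) */
		Assert(heap_page_is_all_visible(params->relation, buffer,
										prstate.cutoffs->OldestXmin,
										&debug_all_frozen,
										&debug_cutoff, off_loc));
	}
#endif
```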
From 34f0009570e117d7d48b560cd097ee25c6cdcc7c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v22 3/9] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum
As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in a XLOG_HEAP2_PRUNE_VACUUM_SCAN record.
This whole business of treating empty pages as all-visible continues to not
make any sense to me. Particularly in combination with a not crashsafe FSM it
just seems ... unhelpful. It also means that there's a decent chance of
extra WAL when bulk extending. But that's not the fault of this change.
From 0d6a06d4533cfe153440d301c3d20915ba07892f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v22 4/9] Remove XLOG_HEAP2_VISIBLE entirely
As no remaining users emit XLOG_HEAP2_VISIBLE records.
This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.
Probably worth mentioning that this changes the VM API.
@@ -2396,14 +2396,18 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
*
* This is used for several different page maintenance operations:
*
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
* redirected, some marked dead, and some removed altogether.
*
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'
*
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ * marked as unused.
*
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
+ * all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
* all.
*
* If replaying the record requires a cleanup lock, pass cleanup_lock = true.
How's that related to the commit's subject?
From fd0455230968fd919999a5c035f3830d310f0e49 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v22 5/9] Rename GlobalVisTestIsRemovableXid() to
GlobalVisXidVisibleToAll()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The function is currently only used to check whether a tuple’s xmax is
visible to all transactions (and thus removable). Upcoming changes will
also use it to test whether a tuple’s xmin is visible to all to
decide if a page can be marked all-visible in the visibility map.
The new name, GlobalVisXidVisibleToAll(), better reflects this broader
purpose.
If we want this - and I'm not convinced we do - I think it needs to go further
and change the other uses of removable in
procarray.c. ComputeXidHorizonsResult has a lot of related fields.
There's also GetOldestNonRemovableTransactionId(),
GlobalVisCheckRemovableXid(), GlobalVisCheckRemovableFullXid() that weren't
included in the renaming.
From 565014e31aa117fb43993ee2e64da38ffb74f372 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v22 6/9] Use GlobalVisState in vacuum to determine page level
visibility
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.
Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.
This also benefits vacuum directly: in some cases, GlobalVisState may
advance during a vacuum, allowing more pages to become considered
all-visible. And, in the future, we could easily add a heuristic to
update GlobalVisState more frequently during vacuums of large tables. In
the rare case that the GlobalVisState moves backward, vacuum falls back
to OldestXmin to ensure we don’t attempt to freeze a dead tuple that
wasn’t yet prunable according to the GlobalVisState.
I think it may be better to make sure that the GlobalVisState can't move
backward.
From bced81f6df3d303679fac2a1414d42f0db401232 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v22 8/9] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.
Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.
I think it'd be good to split this part into a separate commit. The set of
folks to review that are distinct (and broader) from the ones looking at
heapam internals.
Greetings,
Andres Freund
Thanks for the review!
On Thu, Nov 20, 2025 at 8:10 PM Chao Li <li.evan.chao@gmail.com> wrote:
* new_relfrozen_xid and new_relmin_mxid must provided by the caller if the
- * HEAP_PRUNE_FREEZE option is set. On entry, they contain the oldest XID and
- * multi-XID seen on the relation so far. They will be updated with oldest
- * values present on the page after pruning. After processing the whole
- * relation, VACUUM can use these values as the new relfrozenxid/relminmxid
- * for the relation.
+ * HEAP_PAGE_PRUNE_FREEZE option is set in params. On entry, they contain the
+ * oldest XID and multi-XID seen on the relation so far. They will be updated
+ * with oldest values present on the page after pruning. After processing the
+ * whole relation, VACUUM can use these values as the new
+ * relfrozenxid/relminmxid for the relation.
*/
void
-heap_page_prune_and_freeze(Relation relation, Buffer buffer,
- GlobalVisState *vistest,
- int options,
- struct VacuumCutoffs *cutoffs,
+heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
- PruneReason reason,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid)
{
```
For this function interface change, I got a concern. The old function comment says "cutoffs contains the freeze cutoffs …. Required if HEAP_PRUNE_FREEZE option is set.”, meaning that cutoffs is only useful and must be set when HEAP_PRUNE_FREEZE is set. But the new comment seems to have lost this indication.
I did move that comment into the PruneFreezeParams struct definition.
And in the old function interface, cutoffs sat right next to options, readers are easy to notice:
* when options is 0, cutoffs is null
```
heap_page_prune_and_freeze(relation, buffer, vistest, 0,
NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
```
* when options has HEAP_PAGE_PRUNE_FREEZE, cutoffs is passed in
```
prune_options = HEAP_PAGE_PRUNE_FREEZE;
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
&vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
```
So, the change doesn’t break anything, but makes code a little bit harder to read. So, my suggestion is to add an assert in heap_page_prune_and_freeze, something like:
Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs != NULL);
That's fair. I've gone ahead and pushed a commit with your suggested assert.
2 - pushed 0001
```
+ PruneFreezeParams params = {.relation = rel,.buffer = buf,
+ .reason = PRUNE_VACUUM_SCAN,.options = HEAP_PAGE_PRUNE_FREEZE,
+ .cutoffs = &vacrel->cutoffs,.vistest = vacrel->vistest
+ };
```
Using a designated initializer is not wrong, but it makes future maintenance harder, because when a new field is added, this initializer will leave the new field uninitialized. From my impression, I don’t remember ever seeing a designated initializer in PG code. I only remember three approaches I have seen:
* use an initialize function to set every fields individually
* palloc0 to set all 0, then set non-zero fields individually
* {0} to set all 0, then set non-zero fields individually
Well, the main reason you don't see them much in the code is that a
lot of the code is old and we didn't require a c99-compliant compiler
until fairly recently (okay like 2018/2019) -- and thus couldn't use
designated initializers.
I agree that they are rare for structs (they are quite commonly used
with arrays), but they are there -- for example these bufmgr init
macros
#define BMR_REL(p_rel) \
((BufferManagerRelation){.rel = p_rel})
#define BMR_SMGR(p_smgr, p_relpersistence) \
((BufferManagerRelation){.smgr = p_smgr, .relpersistence =
p_relpersistence})
#define BMR_GET_SMGR(bmr) \
(RelationIsValid((bmr).rel) ? RelationGetSmgr((bmr).rel) : (bmr).smgr)
I don't see how it would be harder to remember to initialize a field
with a designated initializer vs if you have to just remember to add a
line initializing that field in the code. And using a designated
initializer ensures all unspecified fields will be zeroed out.
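A minimal illustration of that guarantee in plain C (names are made up, not PostgreSQL code):
```
typedef struct Example
{
	int		a;
	int		b;			/* imagine this member is added later */
} Example;

Example e = {.a = 1};		/* e.b is zero-initialized, not indeterminate */
```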
In general, I have seen threads [1] encouraging the use of designated
initializers, so I'm inclined to leave it as is since it is committed,
and I haven't heard other pushback.
[1] /messages/by-id/5B873BED.9080501@anastigmatix.net
3 - pushed 0002
```
prstate->all_visible = false;
+ prstate->all_frozen = false;
```
Nit: Setting both fields to false is now repeated in 6 places. Maybe adding a static inline function, say PruneClearVisibilityFlags(), may improve maintainability.
I see your point. However, I don't think it would necessarily be an
improvement. This function already has a lot of helpers you have to
jump to in order to understand what's going on. And in the place where they are
cleared most often, heap_prune_record_unchanged_lp_normal(), we set
other fields of the prstate directly, so it is nice visual symmetry in
my opinion to set them inline.
I did want to use chained assignment (all_visible = all_frozen =
false), but I have had people complain about that in my code before
more than once, so I refrained.
4 - pushed 0003
```
+ * opporunistically freeze, to indicate if the VM bits can be set. They are
```
Typo: opporunistically, missed a “t”.
Fixed in same commit that added the assert.
- Melanie
Thanks for the review! All the small changes you suggested I made in
attached v23 unless otherwise noted below.
On Mon, Nov 24, 2025 at 5:24 PM Andres Freund <andres@anarazel.de> wrote:
On 2025-11-20 12:19:58 -0500, Melanie Plageman wrote:
Subject: [PATCH v22 1/9] Split heap_page_prune_and_freeze() into helpers
One minor thing: It's slightly odd that prune_freeze_plan() gets an oid
argument, prune_freeze_setup() gets the entire prstate,
heap_page_will_freeze() gets the Relation. It's what they need, but still a
bit odd.
They all get the PruneState actually.
I've committed this patch (but actually have to do a follow-on commit
to silence coverity. Will do that next.)
Subject: [PATCH v22 2/9] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
prune/freeze
Hm. This change makes sense, but unfortunately I find it somewhat hard to
review. There are a lot of changes that don't obviously work towards one
goal in this commit.
I've split up the first commit into 4 patches in attached v23
(0002-0005). They are not meant to be committed separately but are
separate only for ease of review. They comprise the logical steps for
getting to the final code state. I originally had it split up but got
feedback it was more work to review them each. So, let's see how this
goes.
@@ -238,6 +239,16 @@ typedef struct PruneFreezeParams
+ * vmbuffer is the buffer that must already contain contain the required
+ * block of the visibility map if we are to update it. blk_known_av is the
+ * visibility status of the heap block as of the last call to
+ * find_next_unskippable_block().
+ */
+ Buffer vmbuffer;
+ bool blk_known_av;
What is blk_known_av set to if the block is known to not be all visible?
Compared to the case where we did not yet determine the visibility status of
the block?
blk_known_av should always be set to false if the caller doesn't know.
It is used as an optimization. I've added to the comment in this
struct to clarify that. More on this further down in my mail.
+ * Decide whether to set the visibility map bits for heap_blk, using
+ * information from PruneState and blk_known_av. Some callers may already
+ * have examined this page's VM bits (e.g., VACUUM in the previous
+ * heap_vac_scan_next_block() call) and can pass that along.
That's not entirely trivial to follow, tbh. As mentioned above, it's not clear
to me how the state of a block where did determine that the block is *not*
all-visible is represented.
There is no need to distinguish between knowing it is not all-visible
and not knowing if it is all-visible. That is, "not known" and "known
not" are the same for our purposes. This is only an optimization and
not needed for correctness. I've tried to add comments to this effect
in various places where blk_known_av is used.
+ else if (blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
Wait, why is it ok to perform this check iff blk_known_av is set?
This is existing logic in vacuum. It would be okay to perform the
check even if blk_known_av is false but might be too expensive for the
common case where the page is not all-visible (especially on-access).
The next vacuum should be able to enter this code path and fix it. Or
do you think it will be cheap enough because the caller will have read
in and pinned the VM page?
+ old_vmbits = visibilitymap_set_vmbits(blockno,
+ vmbuffer, new_vmbits,
+ params->relation->rd_locator);
+ if (old_vmbits == new_vmbits)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+ /* Unset so we don't emit WAL since no change occurred */
+ do_set_vm = false;
+ }
+ }
What can lead to this path being reached? Doesn't this mean that something
changed the state of the VM while we were holding an exclusive lock on the
heap buffer?
This shouldn't be in this commit (I've fixed that). However, it is
needed once we have on-access VM setting because we could have set the
page all-visible in the VM on-access in between when
find_next_unskippable_block() first checks the VM and sets
all_visible_according_to_vm/blk_known_av and when we take the lock and
prune/freeze the page.
log_heap_prune_and_freeze(params->relation, buffer,
- InvalidBuffer, /* vmbuffer */
- 0, /* vmflags */
+ do_set_vm ? vmbuffer : InvalidBuffer,
+ do_set_vm ? new_vmbits : 0,
conflict_xid,
- true, params->reason,
+ true, /* cleanup lock */
+ do_set_pd_vis,
+ params->reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
This function is now taking 16 parameters :/
Is this complaint about readability or performance of parameter
passing? Because if it's the latter, I can't imagine that will be
noticeable when compared to the overhead of emitting a WAL record.
I could add a struct just for passing the parameters to the
log_heap_prune_and_freeze(). Something like:
typedef struct PruneFreezeChanges
{
int nredirected;
int ndead;
int nunused;
int nfrozen;
OffsetNumber *redirected;
OffsetNumber *nowdead;
OffsetNumber *nowunused;
HeapTupleFreeze *frozen;
} PruneFreezeChanges;
PruneFreezeChanges c = {
.redirected = prstate.redirected,
.nredirected = prstate.nredirected,
.ndead = prstate.ndead,
.nowdead = prstate.nowdead,
.nunused = prstate.nunused,
.nowunused = prstate.nowunused,
.nfrozen = prstate.nfrozen,
.frozen = prstate.frozen,
};
log_heap_prune_and_freeze(params->relation, buffer,
InvalidBuffer,
/* vmbuffer */
0, /* vmflags */
conflict_xid,
true, params->reason,
c);
However, I fear it is a bit confusing to have this struct just to pass
the parameters to the log_heap_prune_and_freeze(). We can't use that
struct inline in the PruneState because then we would need all the
arrays to be inline in the PruneFreezeChanges struct, which would cause
4*MaxHeapTuplesPerPage more stack-allocated OffsetNumbers in vacuum phase
III than it currently has and needs.
The only other related parameters I see that could be stuck into a
struct are vmflags and set_pd_all_vis -- maybe called VisiChanges or
HeapPageVisiChanges. But again, I'm not sure if it is worth adding a
new struct for this.
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
+ {
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
+
+ Assert(prstate.lpdead_items == 0);
+ Assert(prstate.cutoffs);
+
+ if (!heap_page_is_all_visible(params->relation, buffer,
+ prstate.cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc))
+ Assert(false);
I don't love Assert(false), because the message for the assert failure is
pretty much meaningless. Sometimes it's hard to avoid, but here you have an if
() that has no body other than Assert(false)? Just Assert the expression
directly.
This is existing code. I agree it's weird, but I remember Peter saying
something about why he did it this way that I no longer remember.
Anyway, 0001 changes the assert as you suggest.
Subject: [PATCH v22 3/9] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum
As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in a XLOG_HEAP2_PRUNE_VACUUM_SCAN record.
This whole business of treating empty pages as all-visible continues to not
make any sense to me. Particularly in combination with a not crashsafe FSM it
just seems ... unhelpful. It also means that there there's a decent chance of
extra WAL when bulk extending. But that's not the fault of this change.
Is the argument for setting them av/af that we can skip them more
easily in future vacuums (i.e. not have to read in the page and take a
lock etc)?
Subject: [PATCH v22 4/9] Remove XLOG_HEAP2_VISIBLE entirely
As no remaining users emit XLOG_HEAP2_VISIBLE records.
This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.
Probably worth mentioning that this changes the VM API.
I've added a mention about this in the commit.
Are you imagining I have any comments anywhere about how
XLOG_HEAP2_VISIBLE used to exist?
I realized I need to bump XLOG_PAGE_MAGIC in this commit because the
code to replay XLOG_HEAP2_VISIBLE records is gone now.
What I'm not sure is if I have to bump it in some of the other commits
that change which WAL records are emitted by a particular operation
(e.g. not emitting a separate VM record from phase I of vacuum).
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
* redirected, some marked dead, and some removed altogether.
*
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'
*
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ * marked as unused.
*
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
+ * all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
* all.
*
* If replaying the record requires a cleanup lock, pass cleanup_lock = true.

How's that related to the commit's subject?
Oops, I moved it to the relevant commit.
Subject: [PATCH v22 5/9] Rename GlobalVisTestIsRemovableXid() to
GlobalVisXidVisibleToAll()
The function is currently only used to check whether a tuple’s xmax is
visible to all transactions (and thus removable). Upcoming changes will
also use it to test whether a tuple’s xmin is visible to all to
decide if a page can be marked all-visible in the visibility map.
The new name, GlobalVisXidVisibleToAll(), better reflects this broader
purpose.
If we want this - and I'm not convinced we do - I think it needs to go further
and change the other uses of removable in
procarray.c. ComputeXidHorizonsResult has a lot of related fields.
There's also GetOldestNonRemovableTransactionId(),
GlobalVisCheckRemovableXid(), GlobalVisCheckRemovableFullXid() that weren't
included in the renaming.
Okay, I see what you are saying. When you say you're not sure if we
want "this" -- do you mean using GlobalVisState for determining if
xmins are visible to all (which is required to set the VM on-access)
or do you mean renaming those functions?
If we're just talking about the renaming, looking at procarray.c, it
is full of the word "removable" because its functions were largely
used to examine and determine if everyone can see an xmax as committed
and thus if that tuple is removable from their perspective. But
nothing about the code that I can see means it has to be an xmax. We
could just as well use the functions to determine if everyone can see
an xmin as committed.
I don't see how we can leave the names as-is and use them on xmins,
because a tuple is _not_ removable based on testing whether everyone can
see its xmin. So the function basically returns an incorrect result.
That being said, the problem with replacing "removable" with "visible
to all" -- which isn't _terrible_ -- is that we then have to replace
"nonremovable" with "not visible to all" -- which is terrible.
I think getting rid of "removable" from procarray.c would be an
improvement, because the names of its variables and functions make that
file feel tightly coupled to vacuum and tuple removal, when its
functionality actually isn't. So the issue is coming up with something
palatable.
One alternative idea (that requires no renaming) is to add a wrapper
function somewhere outside procarray.c which invokes
GlobalVisTestIsRemovableXid() but is called something like
XidVisibleToAll() and is documented for use with xmins/etc. It would
avoid the messy work of coming up with a good name. What do you think?
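To make that concrete, here is a minimal sketch of the wrapper idea (the
name and location are hypothetical, just to illustrate the shape):

/*
 * Return whether every current snapshot sees xid as committed, i.e.
 * whether a tuple whose xmin is xid is visible to all. A thin wrapper
 * so that callers testing xmins don't have to reason about "removable".
 */
static inline bool
XidVisibleToAll(GlobalVisState *vistest, TransactionId xid)
{
    return GlobalVisTestIsRemovableXid(vistest, xid);
}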
Subject: [PATCH v22 6/9] Use GlobalVisState in vacuum to determine page level
visibility
This also benefits vacuum directly: in some cases, GlobalVisState may
advance during a vacuum, allowing more pages to become considered
all-visible. And, in the future, we could easily add a heuristic to
update GlobalVisState more frequently during vacuums of large tables. In
the rare case that the GlobalVisState moves backward, vacuum falls back
to OldestXmin to ensure we don’t attempt to freeze a dead tuple that
wasn’t yet prunable according to the GlobalVisState.
I think it may be better to make sure that the GlobalVisState can't move
backward.
Do you mean that I shouldn't use the GlobalVisState to determine
visibility until I make sure it can't move backwards?
There is actually no functional difference in my patch set with the
code this commit message refers to (in heap_prune_satisfies_vacuum()).
I only mentioned it to make sure folks knew that even though I was
widening usage of GlobalVisState, we wouldn't encounter
synchronization issues with freezing horizons.
Subject: [PATCH v22 8/9] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.
Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.
I think it'd be good to split this part into a separate commit. The set of
folks who would review that part is distinct from (and broader than) the
set looking at heapam internals.
Good point. I've split it into 3 commits in this patch set (0011-0013)
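As a rough illustration of the executor-side piece (the function name is
hypothetical and the real check in 0011-0013 may look quite different):

/*
 * Decide whether a scan may also set the VM during on-access pruning.
 * Only queries that cannot modify the scanned relation qualify.
 */
static bool
query_may_set_vm(const PlannedStmt *stmt)
{
    return stmt->commandType == CMD_SELECT && !stmt->hasModifyingCTE;
}

The result would then travel in the scan descriptor so that heapam knows
whether on-access pruning is allowed to attempt the VM update.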
- Melanie
Attachments:
v23-0001-Simplify-vacuum-visibility-assertion.patchtext/x-patch; charset=US-ASCII; name=v23-0001-Simplify-vacuum-visibility-assertion.patchDownload
From 7d51aaf9fea35367e36d143828412727a44d63d6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 10:42:53 -0500
Subject: [PATCH v23 01/14] Simplify vacuum visibility assertion
Phase I vacuum gives the page a once-over after pruning and freezing to
check that the values of all_visible and all_frozen agree with the
result of heap_page_is_all_visible(). This is meant to keep the logic in
phase I for determining visibility in sync with the logic in phase III.
Rewrite the assertion to avoid an Assert(false).
Suggested by Andres Freund.
---
src/backend/access/heap/vacuumlazy.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 65bb0568a86..984d5879947 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2028,10 +2028,9 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(presult.lpdead_items == 0);
- if (!heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum))
- Assert(false);
+ Assert(heap_page_is_all_visible(vacrel->rel, buf,
+ vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+ &debug_cutoff, &vacrel->offnum));
Assert(presult.all_frozen == debug_all_frozen);
--
2.43.0
v23-0002-Refactor-lazy_scan_prune-VM-set-logic-into-helpe.patchtext/x-patch; charset=US-ASCII; name=v23-0002-Refactor-lazy_scan_prune-VM-set-logic-into-helpe.patchDownload
From 7023583962f987cfde5450c8a2142574bb3ce84d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 13:36:39 -0500
Subject: [PATCH v23 02/14] Refactor lazy_scan_prune() VM set logic into helper
This commit is meant for ease of review only. It is a step towards
setting the VM in the same record as pruning and freezing in phase I of
vacuum. It isn't meant to be committed alone because it widens an
undesirable case where a heap buffer not marked dirty is stamped with an
LSN. If PD_ALL_VISIBLE is already set but the VM is not set, we won't
mark it dirty and then if checksums are enabled we will still stamp the
heap page LSN on a page not marked dirty.
Once the VM update is done in the same WAL record as pruning/freezing,
we will only set the LSN on the heap page if we set PD_ALL_VISIBLE or
made other heap page modifications.
---
src/backend/access/heap/vacuumlazy.c | 283 ++++++++++++++-------------
1 file changed, 146 insertions(+), 137 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 984d5879947..1cca095841e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1934,6 +1934,117 @@ cmpOffsetNumbers(const void *a, const void *b)
return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
}
+
+/*
+ * Decide whether to set the visibility map bits for heap_blk, using
+ * information from PruneFreezeResult and all_visible_according_to_vm. This
+ * function does not actually set the VM bit or page-level hint,
+ * PD_ALL_VISIBLE.
+ *
+ * If it finds that the page-level visibility hint or VM is corrupted, it will
+ * fix them by clearing the VM bit and page hint. This does not need to be
+ * done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with the desired
+ * flags in *new_vmbits. Also indicates via do_set_pd_vis whether
+ * PD_ALL_VISIBLE should be set on the heap page.
+ */
+static bool
+heap_page_will_set_vis(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buf,
+ Buffer vmbuffer,
+ bool all_visible_according_to_vm,
+ const PruneFreezeResult *presult,
+ uint8 *new_vmbits,
+ bool *do_set_pd_vis)
+{
+ Page heap_page = BufferGetPage(heap_buf);
+
+ *new_vmbits = 0;
+
+ /*
+ * It should never be the case that the visibility map page is set while
+ * the page-level bit is clear, but the reverse is allowed (if checksums
+ * are not enabled).
+ *
+ * We avoid relying on all_visible_according_to_vm as a proxy for the
+ * page-level PD_ALL_VISIBLE bit being set, since it might have become
+ * stale.
+ */
+ *do_set_pd_vis = presult->all_visible & !PageIsAllVisible(heap_page);
+
+ /*
+ * Determine what to set the visibility map bits to based on information
+ * from the VM (as of last heap_vac_scan_next_block() call), and from
+ * all_visible and all_frozen variables.
+ */
+ if ((presult->all_visible && !all_visible_according_to_vm) ||
+ (presult->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+ {
+ *new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+ if (presult->all_frozen)
+ {
+ Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+ *new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
+ return true;
+ }
+
+ /*
+ * Now handle two potential corruption cases:
+ *
+ * These do not need to happen in a critical section and are not
+ * WAL-logged.
+ *
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ else if (all_visible_according_to_vm && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (presult->lpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buf);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ return false;
+}
+
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -1964,6 +2075,10 @@ lazy_scan_prune(LVRelState *vacrel,
bool *vm_page_frozen)
{
Relation rel = vacrel->rel;
+ bool do_set_vm = false;
+ bool do_set_pd_vis = false;
+ uint8 new_vmbits = 0;
+ uint8 old_vmbits = 0;
PruneFreezeResult presult;
PruneFreezeParams params = {
.relation = rel,
@@ -2075,28 +2190,22 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(!presult.all_visible || !(*has_lpdead_items));
Assert(!presult.all_frozen || presult.all_visible);
- /*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
- */
- if (!all_visible_according_to_vm && presult.all_visible)
- {
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
+ do_set_vm = heap_page_will_set_vis(rel,
+ blkno,
+ buf,
+ vmbuffer,
+ all_visible_according_to_vm,
+ &presult,
+ &new_vmbits,
+ &do_set_pd_vis);
- if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
+ /* We should only set the VM if PD_ALL_VISIBLE is set or will be */
+ Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
+
+ if (do_set_pd_vis)
+ {
/*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
* NB: If the heap page is all-visible but the VM bit is not set, we
* don't need to dirty the heap page. However, if checksums are
* enabled, we do need to make sure that the heap page is dirtied
@@ -2104,136 +2213,36 @@ lazy_scan_prune(LVRelState *vacrel,
* Given that this situation should only happen in rare cases after a
* crash, it is not worth optimizing.
*/
- PageSetAllVisible(page);
MarkBufferDirty(buf);
+ PageSetAllVisible(page);
+ }
+
+ if (do_set_vm)
old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
InvalidXLogRecPtr,
vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
-
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
+ new_vmbits);
/*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
+ * For the purposes of logging, count whether or not the page was newly
+ * set all-visible and, potentially, all-frozen.
*/
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
+ if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen.
- */
- else if (all_visible_according_to_vm && presult.all_frozen &&
- !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ vacrel->vm_new_visible_pages++;
+ if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
+ }
+ else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ {
+ Assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
}
return presult.ndeleted;
--
2.43.0
v23-0003-Set-the-VM-in-prune-code.patchtext/x-patch; charset=UTF-8; name=v23-0003-Set-the-VM-in-prune-code.patchDownload
From 40b506a888ef57f5b962b320b817b97e64c9c4c0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v23 03/14] Set the VM in prune code
For review only, this moves the code to set the VM into
heap_page_prune_and_freeze() as a step toward having it in the same WAL
record.
---
src/backend/access/heap/pruneheap.c | 281 ++++++++++++++++++++++-----
src/backend/access/heap/vacuumlazy.c | 166 +---------------
src/include/access/heapam.h | 27 +++
3 files changed, 272 insertions(+), 202 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5af84b4c875..0daf3abf717 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
#include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool attempt_freeze;
+ /* whether or not to attempt updating the VM */
+ bool attempt_update_vm;
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -140,16 +142,17 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon when setting the VM or when
+ * freezing all the tuples on the page. It is only valid when all the live
+ * tuples on the page are all-visible.
*
* NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
* That's convenient for heap_page_prune_and_freeze() to use them to
- * decide whether to freeze the page or not. The all_visible and
- * all_frozen values returned to the caller are adjusted to include
- * LP_DEAD items after we determine whether to opportunistically freeze.
+ * decide whether to opportunistically freeze the page or not. The
+ * all_visible and all_frozen values ultimately used to set the VM are
+ * adjusted to include LP_DEAD items after we determine whether or not to
+ * opportunistically freeze.
*/
bool all_visible;
bool all_frozen;
@@ -191,6 +194,14 @@ static void page_verify_redirects(Page page);
static bool heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
PruneState *prstate);
+static bool heap_page_will_set_vis(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buf,
+ Buffer vmbuffer,
+ bool blk_known_av,
+ const PruneFreezeResult *presult,
+ uint8 *new_vmbits,
+ bool *do_set_pd_vis);
/*
@@ -280,6 +291,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
PruneFreezeParams params = {
.relation = relation,
.buffer = buffer,
+ .vmbuffer = InvalidBuffer,
+ .blk_known_av = false,
.reason = PRUNE_ON_ACCESS,
.options = 0,
.vistest = vistest,
@@ -338,6 +351,8 @@ prune_freeze_setup(PruneFreezeParams *params,
/* cutoffs must be provided if we will attempt freezing */
Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate->attempt_update_vm =
+ (params->options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0;
prstate->cutoffs = params->cutoffs;
/*
@@ -386,51 +401,54 @@ prune_freeze_setup(PruneFreezeParams *params,
prstate->frz_conflict_horizon = InvalidTransactionId;
/*
- * Vacuum may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Track whether the page could be marked all-visible and/or all-frozen.
+ * This information is used for opportunistic freezing and for updating
+ * the visibility map (VM) if requested by the caller.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * Currently, only VACUUM performs freezing, but other callers may in the
+ * future. Visibility bookkeeping is required not just for setting the VM
+ * bits, but also for opportunistic freezing: we only consider freezing if
+ * the page would become all-frozen, or if it would be all-frozen except
+ * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+ * we will not set the VM bit even if the page is found to be all-visible.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not immediately
- * clear all_visible and all_frozen when we see LP_DEAD items. We fix
- * that after scanning the line pointers. We must correct all_visible and
- * all_frozen before we return them to the caller, so that the caller
- * doesn't set the VM bits incorrectly.
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is passed without HEAP_PAGE_PRUNE_FREEZE,
+ * prstate.all_frozen must be initialized to false, since we will not call
+ * heap_prepare_freeze_tuple() for each tuple.
+ *
+ * Dead tuples that will be removed by the end of vacuum should not
+ * prevent opportunistic freezing. Therefore, we do not clear all_visible
+ * and all_frozen when we encounter LP_DEAD items. Instead, we correct
+ * them after deciding whether to freeze, but before updating the VM, to
+ * avoid setting the VM bits incorrectly.
+ *
+ * If neither freezing nor VM updates are requested, we skip the extra
+ * bookkeeping. In this case, initializing all_visible to false allows
+ * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
*/
if (prstate->attempt_freeze)
{
prstate->all_visible = true;
prstate->all_frozen = true;
}
+ else if (prstate->attempt_update_vm)
+ {
+ prstate->all_visible = true;
+ prstate->all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate->all_visible = false;
prstate->all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is unmaintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon
+ * when updating the VM and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -765,10 +783,131 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Decide whether to set the visibility map bits for heap_blk, using
+ * information from PruneFreezeResult and blk_known_av. Some callers may
+ * already have examined this page’s VM bits (e.g., VACUUM in the previous
+ * heap_vac_scan_next_block() call) and can pass that along as blk_known_av.
+ * Callers that have not previously checked the page's status in the VM should
+ * pass false for blk_known_av.
+ *
+ * This function does not actually set the VM bit or page-level hint,
+ * PD_ALL_VISIBLE.
+ *
+ * However, if it finds that the page-level visibility hint or VM is
+ * corrupted, it will fix them by clearing the VM bit and page hint. This does
+ * not need to be done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with the desired
+ * flags in *new_vmbits. Also indicates via do_set_pd_vis whether
+ * PD_ALL_VISIBLE should be set on the heap page.
+ */
+static bool
+heap_page_will_set_vis(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buf,
+ Buffer vmbuffer,
+ bool blk_known_av,
+ const PruneFreezeResult *presult,
+ uint8 *new_vmbits,
+ bool *do_set_pd_vis)
+{
+ Page heap_page = BufferGetPage(heap_buf);
+
+ *new_vmbits = 0;
+
+ /*
+ * It should never be the case that the visibility map page is set while
+ * the page-level bit is clear, but the reverse is allowed (if checksums
+ * are not enabled).
+ *
+ * We avoid relying on blk_known_av as a proxy for the page-level
+ * PD_ALL_VISIBLE bit being set, since it might have become stale and may
+ * not be provided by all callers.
+ */
+ *do_set_pd_vis = presult->all_visible & !PageIsAllVisible(heap_page);
+
+ /*
+ * Determine what the visibility map bits should be set to using the
+ * values of all_visible and all_frozen determined during
+ * pruning/freezing.
+ */
+ if ((presult->all_visible && !blk_known_av) ||
+ (presult->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+ {
+ *new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+ if (presult->all_frozen)
+ {
+ Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+ *new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
+ return true;
+ }
+
+ /*
+ * Now handle two potential corruption cases:
+ *
+ * These do not need to happen in a critical section and are not
+ * WAL-logged.
+ *
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ *
+ * Callers which did not check the visibility map and determine
+ * blk_known_av will not be eligible for this, however the cost of
+ * potentially needing to read the visibility map for pages that are not
+ * all-visible is too high to justify generalizing the check.
+ */
+ else if (blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (presult->lpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buf);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ return false;
+}
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -783,12 +922,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* tuples if it's required in order to advance relfrozenxid / relminmxid, or
* if it's considered advantageous for overall system performance to do so
* now. The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing. When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opportunistically freeze, to indicate if the VM bits can be set. They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
@@ -813,11 +953,15 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
MultiXactId *new_relmin_mxid)
{
Buffer buffer = params->buffer;
+ Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
+ BlockNumber blockno = BufferGetBlockNumber(buffer);
PruneState prstate;
bool do_freeze;
bool do_prune;
bool do_hint_prune;
+ bool do_set_vm;
+ bool do_set_pd_vis;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
@@ -1005,6 +1149,51 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
}
}
+
+ /* Now update the visibility map and PD_ALL_VISIBLE hint */
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ do_set_vm = false;
+ if (prstate.attempt_update_vm)
+ do_set_vm = heap_page_will_set_vis(params->relation,
+ blockno,
+ buffer,
+ vmbuffer,
+ params->blk_known_av,
+ presult,
+ &presult->new_vmbits,
+ &do_set_pd_vis);
+
+
+ /* We should only set the VM if PD_ALL_VISIBLE is set or will be */
+ Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
+
+ /*
+ * new_vmbits should be 0 regardless of whether or not the page is
+ * all-visible if we do not intend to set the VM.
+ */
+ Assert(do_set_vm || presult->new_vmbits == 0);
+
+ if (do_set_pd_vis)
+ {
+ /*
+ * NB: If the heap page is all-visible but the VM bit is not set, we
+ * don't need to dirty the heap page. However, if checksums are
+ * enabled, we do need to make sure that the heap page is dirtied
+ * before passing it to visibilitymap_set(), because it may be logged.
+ * Given that this situation should only happen in rare cases after a
+ * crash, it is not worth optimizing.
+ */
+ MarkBufferDirty(buffer);
+ PageSetAllVisible(page);
+ }
+
+ presult->old_vmbits = 0;
+ if (do_set_vm)
+ presult->old_vmbits = visibilitymap_set(params->relation, blockno, buffer,
+ InvalidXLogRecPtr,
+ vmbuffer, presult->vm_conflict_horizon,
+ presult->new_vmbits);
}
@@ -1479,6 +1668,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
{
TransactionId xmin;
+ Assert(prstate->attempt_update_vm);
+
if (!HeapTupleHeaderXminCommitted(htup))
{
prstate->all_visible = false;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1cca095841e..f5617335cb2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1935,116 +1935,6 @@ cmpOffsetNumbers(const void *a, const void *b)
}
-/*
- * Decide whether to set the visibility map bits for heap_blk, using
- * information from PruneFreezeResult and all_visible_according_to_vm. This
- * function does not actually set the VM bit or page-level hint,
- * PD_ALL_VISIBLE.
- *
- * If it finds that the page-level visibility hint or VM is corrupted, it will
- * fix them by clearing the VM bit and page hint. This does not need to be
- * done in a critical section.
- *
- * Returns true if one or both VM bits should be set, along with the desired
- * flags in *new_vmbits. Also indicates via do_set_pd_vis whether
- * PD_ALL_VISIBLE should be set on the heap page.
- */
-static bool
-heap_page_will_set_vis(Relation relation,
- BlockNumber heap_blk,
- Buffer heap_buf,
- Buffer vmbuffer,
- bool all_visible_according_to_vm,
- const PruneFreezeResult *presult,
- uint8 *new_vmbits,
- bool *do_set_pd_vis)
-{
- Page heap_page = BufferGetPage(heap_buf);
-
- *new_vmbits = 0;
-
- /*
- * It should never be the case that the visibility map page is set while
- * the page-level bit is clear, but the reverse is allowed (if checksums
- * are not enabled).
- *
- * We avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale.
- */
- *do_set_pd_vis = presult->all_visible & !PageIsAllVisible(heap_page);
-
- /*
- * Determine what to set the visibility map bits to based on information
- * from the VM (as of last heap_vac_scan_next_block() call), and from
- * all_visible and all_frozen variables.
- */
- if ((presult->all_visible && !all_visible_according_to_vm) ||
- (presult->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
- {
- *new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
- if (presult->all_frozen)
- {
- Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
- *new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- return true;
- }
-
- /*
- * Now handle two potential corruption cases:
- *
- * These do not need to happen in a critical section and are not
- * WAL-logged.
- *
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- else if (all_visible_according_to_vm && !PageIsAllVisible(heap_page) &&
- visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk)));
-
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult->lpdead_items > 0 && PageIsAllVisible(heap_page))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk)));
-
- PageClearAllVisible(heap_page);
- MarkBufferDirty(heap_buf);
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- return false;
-}
-
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -2075,16 +1965,14 @@ lazy_scan_prune(LVRelState *vacrel,
bool *vm_page_frozen)
{
Relation rel = vacrel->rel;
- bool do_set_vm = false;
- bool do_set_pd_vis = false;
- uint8 new_vmbits = 0;
- uint8 old_vmbits = 0;
PruneFreezeResult presult;
PruneFreezeParams params = {
.relation = rel,
.buffer = buf,
+ .vmbuffer = vmbuffer,
+ .blk_known_av = all_visible_according_to_vm,
.reason = PRUNE_VACUUM_SCAN,
- .options = HEAP_PAGE_PRUNE_FREEZE,
+ .options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VIS,
.vistest = vacrel->vistest,
.cutoffs = &vacrel->cutoffs,
};
@@ -2187,60 +2075,24 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
- Assert(!presult.all_frozen || presult.all_visible);
-
- do_set_vm = heap_page_will_set_vis(rel,
- blkno,
- buf,
- vmbuffer,
- all_visible_according_to_vm,
- &presult,
- &new_vmbits,
- &do_set_pd_vis);
-
-
- /* We should only set the VM if PD_ALL_VISIBLE is set or will be */
- Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
-
- if (do_set_pd_vis)
- {
- /*
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
- */
- MarkBufferDirty(buf);
- PageSetAllVisible(page);
- }
-
- if (do_set_vm)
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- new_vmbits);
-
/*
* For the purposes of logging, count whether or not the page was newly
* set all-visible and, potentially, all-frozen.
*/
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
- (new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
vacrel->vm_new_visible_pages++;
- if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- Assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
vacrel->vm_new_frozen_pages++;
*vm_page_frozen = true;
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 632c4332a8c..ce9cfbdc767 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VIS (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,18 @@ typedef struct PruneFreezeParams
Relation relation; /* relation containing buffer to be pruned */
Buffer buffer; /* buffer to be pruned */
+ /*
+ * vmbuffer is the buffer that must already contain the required block of
+ * the visibility map if we are to update it. blk_known_av is the
+ * visibility status of the heap block as of the last call to
+ * find_next_unskippable_block(). Callers which did not check the
+ * visibility map already should pass false for blk_known_av. This is only
+ * an optimization for callers that did check the VM and won't affect
+ * correctness.
+ */
+ Buffer vmbuffer;
+ bool blk_known_av;
+
/*
* The reason pruning was performed. It is used to set the WAL record
* opcode which is used for debugging and analysis purposes.
@@ -252,6 +265,9 @@ typedef struct PruneFreezeParams
*
* HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
* will return 'all_visible', 'all_frozen' flags to the caller.
+ *
+ * HEAP_PAGE_PRUNE_UPDATE_VIS indicates that we will set the page's status
+ * in the VM.
*/
int options;
@@ -299,6 +315,17 @@ typedef struct PruneFreezeResult
bool all_frozen;
TransactionId vm_conflict_horizon;
+ /*
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
+ *
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set and
+ * we have attempted to update the VM.
+ */
+ uint8 new_vmbits;
+ uint8 old_vmbits;
+
/*
* Whether or not the page makes rel truncation unsafe. This is set to
* 'true', even if the page contains LP_DEAD items. VACUUM will remove
--
2.43.0
v23-0004-Move-VM-assert-into-prune-freeze-code.patchtext/x-patch; charset=UTF-8; name=v23-0004-Move-VM-assert-into-prune-freeze-code.patchDownload
From 9aa0ec2b5fae04762128fbec329a23139fb5b4a4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 15:57:34 -0500
Subject: [PATCH v23 04/14] Move VM assert into prune/freeze code
For review only, this commit moves the check of the heap page into
prune/freeze code before setting the VM. This allows us to remove some
fields of the PruneFreezeResult.
This will get squashed into a larger commit to set the VM in the same
record where we prune and freeze.
---
src/backend/access/heap/pruneheap.c | 142 +++++++++++++++++++--------
src/backend/access/heap/vacuumlazy.c | 68 +------------
src/include/access/heapam.h | 25 ++---
3 files changed, 111 insertions(+), 124 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 0daf3abf717..2512b5d83e3 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -199,7 +199,7 @@ static bool heap_page_will_set_vis(Relation relation,
Buffer heap_buf,
Buffer vmbuffer,
bool blk_known_av,
- const PruneFreezeResult *presult,
+ const PruneState *prstate,
uint8 *new_vmbits,
bool *do_set_pd_vis);
@@ -785,8 +785,8 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
/*
* Decide whether to set the visibility map bits for heap_blk, using
- * information from PruneFreezeResult and blk_known_av. Some callers may
- * already have examined this page’s VM bits (e.g., VACUUM in the previous
+ * information from PruneState and blk_known_av. Some callers may already have
+ * examined this page’s VM bits (e.g., VACUUM in the previous
* heap_vac_scan_next_block() call) and can pass that along as blk_known_av.
* Callers that have not previously checked the page's status in the VM should
* pass false for blk_known_av.
@@ -808,13 +808,20 @@ heap_page_will_set_vis(Relation relation,
Buffer heap_buf,
Buffer vmbuffer,
bool blk_known_av,
- const PruneFreezeResult *presult,
+ const PruneState *prstate,
uint8 *new_vmbits,
bool *do_set_pd_vis)
{
Page heap_page = BufferGetPage(heap_buf);
*new_vmbits = 0;
+ *do_set_pd_vis = false;
+
+ if (!prstate->attempt_update_vm)
+ {
+ Assert(!prstate->all_visible && !prstate->all_frozen);
+ return false;
+ }
/*
* It should never be the case that the visibility map page is set while
@@ -825,22 +832,19 @@ heap_page_will_set_vis(Relation relation,
* PD_ALL_VISIBLE bit being set, since it might have become stale and may
* not be provided by all callers.
*/
- *do_set_pd_vis = presult->all_visible & !PageIsAllVisible(heap_page);
+ *do_set_pd_vis = prstate->all_visible & !PageIsAllVisible(heap_page);
/*
* Determine what the visibility map bits should be set to using the
* values of all_visible and all_frozen determined during
* pruning/freezing.
*/
- if ((presult->all_visible && !blk_known_av) ||
- (presult->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+ if ((prstate->all_visible && !blk_known_av) ||
+ (prstate->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
{
*new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
- if (presult->all_frozen)
- {
- Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+ if (prstate->all_frozen)
*new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
- }
return true;
}
@@ -887,7 +891,7 @@ heap_page_will_set_vis(Relation relation,
* There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
* however.
*/
- else if (presult->lpdead_items > 0 && PageIsAllVisible(heap_page))
+ else if (prstate->lpdead_items > 0 && PageIsAllVisible(heap_page))
{
ereport(WARNING,
(errcode(ERRCODE_DATA_CORRUPTED),
@@ -903,6 +907,30 @@ heap_page_will_set_vis(Relation relation,
return false;
}
+#ifdef USE_ASSERT_CHECKING
+
+/*
+ * Wrapper for heap_page_would_be_all_visible() which can be used for callers
+ * that expect no LP_DEAD on the page. Currently assert-only, but there is no
+ * reason not to use it outside of asserts.
+ */
+static bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
+{
+
+ return heap_page_would_be_all_visible(rel, buf,
+ OldestXmin,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid,
+ logging_offnum);
+}
+#endif
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
@@ -956,6 +984,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
+ TransactionId vm_conflict_horizon = InvalidTransactionId;
PruneState prstate;
bool do_freeze;
bool do_prune;
@@ -1116,23 +1145,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
- /*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
-
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
@@ -1150,20 +1164,68 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
}
}
- /* Now update the visibility map and PD_ALL_VISIBLE hint */
- Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+ /*
+ * If updating the visibility map, the conflict horizon for that record
+ * must be the newest xmin on the page. However, if the page is
+ * completely frozen, there can be no conflict and the vm_conflict_horizon
+ * should remain InvalidTransactionId. This includes the case that we
+ * just froze all the tuples; the prune-freeze record included the
+ * conflict XID already so we don't need to again.
+ */
+ if (prstate.all_frozen)
+ vm_conflict_horizon = InvalidTransactionId;
+ else
+ vm_conflict_horizon = prstate.visibility_cutoff_xid;
+
+ /*
+ * During its second pass over the heap, VACUUM calls
+ * heap_page_would_be_all_visible() to determine whether a page is
+ * all-visible and all-frozen. The logic here is similar. After completing
+ * pruning and freezing, use an assertion to verify that our results
+ * remain consistent with heap_page_would_be_all_visible().
+ */
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
+ {
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
+
+ Assert(presult->lpdead_items == 0);
+
+ Assert(heap_page_is_all_visible(params->relation, buffer,
+ prstate.cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc));
- do_set_vm = false;
- if (prstate.attempt_update_vm)
- do_set_vm = heap_page_will_set_vis(params->relation,
- blockno,
- buffer,
- vmbuffer,
- params->blk_known_av,
- presult,
- &presult->new_vmbits,
- &do_set_pd_vis);
+ Assert(prstate.all_frozen == debug_all_frozen);
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == vm_conflict_horizon);
+ }
+#endif
+
+ Assert(!prstate.all_frozen || prstate.all_visible);
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ /*
+ * Decide whether to set the page-level PD_ALL_VISIBLE bit and the VM bits
+ * based on information from the VM and the all_visible/all_frozen flags.
+ *
+ * While it is valid for PD_ALL_VISIBLE to be set when the corresponding
+ * VM bit is clear, we strongly prefer to keep them in sync.
+ *
+ * Accordingly, we also allow updating only the VM when PD_ALL_VISIBLE has
+ * already been set. Setting only the VM is most common when setting an
+ * already all-visible page all-frozen.
+ */
+ do_set_vm = heap_page_will_set_vis(params->relation,
+ blockno,
+ buffer,
+ vmbuffer,
+ params->blk_known_av,
+ &prstate,
+ &presult->new_vmbits,
+ &do_set_pd_vis);
/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
@@ -1192,7 +1254,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
if (do_set_vm)
presult->old_vmbits = visibilitymap_set(params->relation, blockno, buffer,
InvalidXLogRecPtr,
- vmbuffer, presult->vm_conflict_horizon,
+ vmbuffer, vm_conflict_horizon,
presult->new_vmbits);
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f5617335cb2..4aa425ec945 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,20 +464,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
-#endif
-static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- OffsetNumber *deadoffsets,
- int ndeadoffsets,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2016,32 +2002,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- Assert(heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum));
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -3496,29 +3456,6 @@ dead_items_cleanup(LVRelState *vacrel)
vacrel->pvs = NULL;
}
-#ifdef USE_ASSERT_CHECKING
-
-/*
- * Wrapper for heap_page_would_be_all_visible() which can be used for callers
- * that expect no LP_DEAD on the page. Currently assert-only, but there is no
- * reason not to use it outside of asserts.
- */
-static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum)
-{
-
- return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
- NULL, 0,
- all_frozen,
- visibility_cutoff_xid,
- logging_offnum);
-}
-#endif
/*
* Check whether the heap page in buf is all-visible except for the dead
@@ -3542,15 +3479,12 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
* - *logging_offnum: OffsetNumber of current tuple being processed;
* used by vacuum's error callback system.
*
- * Callers looking to verify that the page is already all-visible can call
- * heap_page_is_all_visible().
- *
* This logic is closely related to heap_prune_record_unchanged_lp_normal().
* If you modify this function, ensure consistency with that code. An
* assertion cross-checks that both remain in agreement. Do not introduce new
* side-effects.
*/
-static bool
+bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
TransactionId OldestXmin,
OffsetNumber *deadoffsets,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ce9cfbdc767..b20096b6ca1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -263,8 +263,7 @@ typedef struct PruneFreezeParams
* HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
* LP_UNUSED during pruning.
*
- * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
- * will return 'all_visible', 'all_frozen' flags to the caller.
+ * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples
*
* HEAP_PAGE_PRUNE_UPDATE_VIS indicates that we will set the page's status
* in the VM.
@@ -300,21 +299,6 @@ typedef struct PruneFreezeResult
int live_tuples;
int recently_dead_tuples;
- /*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
- *
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
- *
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
- */
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
-
/*
* old_vmbits are the state of the all-visible and all-frozen bits in the
* visibility map before updating it during phase I of vacuuming.
@@ -460,6 +444,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
/* in heap/vacuumlazy.c */
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
--
2.43.0
Attachment: v23-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (text/x-patch; charset=US-ASCII)
From a407673cb2632d4544cc56458dbf4a063da2067c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 16:16:22 -0500
Subject: [PATCH v23 05/14] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
prune/freeze
Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.
Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.
This change applies only to vacuum phase I, not to pruning performed
during normal page access.
NOTE: This commit is the main commit and all review-only commits
preceding it will be squashed into it.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam_xlog.c | 48 +++--
src/backend/access/heap/pruneheap.c | 294 +++++++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 1 +
src/include/access/heapam.h | 1 +
4 files changed, 212 insertions(+), 132 deletions(-)
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 11cb3f74da5..b1ceab71928 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -104,6 +104,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
OffsetNumber *frz_offsets;
char *dataptr = XLogRecGetBlockData(record, 0, &datalen);
bool do_prune;
+ bool set_lsn = false;
+ bool mark_buffer_dirty = false;
heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
&nplans, &plans, &frz_offsets,
@@ -157,17 +159,39 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* There should be no more data */
Assert((char *) frz_offsets == dataptr + datalen);
- if (vmflags & VISIBILITYMAP_VALID_BITS)
- PageSetAllVisible(page);
-
- MarkBufferDirty(buffer);
+ if (do_prune || nplans > 0)
+ mark_buffer_dirty = set_lsn = true;
/*
- * See log_heap_prune_and_freeze() for commentary on when we set the
- * heap page LSN.
+ * The critical integrity requirement here is that we must never end
+ * up with the visibility map bit set and the page-level
+ * PD_ALL_VISIBLE bit unset. If that were to occur, a subsequent page
+ * modification would fail to clear the visibility map bit.
+ *
+ * vmflags may be nonzero with PD_ALL_VISIBLE already set (e.g. when
+ * marking an all-visible page all-frozen). If only the VM is updated,
+ * the heap page need not be dirtied.
*/
- if (do_prune || nplans > 0 ||
- ((vmflags & VISIBILITYMAP_VALID_BITS) && XLogHintBitIsNeeded()))
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
+ {
+ PageSetAllVisible(page);
+ mark_buffer_dirty = true;
+
+ /*
+ * See log_heap_prune_and_freeze() for commentary on when we set
+ * the heap page LSN.
+ */
+ if (XLogHintBitIsNeeded())
+ set_lsn = true;
+ }
+
+ /* We should always mark a buffer dirty before stamping with an LSN */
+ Assert(!set_lsn || mark_buffer_dirty);
+
+ if (mark_buffer_dirty)
+ MarkBufferDirty(buffer);
+
+ if (set_lsn)
PageSetLSN(page, lsn);
/*
@@ -246,10 +270,10 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/*
* Replay XLOG_HEAP2_VISIBLE records.
*
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
+ * The critical integrity requirement here is that we must never end up with a
+ * situation where the visibility map bit is set, and the page-level
+ * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent page
+ * modification would fail to clear the visibility map bit.
*/
static void
heap_xlog_visible(XLogReaderState *record)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 2512b5d83e3..b851d723c74 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -194,6 +194,12 @@ static void page_verify_redirects(Page page);
static bool heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
PruneState *prstate);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ uint8 new_vmbits,
+ TransactionId latest_xid_removed,
+ TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid,
+ bool blk_already_av);
static bool heap_page_will_set_vis(Relation relation,
BlockNumber heap_blk,
Buffer heap_buf,
@@ -783,6 +789,64 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm, uint8 new_vmbits,
+ TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid, bool blk_already_av)
+{
+ TransactionId conflict_xid;
+
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning or
+ * freezing any tuples and are setting an already all-visible page
+ * all-frozen in the VM. In this case, all of the tuples on the page must
+ * already be visible to all MVCC snapshots on the standby.
+ */
+ if (!do_prune && !do_freeze &&
+ do_set_vm && blk_already_av && (new_vmbits & VISIBILITYMAP_ALL_FROZEN))
+ return InvalidTransactionId;
+
+ /*
+ * The snapshotConflictHorizon for the whole record should be the most
+ * conservative of all the horizons calculated for any of the possible
+ * modifications. If this record will prune tuples, any transactions on
+ * the standby older than the youngest xmax of the most recently removed
+ * tuple this record will prune will conflict. If this record will freeze
+ * tuples, any transactions on the standby with xids older than the
+ * youngest tuple this record will freeze will conflict.
+ */
+ conflict_xid = InvalidTransactionId;
+
+ /*
+ * If we are updating the VM, the conflict horizon is almost always the
+ * visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization, we can
+ * use the visibility_cutoff_xid as the conflict horizon if the page will
+ * be all-frozen. This is true even if there are LP_DEAD line pointers
+ * because we ignored those when maintaining the visibility_cutoff_xid.
+ * This will have been calculated earlier as the frz_conflict_horizon when
+ * we determined we would freeze.
+ */
+ if (do_set_vm)
+ conflict_xid = visibility_cutoff_xid;
+ else if (do_freeze)
+ conflict_xid = frz_conflict_horizon;
+
+ /*
+ * If we are removing tuples with a younger xmax than our so far
+ * calculated conflict_xid, we must use this as our horizon.
+ */
+ if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+ conflict_xid = latest_xid_removed;
+
+ return conflict_xid;
+}
+
/*
* Decide whether to set the visibility map bits for heap_blk, using
* information from PruneState and blk_known_av. Some callers may already have
@@ -984,7 +1048,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
- TransactionId vm_conflict_horizon = InvalidTransactionId;
PruneState prstate;
bool do_freeze;
bool do_prune;
@@ -993,6 +1056,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool do_set_pd_vis;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ TransactionId conflict_xid = InvalidTransactionId;
+ uint8 new_vmbits = 0;
+ uint8 old_vmbits = 0;
/* Initialize prstate */
prune_freeze_setup(params,
@@ -1058,6 +1124,39 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.all_visible = prstate.all_frozen = false;
Assert(!prstate.all_frozen || prstate.all_visible);
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ /*
+ * Decide whether to set the page-level PD_ALL_VISIBLE bit and the VM bits
+ * based on information from the VM and the all_visible/all_frozen flags.
+ *
+ * While it is valid for PD_ALL_VISIBLE to be set when the corresponding
+ * VM bit is clear, we strongly prefer to keep them in sync.
+ *
+ * Accordingly, we also allow updating only the VM when PD_ALL_VISIBLE has
+ * already been set. Setting only the VM is most common when setting an
+ * already all-visible page all-frozen.
+ */
+ do_set_vm = heap_page_will_set_vis(params->relation,
+ blockno, buffer, vmbuffer, params->blk_known_av,
+ &prstate, &new_vmbits, &do_set_pd_vis);
+
+ /* We should only set the VM if PD_ALL_VISIBLE is set or will be */
+ Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
+
+ /*
+ * new_vmbits should be 0 regardless of whether or not the page is
+ * all-visible if we do not intend to set the VM.
+ */
+ Assert(do_set_vm || new_vmbits == 0);
+
+ conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm, new_vmbits,
+ prstate.latest_xid_removed, prstate.frz_conflict_horizon,
+ prstate.visibility_cutoff_xid, params->blk_known_av);
+
+ /* Lock vmbuffer before entering a critical section */
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -1079,14 +1178,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
/*
* If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * hint. If we are going to freeze or prune the page or set
+ * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+ *
+ * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+ * for the VM to be set and PD_ALL_VISIBLE to be clear.
*/
- if (!do_freeze && !do_prune)
+ if (!do_freeze && !do_prune && !do_set_pd_vis)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -1100,36 +1202,33 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
- MarkBufferDirty(buffer);
+ if (do_set_pd_vis)
+ PageSetAllVisible(page);
+
+ if (do_prune || do_freeze || do_set_pd_vis)
+ MarkBufferDirty(buffer);
+
+ if (do_set_vm)
+ {
+ Assert(PageIsAllVisible(page));
+ old_vmbits = visibilitymap_set_vmbits(blockno,
+ vmbuffer, new_vmbits,
+ params->relation->rd_locator);
+ Assert(old_vmbits != new_vmbits);
+ }
/*
* Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
*/
if (RelationNeedsWAL(params->relation))
{
- /*
- * The snapshotConflictHorizon for the whole record should be the
- * most conservative of all the horizons calculated for any of the
- * possible modifications. If this record will prune tuples, any
- * transactions on the standby older than the youngest xmax of the
- * most recently removed tuple this record will prune will
- * conflict. If this record will freeze tuples, any transactions
- * on the standby with xids older than the youngest tuple this
- * record will freeze will conflict.
- */
- TransactionId conflict_xid;
-
- if (TransactionIdFollows(prstate.frz_conflict_horizon,
- prstate.latest_xid_removed))
- conflict_xid = prstate.frz_conflict_horizon;
- else
- conflict_xid = prstate.latest_xid_removed;
-
log_heap_prune_and_freeze(params->relation, buffer,
- InvalidBuffer, /* vmbuffer */
- 0, /* vmflags */
+ do_set_vm ? vmbuffer : InvalidBuffer,
+ do_set_vm ? new_vmbits : 0,
conflict_xid,
- true, params->reason,
+ true, /* cleanup lock */
+ do_set_pd_vis,
+ params->reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
@@ -1139,43 +1238,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
END_CRIT_SECTION();
- /* Copy information back for caller */
- presult->ndeleted = prstate.ndeleted;
- presult->nnewlpdead = prstate.ndead;
- presult->nfrozen = prstate.nfrozen;
- presult->live_tuples = prstate.live_tuples;
- presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->hastup = prstate.hastup;
-
- presult->lpdead_items = prstate.lpdead_items;
- /* the presult->deadoffsets array was already filled in */
-
- if (prstate.attempt_freeze)
- {
- if (presult->nfrozen > 0)
- {
- *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
- *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
- }
- else
- {
- *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
- *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
- }
- }
-
- /*
- * If updating the visibility map, the conflict horizon for that record
- * must be the newest xmin on the page. However, if the page is
- * completely frozen, there can be no conflict and the vm_conflict_horizon
- * should remain InvalidTransactionId. This includes the case that we
- * just froze all the tuples; the prune-freeze record included the
- * conflict XID already so we don't need to again.
- */
- if (prstate.all_frozen)
- vm_conflict_horizon = InvalidTransactionId;
- else
- vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
/*
* During its second pass over the heap, VACUUM calls
@@ -1190,7 +1254,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(presult->lpdead_items == 0);
+ Assert(prstate.lpdead_items == 0);
+ Assert(prstate.cutoffs);
Assert(heap_page_is_all_visible(params->relation, buffer,
prstate.cutoffs->OldestXmin,
@@ -1200,62 +1265,36 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Assert(prstate.all_frozen == debug_all_frozen);
Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == vm_conflict_horizon);
+ debug_cutoff == prstate.visibility_cutoff_xid);
}
#endif
- Assert(!prstate.all_frozen || prstate.all_visible);
- Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
-
- /*
- * Decide whether to set the page-level PD_ALL_VISIBLE bit and the VM bits
- * based on information from the VM and the all_visible/all_frozen flags.
- *
- * While it is valid for PD_ALL_VISIBLE to be set when the corresponding
- * VM bit is clear, we strongly prefer to keep them in sync.
- *
- * Accordingly, we also allow updating only the VM when PD_ALL_VISIBLE has
- * already been set. Setting only the VM is most common when setting an
- * already all-visible page all-frozen.
- */
- do_set_vm = heap_page_will_set_vis(params->relation,
- blockno,
- buffer,
- vmbuffer,
- params->blk_known_av,
- &prstate,
- &presult->new_vmbits,
- &do_set_pd_vis);
-
- /* We should only set the VM if PD_ALL_VISIBLE is set or will be */
- Assert(!do_set_vm || do_set_pd_vis || PageIsAllVisible(page));
+ /* Copy information back for caller */
+ presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+ presult->hastup = prstate.hastup;
+ presult->new_vmbits = new_vmbits;
+ presult->old_vmbits = old_vmbits;
- /*
- * new_vmbits should be 0 regardless of whether or not the page is
- * all-visible if we do not intend to set the VM.
- */
- Assert(do_set_vm || presult->new_vmbits == 0);
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
- if (do_set_pd_vis)
+ if (prstate.attempt_freeze)
{
- /*
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
- */
- MarkBufferDirty(buffer);
- PageSetAllVisible(page);
+ if (presult->nfrozen > 0)
+ {
+ *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
+ }
+ else
+ {
+ *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
+ }
}
-
- presult->old_vmbits = 0;
- if (do_set_vm)
- presult->old_vmbits = visibilitymap_set(params->relation, blockno, buffer,
- InvalidXLogRecPtr,
- vmbuffer, vm_conflict_horizon,
- presult->new_vmbits);
}
@@ -2387,14 +2426,18 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
*
* This is used for several different page maintenance operations:
*
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
* redirected, some marked dead, and some removed altogether.
*
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'
+ *
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ * marked as unused.
*
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
+ * all-visible and all-frozen.
*
- * They have enough commonalities that we use a single WAL record for them
+ * These changes all happen together, so we use a single WAL record for them
* all.
*
* If replaying the record requires a cleanup lock, pass cleanup_lock = true.
@@ -2406,6 +2449,15 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* case, vmbuffer should already have been updated and marked dirty and should
* still be pinned and locked.
*
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
+ * In some cases, such as when heap_page_prune_and_freeze() is setting an
+ * already marked all-visible page all-frozen, PD_ALL_VISIBLE may already be
+ * set. So, it is possible for vmflags to be non-zero and set_pd_all_vis to be
+ * false.
+ *
* Note: This function scribbles on the 'frozen' array.
*
* Note: This is called in a critical section, so careful what you do here.
@@ -2415,6 +2467,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
Buffer vmbuffer, uint8 vmflags,
TransactionId conflict_xid,
bool cleanup_lock,
+ bool set_pd_all_vis,
PruneReason reason,
HeapTupleFreeze *frozen, int nfrozen,
OffsetNumber *redirected, int nredirected,
@@ -2451,7 +2504,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (!do_prune &&
nfrozen == 0 &&
- (!do_set_vm || !XLogHintBitIsNeeded()))
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
regbuf_flags_heap |= REGBUF_NO_IMAGE;
/*
@@ -2569,7 +2622,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* See comment at the top of the function about regbuf_flags_heap for
* details on when we can advance the page LSN.
*/
- if (do_prune || nfrozen > 0 || (do_set_vm && XLogHintBitIsNeeded()))
+ if (do_prune || nfrozen > 0 ||
+ (set_pd_all_vis && XLogHintBitIsNeeded()))
{
Assert(BufferIsDirty(buffer));
PageSetLSN(BufferGetPage(buffer), recptr);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4aa425ec945..0d39d57115d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2776,6 +2776,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
vmflags,
conflict_xid,
false, /* no cleanup lock required */
+ (vmflags & VISIBILITYMAP_VALID_BITS) != 0,
PRUNE_VACUUM_CLEANUP,
NULL, 0, /* frozen */
NULL, 0, /* redirected */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b20096b6ca1..14c1d92604d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -435,6 +435,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
Buffer vmbuffer, uint8 vmflags,
TransactionId conflict_xid,
bool cleanup_lock,
+ bool set_pd_all_vis,
PruneReason reason,
HeapTupleFreeze *frozen, int nfrozen,
OffsetNumber *redirected, int nredirected,
--
2.43.0
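
For reviewers who want the overall shape of the new heap_page_prune_and_freeze() flow without re-reading the hunks above, here is a condensed, non-authoritative sketch of the ordering this patch establishes. All names come from the diff; argument lists and bookkeeping are elided, so treat it as an outline rather than compilable code:

    /* decide on VM/PD_ALL_VISIBLE changes and the record's conflict horizon */
    do_set_vm = heap_page_will_set_vis(params->relation, blockno, buffer, vmbuffer,
                                       params->blk_known_av, &prstate,
                                       &new_vmbits, &do_set_pd_vis);
    conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm, new_vmbits,
                                    prstate.latest_xid_removed,
                                    prstate.frz_conflict_horizon,
                                    prstate.visibility_cutoff_xid,
                                    params->blk_known_av);

    /* lock the VM page before entering the critical section */
    if (do_set_vm)
        LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);

    START_CRIT_SECTION();

    if (do_prune)
        heap_page_prune_execute(...);       /* apply planned item changes */
    if (do_freeze)
        heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
    if (do_set_pd_vis)
        PageSetAllVisible(page);            /* PD_ALL_VISIBLE before the VM bit */
    if (do_prune || do_freeze || do_set_pd_vis)
        MarkBufferDirty(buffer);
    if (do_set_vm)
        old_vmbits = visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
                                              params->relation->rd_locator);

    /* a single record now covers pruning, freezing, and the VM update */
    if (RelationNeedsWAL(params->relation))
        log_heap_prune_and_freeze(params->relation, buffer,
                                  do_set_vm ? vmbuffer : InvalidBuffer,
                                  do_set_vm ? new_vmbits : 0,
                                  conflict_xid,
                                  true,         /* cleanup lock */
                                  do_set_pd_vis,
                                  params->reason,
                                  ...);         /* frozen/redirected/dead/unused arrays */

    END_CRIT_SECTION();

    if (do_set_vm)
        LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
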
Attachment: v23-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (text/x-patch; charset=US-ASCII)
From fd1518bb82741f8b0e554206c2e35a64bf12fbc3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v23 06/14] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum
As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible and all-frozen in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/vacuumlazy.c | 36 +++++++++++++++++++++++-----
1 file changed, 30 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 0d39d57115d..d03442abcc1 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1872,9 +1872,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ /* Lock vmbuffer before entering critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ /* Mark buffer dirty before writing any WAL records */
MarkBufferDirty(buf);
/*
@@ -1891,13 +1894,34 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set_vmbits(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
+
+ /*
+ * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+ * setting the VM.
+ */
+ if (RelationNeedsWAL(vacrel->rel))
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ InvalidTransactionId, /* conflict xid */
+ false, /* cleanup lock */
+ true, /* set_pd_all_vis */
+ PRUNE_VACUUM_SCAN, /* reason */
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+
END_CRIT_SECTION();
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
/* Count the newly all-frozen pages for logging */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
--
2.43.0
Attachment: v23-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (text/x-patch; charset=US-ASCII)
From faf079025c6354fb2fcb0695da29118c476ae4dd Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v23 07/14] Remove XLOG_HEAP2_VISIBLE entirely
Now that no remaining users emit XLOG_HEAP2_VISIBLE records, the record type
can be removed entirely.
This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.
This changes the visibility map API, so any external users/consumers of
the VM-only WAL record will need to change.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/common/bufmask.c | 4 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 155 ++---------------------
src/backend/access/heap/pruneheap.c | 6 +-
src/backend/access/heap/vacuumlazy.c | 16 +--
src/backend/access/heap/visibilitymap.c | 112 +---------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +-
src/include/access/heapam_xlog.h | 28 +---
src/include/access/visibilitymap.h | 13 +-
src/include/access/visibilitymapdefs.h | 9 --
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 46 insertions(+), 375 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
+ * for more details.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4d382a04338..3ad78ba4694 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2539,11 +2539,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
else if (all_frozen_set)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- relation->rd_locator);
+ visibilitymap_set(BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ relation->rd_locator);
}
/*
@@ -8812,50 +8812,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
-
/*
* Perform XLogInsert for a heap-update operation. Caller must already
* have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index b1ceab71928..f0de2c136a0 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -254,7 +254,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+ visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -267,142 +267,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with a
- * situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent page
- * modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -780,8 +644,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
*
* During recovery, however, no concurrent writers exist. Therefore,
* updating the VM without holding the heap page lock is safe enough. This
- * same approach is taken when replaying xl_heap_visible records (see
- * heap_xlog_visible()).
+ * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+ * heap_xlog_prune_and_freeze()).
*/
if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -793,11 +657,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- rlocator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -1378,9 +1242,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b851d723c74..7a778ad3bad 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1211,9 +1211,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
if (do_set_vm)
{
Assert(PageIsAllVisible(page));
- old_vmbits = visibilitymap_set_vmbits(blockno,
- vmbuffer, new_vmbits,
- params->relation->rd_locator);
+ old_vmbits = visibilitymap_set(blockno,
+ vmbuffer, new_vmbits,
+ params->relation->rd_locator);
Assert(old_vmbits != new_vmbits);
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d03442abcc1..5d88a1592e3 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1894,11 +1894,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
/*
* Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2781,9 +2781,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* set PD_ALL_VISIBLE.
*/
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer, vmflags,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer, vmflags,
+ vacrel->rel->rd_locator);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index d14588e92ae..7997e926872 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set bit(s) in a previously pinned page and log
- * visibilitymap_set_vmbits - set bit(s) in a pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,109 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
- flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || !XLogRecPtrIsValid(recptr));
- Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- Assert(!BufferIsValid(heapBuf) ||
- BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (!XLogRecPtrIsValid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
-}
-
/*
* Set VM (visibility map) flags in the VM block in vmBuf.
*
@@ -344,9 +240,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* rlocator is used only for debugging messages.
*/
uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ca26d1f0ed1..08461fdf593 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..69678187832 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -294,7 +293,13 @@ typedef struct xl_heap_prune
#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
#define XLHP_IS_CATALOG_REL (1 << 1)
/*
@@ -443,20 +448,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -500,11 +491,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index c6fa37be968..05ba6786b47 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator);
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index cf3f6a7dafd..a139705de01 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4302,7 +4302,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
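
Since the commit message above warns that external users of the VM-only WAL record must change, here is a rough before/after sketch of what the visibilitymap_set() API break looks like for a hypothetical out-of-core caller (variable names like rel, blkno, heapbuf, vmbuf, and cutoff_xid are illustrative only, and this is not a complete migration recipe):

    /* Before: visibilitymap_set() took the heap buffer and conflict horizon
     * and emitted XLOG_HEAP2_VISIBLE itself. */
    old_vmbits = visibilitymap_set(rel, blkno, heapbuf,
                                   InvalidXLogRecPtr,   /* not in recovery */
                                   vmbuf, cutoff_xid,
                                   VISIBILITYMAP_ALL_VISIBLE);

    /* After: visibilitymap_set() only sets the bits in the pinned VM page.
     * The caller must already have set PD_ALL_VISIBLE on the heap page, must
     * hold the VM buffer lock (the in-core callers above take it before their
     * critical section), and must include the VM update in its own WAL record,
     * e.g. via log_heap_prune_and_freeze() for heap pages. */
    old_vmbits = visibilitymap_set(blkno, vmbuf,
                                   VISIBILITYMAP_ALL_VISIBLE,
                                   rel->rd_locator);
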
Attachment: v23-0008-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (text/x-patch; charset=UTF-8)
From 221a5db2d8a7cee07053c30f635be0b27bae2242 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v23 08/14] Rename GlobalVisTestIsRemovableXid() to
GlobalVisXidVisibleToAll()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The function is currently only used to check whether a tuple’s xmax is
visible to all transactions (and thus removable). Upcoming changes will
also use it to test whether a tuple’s xmin is visible to all transactions,
in order to decide whether a page can be marked all-visible in the visibility map.
The new name, GlobalVisXidVisibleToAll(), better reflects this broader
purpose.
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/heapam_visibility.c | 6 +++---
src/backend/access/heap/pruneheap.c | 12 ++++++------
src/backend/access/spgist/spgvacuum.c | 2 +-
src/backend/storage/ipc/procarray.c | 17 ++++++++---------
src/include/utils/snapmgr.h | 4 ++--
5 files changed, 20 insertions(+), 21 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
{
Assert(TransactionIdIsValid(dead_after));
- if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
res = HEAPTUPLE_DEAD;
}
else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
return false;
/* Deleter committed, so tuple is dead if the XID is old enough. */
- return GlobalVisTestIsRemovableXid(vistest,
- HeapTupleHeaderGetRawXmax(tuple));
+ return GlobalVisXidVisibleToAll(vistest,
+ HeapTupleHeaderGetRawXmax(tuple));
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7a778ad3bad..1a3c7cf1ef5 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -253,7 +253,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
vistest = GlobalVisTestFor(relation);
- if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+ if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
return;
/*
@@ -487,7 +487,7 @@ prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
* Determining HTSV only once for each tuple is required for correctness,
* to deal with cases where running HTSV twice could result in different
* results. For example, RECENTLY_DEAD can turn to DEAD if another
- * checked item causes GlobalVisTestIsRemovableFullXid() to update the
+ * checked item causes GlobalVisFullXidVisibleToAll() to update the
* horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
* transaction aborts.
*
@@ -1327,11 +1327,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* Determine whether or not the tuple is considered dead when compared
* with the provided GlobalVisState. On-access pruning does not provide
* VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
- * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
- * if the GlobalVisState has been updated since the beginning of vacuuming
+ * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+ * the GlobalVisState has been updated since the beginning of vacuuming
* the relation.
*/
- if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
return HEAPTUPLE_DEAD;
return res;
@@ -1790,7 +1790,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
/*
* For now always use prstate->cutoffs for this test, because
* we only update 'all_visible' and 'all_frozen' when freezing
- * is requested. We could use GlobalVisTestIsRemovableXid
+ * is requested. We could use GlobalVisXidVisibleToAll()
* instead, if a non-freezing caller wanted to set the VM bit.
*/
Assert(prstate->cutoffs);
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 71ef2e5036f..1c0eb425ee9 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
*/
if (dt->tupstate == SPGIST_REDIRECT &&
(!TransactionIdIsValid(dt->xid) ||
- GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+ GlobalVisXidVisibleToAll(vistest, dt->xid)))
{
dt->tupstate = SPGIST_PLACEHOLDER;
Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..235c3b584f6 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
* See comment for GlobalVisState for details.
*/
bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
- FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
{
/*
* If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4216,14 +4215,14 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
}
/*
- * Wrapper around GlobalVisTestIsRemovableFullXid() for 32bit xids.
+ * Wrapper around GlobalVisFullXidVisibleToAll() for 32bit xids.
*
* It is crucial that this only gets called for xids from a source that
* protects against xid wraparounds (e.g. from a table and thus protected by
* relfrozenxid).
*/
bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
{
FullTransactionId fxid;
@@ -4237,12 +4236,12 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
*/
fxid = FullXidRelativeTo(state->definitely_needed, xid);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableFullXid(), see their comments.
+ * GlobalVisFullXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableXid(state, xid);
+ return GlobalVisXidVisibleToAll(state, xid);
}
/*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
*/
typedef struct GlobalVisState GlobalVisState;
extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
--
2.43.0
v23-0009-Use-GlobalVisState-in-vacuum-to-determine-page-l.patchtext/x-patch; charset=UTF-8; name=v23-0009-Use-GlobalVisState-in-vacuum-to-determine-page-l.patchDownload
From c75f7d1281fadf5c49e37577ef42ff96b92b3f59 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v23 09/14] Use GlobalVisState in vacuum to determine page
level visibility
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.
Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.
This also benefits vacuum directly: in some cases, GlobalVisState may
advance during a vacuum, allowing more pages to be considered
all-visible. And, in the future, we could easily add a heuristic to
update GlobalVisState more frequently during vacuums of large tables. In
the rare case that the GlobalVisState moves backward, vacuum falls back
to OldestXmin to ensure we don’t attempt to freeze a dead tuple that
wasn’t yet prunable according to the GlobalVisState.
Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.
This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++++
src/backend/access/heap/pruneheap.c | 43 +++++++++------------
src/backend/access/heap/vacuumlazy.c | 10 ++---
src/include/access/heapam.h | 11 +++---
4 files changed, 58 insertions(+), 34 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+ Buffer buffer)
+{
+ TransactionId dead_after = InvalidTransactionId;
+ HTSV_Result res;
+
+ res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+ if (res == HEAPTUPLE_RECENTLY_DEAD)
+ {
+ Assert(TransactionIdIsValid(dead_after));
+
+ if (GlobalVisXidVisibleToAll(vistest, dead_after))
+ res = HEAPTUPLE_DEAD;
+ }
+ else
+ Assert(!TransactionIdIsValid(dead_after));
+
+ return res;
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 1a3c7cf1ef5..d836bbeaf52 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -450,11 +450,12 @@ prune_freeze_setup(PruneFreezeParams *params,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon
- * when updating the VM and/or freezing all the tuples on the page.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState. This field is only kept up-to-date if the page is
+ * all-visible. As soon as a tuple is encountered that is not visible to
+ * all, this field is unmaintained. As long as it is maintained, it can be
+ * used to calculate the snapshot conflict horizon when updating the VM
+ * and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -980,14 +981,13 @@ heap_page_will_set_vis(Relation relation,
*/
static bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
- return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
+ return heap_page_would_be_all_visible(rel, buf, vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -1078,6 +1078,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prune_freeze_plan(RelationGetRelid(params->relation),
buffer, &prstate, off_loc);
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them is not visible to everyone, the page cannot be
+ * all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* If checksums are enabled, calling heap_prune_satisfies_vacuum() while
* checking tuple visibility information in prune_freeze_plan() may have
@@ -1255,10 +1265,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool debug_all_frozen;
Assert(prstate.lpdead_items == 0);
- Assert(prstate.cutoffs);
Assert(heap_page_is_all_visible(params->relation, buffer,
- prstate.cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc));
@@ -1787,20 +1796,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' and 'all_frozen' when freezing
- * is requested. We could use GlobalVisXidVisibleToAll()
- * instead, if a non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = false;
- prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5d88a1592e3..c0e1350cb11 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2735,7 +2735,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* done outside the critical section.
*/
if (heap_page_would_be_all_visible(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3495,7 +3495,7 @@ dead_items_cleanup(LVRelState *vacrel)
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
*
* Output parameters:
*
@@ -3511,7 +3511,7 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -3585,7 +3585,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
/* Visibility checks may do IO or allocate memory */
Assert(CritSectionCount == 0);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+ switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest, buf))
{
case HEAPTUPLE_LIVE:
{
@@ -3604,7 +3604,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin, OldestXmin))
+ if (!GlobalVisXidVisibleToAll(vistest, xmin))
{
all_visible = false;
*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 14c1d92604d..4702ec00dea 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -278,10 +278,9 @@ typedef struct PruneFreezeParams
/*
* Contains the cutoffs used for freezing. They are required if the
- * HEAP_PAGE_PRUNE_FREEZE option is set. cutoffs->OldestXmin is also used
- * to determine if dead tuples are HEAPTUPLE_RECENTLY_DEAD or
- * HEAPTUPLE_DEAD. Currently only vacuum passes in cutoffs. Vacuum
- * calculates them once, at the beginning of vacuuming the relation.
+ * HEAP_PAGE_PRUNE_FREEZE option is set. Currently only vacuum passes in
+ * cutoffs. Vacuum calculates them once, at the beginning of vacuuming the
+ * relation.
*/
struct VacuumCutoffs *cutoffs;
} PruneFreezeParams;
@@ -446,7 +445,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -460,6 +459,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+ GlobalVisState *vistest, Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
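
For readers skimming the series, a minimal sketch of the calling convention
this patch moves to: page-level visibility is now decided against a
GlobalVisState rather than a single OldestXmin. The function names come from
the patches above; the surrounding control flow is illustrative only, roughly
following heap_page_would_be_all_visible():

    GlobalVisState *vistest = GlobalVisTestFor(rel);

    switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest, buf))
    {
        case HEAPTUPLE_LIVE:
            /* the page stays all-visible only if every live xmin is
             * visible to everyone */
            if (!GlobalVisXidVisibleToAll(vistest,
                                          HeapTupleHeaderGetXmin(tuple.t_data)))
                all_visible = false;
            break;
        default:
            /* other HTSV results handled as in the patch */
            break;
    }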
v23-0010-Unset-all_visible-sooner-if-not-freezing.patchtext/x-patch; charset=UTF-8; name=v23-0010-Unset-all_visible-sooner-if-not-freezing.patchDownload
From 23236fd69abba5d481ff228d0fe1486ba40eddf3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v23 10/14] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.
However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/pruneheap.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d836bbeaf52..03b8ddcc38d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1653,8 +1653,13 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible and all_frozen until later
* during pruning. Removable dead tuples shouldn't preclude freezing the
- * page.
+ * page. If we won't attempt freezing, just unset all-visible now, though.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1913,8 +1918,14 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible and all_frozen until later, at the
* end of heap_page_prune_and_freeze(). This will allow us to attempt to
* freeze the page after pruning. As long as we unset it before updating
- * the visibility map, this will be correct.
+ * the visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all-visible now.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
--
2.43.0
v23-0011-Track-which-relations-are-modified-by-a-query.patchtext/x-patch; charset=US-ASCII; name=v23-0011-Track-which-relations-are-modified-by-a-query.patchDownload
From be0f2805786ba0d4711c39fe4896a3d6f51feba1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v23 11/14] Track which relations are modified by a query
Save the relids in a bitmap in the estate. A later commit will pass this
information down to scan nodes to control whether the scan allows
setting the visibility map during on-access pruning. We don't want to set
the visibility map if the query is just going to modify the page
immediately after.
---
src/backend/executor/execMain.c | 4 ++++
src/backend/executor/execUtils.c | 2 ++
src/include/nodes/execnodes.h | 6 ++++++
3 files changed, 12 insertions(+)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 27c9eec697b..0630a5af79e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation is modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 64ff6996431..7f6522cea8e 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query either through
+ * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
--
2.43.0
v23-0012-Pass-down-information-on-table-modification-to-s.patchtext/x-patch; charset=US-ASCII; name=v23-0012-Pass-down-information-on-table-modification-to-s.patchDownload
From 804296ddbb6b3553d37492d2f79d034df71fd3e5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:12:18 -0500
Subject: [PATCH v23 12/14] Pass down information on table modification to scan
node
Pass down information to sequential scan, index scan, and bitmap table
scan nodes on whether or not the query modifies the relation being
scanned. A later commit will use this information to update the VM
during on-access pruning only if the relation is not modified by the
query.
---
contrib/pgrowlocks/pgrowlocks.c | 2 +-
src/backend/access/brin/brin.c | 3 ++-
src/backend/access/gin/gininsert.c | 3 ++-
src/backend/access/heap/heapam_handler.c | 7 +++---
src/backend/access/index/genam.c | 4 ++--
src/backend/access/index/indexam.c | 6 +++---
src/backend/access/nbtree/nbtsort.c | 2 +-
src/backend/access/table/tableam.c | 7 +++---
src/backend/commands/constraint.c | 2 +-
src/backend/commands/copyto.c | 2 +-
src/backend/commands/tablecmds.c | 4 ++--
src/backend/commands/typecmds.c | 4 ++--
src/backend/executor/execIndexing.c | 2 +-
src/backend/executor/execReplication.c | 8 +++----
src/backend/executor/nodeBitmapHeapscan.c | 9 +++++++-
src/backend/executor/nodeIndexonlyscan.c | 2 +-
src/backend/executor/nodeIndexscan.c | 11 ++++++++--
src/backend/executor/nodeSeqscan.c | 26 ++++++++++++++++++++---
src/backend/partitioning/partbounds.c | 2 +-
src/backend/utils/adt/selfuncs.c | 2 +-
src/include/access/genam.h | 2 +-
src/include/access/heapam.h | 6 ++++++
src/include/access/tableam.h | 19 ++++++++++-------
23 files changed, 91 insertions(+), 44 deletions(-)
diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
RelationGetRelationName(rel));
/* Scan the relation */
- scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
hscan = (HeapScanDesc) scan;
attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index cb3331921cb..b9613787b85 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2842,7 +2842,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
indexInfo->ii_Concurrent = brinshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromBrinShared(brinshared));
+ ParallelTableScanFromBrinShared(brinshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index f87c60a230c..645688f9241 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2058,7 +2058,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
indexInfo->ii_Concurrent = ginshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromGinBuildShared(ginshared));
+ ParallelTableScanFromGinBuildShared(ginshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..d7fac94826d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,12 +79,13 @@ heapam_slot_callbacks(Relation relation)
*/
static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
{
IndexFetchHeapData *hscan = palloc0(sizeof(IndexFetchHeapData));
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
return &hscan->xs_base;
}
@@ -753,7 +754,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
tableScan = NULL;
heapScan = NULL;
- indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+ indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
index_rescan(indexScan, NULL, 0, NULL, 0);
}
else
@@ -762,7 +763,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
- tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+ tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
heapScan = (HeapScanDesc) tableScan;
indexScan = NULL;
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index c96917085c2..9d425504e1b 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -447,7 +447,7 @@ systable_beginscan(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, irel,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
@@ -708,7 +708,7 @@ systable_beginscan_ordered(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, indexRelation,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..b5523cf2ab1 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys)
+ int nkeys, int norderbys, uint32 flags)
{
IndexScanDesc scan;
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+ scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
return scan;
}
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+ scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
return scan;
}
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 454adaee7dc..02ab0233e59 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1925,7 +1925,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
indexInfo = BuildIndexInfo(btspool->index);
indexInfo->ii_Concurrent = btshared->isconcurrent;
scan = table_beginscan_parallel(btspool->heap,
- ParallelTableScanFromBTShared(btshared));
+ ParallelTableScanFromBTShared(btshared), 0);
reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
true, progress, _bt_build_callback,
&buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 1e099febdc8..db2a302a486 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
+
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
bool found;
slot = table_slot_create(rel, NULL);
- scan = table_index_fetch_begin(rel);
+ scan = table_index_fetch_begin(rel, 0);
found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
all_dead);
table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index 3497a8221f2..97c8278e36d 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
*/
tmptid = checktid;
{
- IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+ IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
bool call_again = false;
if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index cef452584e5..22b453dc617 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1156,7 +1156,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
AttrMap *map = NULL;
TupleTableSlot *root_slot = NULL;
- scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
slot = table_slot_create(rel, NULL);
/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 07e5b95782e..58dbbf4d851 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6345,7 +6345,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
* checking all the constraints.
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(oldrel, snapshot, 0, NULL);
+ scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -13730,7 +13730,7 @@ validateForeignKeyConstraint(char *conname,
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
slot = table_slot_create(rel, NULL);
- scan = table_beginscan(rel, snapshot, 0, NULL);
+ scan = table_beginscan(rel, snapshot, 0, NULL, 0);
perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
"validateForeignKeyConstraint",
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 47d5047fe8b..055759cd343 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3157,7 +3157,7 @@ validateDomainNotNullConstraint(Oid domainoid)
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
@@ -3238,7 +3238,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index dd323c9b9fd..b41bfeca244 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -816,7 +816,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
retry:
conflict = false;
found_self = false;
- index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+ index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index def32774c90..473d236e551 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -204,7 +204,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
/* Start an index scan. */
- scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
retry:
found = false;
@@ -382,7 +382,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
/* Start a heap scan. */
InitDirtySnapshot(snap);
- scan = table_beginscan(rel, &snap, 0, NULL);
+ scan = table_beginscan(rel, &snap, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
retry:
@@ -601,7 +601,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+ scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
table_rescan(scan, NULL);
@@ -665,7 +665,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
index_rescan(scan, skey, skey_attoff, NULL, 0);
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..0d854db51a1 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ flags);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index f464cca9507..87b04b1b88e 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
estate->es_snapshot,
&node->ioss_Instrument,
node->ioss_NumScanKeys,
- node->ioss_NumOrderByKeys);
+ node->ioss_NumOrderByKeys, 0);
node->ioss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index f36929deec3..90f929ce741 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys,
+ flags);
node->iss_ScanDesc = scandesc;
@@ -207,7 +214,7 @@ IndexNextWithReorder(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys, 0);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..4d0cbb9dee4 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
scandesc = table_beginscan(node->ss.ss_currentRelation,
estate->es_snapshot,
- 0, NULL);
+ 0, NULL, flags);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
{
EState *estate = node->ss.ps.state;
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
table_parallelscan_initialize(node->ss.ss_currentRelation,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+ flags);
}
/* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation,
+ pscan,
+ flags);
}
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 8ba038c5ef4..d3b340ee2a7 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3370,7 +3370,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
econtext = GetPerTupleExprContext(estate);
snapshot = RegisterSnapshot(GetLatestSnapshot());
tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
- scan = table_beginscan(part_rel, snapshot, 0, NULL);
+ scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 540aa9628d7..28434146eba 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7100,7 +7100,7 @@ get_actual_variable_endpoint(Relation heapRel,
index_scan = index_beginscan(heapRel, indexRel,
&SnapshotNonVacuumable, NULL,
- 1, 0);
+ 1, 0, 0);
/* Set it up for index-only scan */
index_scan->xs_want_itup = true;
index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..d29d9e905fc 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -177,7 +177,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys);
+ int nkeys, int norderbys, uint32 flags);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4702ec00dea..fc2c8314e97 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -119,6 +119,12 @@ typedef struct IndexFetchHeapData
Buffer xs_cbuf; /* current heap buffer in scan, if any */
/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 2fa790b6bf5..d10b1b03cdb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* set if the query doesn't modify the rel */
+ SO_HINT_REL_READ_ONLY = 1 << 10,
} ScanOptions;
/*
@@ -420,7 +422,7 @@ typedef struct TableAmRoutine
*
* Tuples for an index scan can then be fetched via index_fetch_tuple.
*/
- struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+ struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
/*
* Reset index fetch. Typically this will release cross index fetch
@@ -874,9 +876,9 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
*/
static inline TableScanDesc
table_beginscan(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_SEQSCAN |
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
@@ -919,9 +921,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
@@ -1128,7 +1130,8 @@ extern void table_parallelscan_initialize(Relation rel,
* Caller must hold a suitable lock on the relation.
*/
extern TableScanDesc table_beginscan_parallel(Relation relation,
- ParallelTableScanDesc pscan);
+ ParallelTableScanDesc pscan,
+ uint32 flags);
/*
* Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1164,9 +1167,9 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
* Tuples for an index scan can then be fetched via table_index_fetch_tuple().
*/
static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
{
- return rel->rd_tableam->index_fetch_begin(rel);
+ return rel->rd_tableam->index_fetch_begin(rel, flags);
}
/*
--
2.43.0
v23-0013-Allow-on-access-pruning-to-set-pages-all-visible.patchtext/x-patch; charset=US-ASCII; name=v23-0013-Allow-on-access-pruning-to-set-pages-all-visible.patchDownload
From 422613795deaaef2bdd43cd0767e019cbdd44f50 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:24:08 -0500
Subject: [PATCH v23 13/14] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam.c | 15 +++-
src/backend/access/heap/heapam_handler.c | 15 +++-
src/backend/access/heap/pruneheap.c | 78 ++++++++++++++++---
src/include/access/heapam.h | 24 +++++-
.../t/035_standby_logical_decoding.pl | 3 +-
5 files changed, 116 insertions(+), 19 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 3ad78ba4694..ecc04390ac7 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -570,6 +570,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -584,7 +585,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1261,6 +1264,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1299,6 +1303,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1331,6 +1341,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index d7fac94826d..27e3498f5f4 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
return &hscan->xs_base;
@@ -100,6 +101,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -139,7 +146,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ hscan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2472,6 +2480,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2518,7 +2527,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 03b8ddcc38d..04f10054402 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -205,7 +205,9 @@ static bool heap_page_will_set_vis(Relation relation,
Buffer heap_buf,
Buffer vmbuffer,
bool blk_known_av,
- const PruneState *prstate,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ PruneState *prstate,
uint8 *new_vmbits,
bool *do_set_pd_vis);
@@ -221,9 +223,13 @@ static bool heap_page_will_set_vis(Relation relation,
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -305,6 +311,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
.cutoffs = NULL,
};
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ params.options |= HEAP_PAGE_PRUNE_UPDATE_VIS;
+ params.vmbuffer = *vmbuffer;
+ }
+
heap_page_prune_and_freeze(¶ms, &presult, &dummy_off_loc,
NULL, NULL);
@@ -863,6 +876,9 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm, uint8 new_vmbits
* corrupted, it will fix them by clearing the VM bit and page hint. This does
* not need to be done in a critical section.
*
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
* Returns true if one or both VM bits should be set, along with the desired
* flags in *new_vmbits. Also indicates via do_set_pd_vis whether
* PD_ALL_VISIBLE should be set on the heap page.
@@ -873,7 +889,9 @@ heap_page_will_set_vis(Relation relation,
Buffer heap_buf,
Buffer vmbuffer,
bool blk_known_av,
- const PruneState *prstate,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ PruneState *prstate,
uint8 *new_vmbits,
bool *do_set_pd_vis)
{
@@ -888,6 +906,24 @@ heap_page_will_set_vis(Relation relation,
return false;
}
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(heap_buf) || XLogCheckBufferNeedsBackup(heap_buf)))
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ return false;
+ }
+
/*
* It should never be the case that the visibility map page is set while
* the page-level bit is clear, but the reverse is allowed (if checksums
@@ -921,14 +957,15 @@ heap_page_will_set_vis(Relation relation,
* WAL-logged.
*
* As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
+ * page-level bit is clear. However, it's possible that in vacuum the bit
+ * got cleared after heap_vac_scan_next_block() was called, so we must
+ * recheck with buffer lock before concluding that the VM is corrupt.
*
* Callers which did not check the visibility map and determine
* blk_known_av will not be eligible for this, however the cost of
* potentially needing to read the visibility map for pages that are not
- * all-visible is too high to justify generalizing the check.
+ * all-visible is too high to justify generalizing the check. A future
+ * vacuum will have to take care of fixing the corruption.
*/
else if (blk_known_av && !PageIsAllVisible(heap_page) &&
visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
@@ -1149,6 +1186,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
do_set_vm = heap_page_will_set_vis(params->relation,
blockno, buffer, vmbuffer, params->blk_known_av,
+ params->reason, do_prune, do_freeze,
&prstate, &new_vmbits, &do_set_pd_vis);
/* We should only set the VM if PD_ALL_VISIBLE is set or will be */
@@ -1224,13 +1262,29 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
old_vmbits = visibilitymap_set(blockno,
vmbuffer, new_vmbits,
params->relation->rd_locator);
- Assert(old_vmbits != new_vmbits);
+
+ /*
+ * If on-access pruning set the VM in between when vacuum first
+ * checked the visibility map and determined blk_known_av and when
+ * we actually prune the page, we could end up trying to set the
+ * VM only to find it is already set.
+ */
+ if (old_vmbits == new_vmbits)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+ /* Unset so we don't emit WAL since no change occurred */
+ do_set_vm = false;
+ }
}
/*
- * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
+ * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
+ * only planning to update the VM, and it turns out that it was
+ * already set, there is no need to emit WAL. As such, we must recheck
+ * here that some change is actually required.
*/
- if (RelationNeedsWAL(params->relation))
+ if (RelationNeedsWAL(params->relation) &&
+ (do_prune || do_freeze || do_set_vm))
{
log_heap_prune_and_freeze(params->relation, buffer,
do_set_vm ? vmbuffer : InvalidBuffer,
@@ -2440,8 +2494,8 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* - Reaping: During vacuum phase III, items that are already LP_DEAD are
* marked as unused.
*
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- * all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III and on-access, the heap page
+ * may be marked all-visible and all-frozen.
*
* These changes all happen together, so we use a single WAL record for them
* all.
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index fc2c8314e97..3f2b5eedfff 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read in the current heap page's
+ * corresponding VM block to this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read in the current heap page's corresponding VM block to
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
/*
* Some optimizations can only be performed if the query does not modify
@@ -425,7 +442,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
TM_IndexDeleteOp *delstate);
/* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index ebe2fae1789..bdd9f0a62cd 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
--
2.43.0
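
A condensed sketch of how 0011 through 0013 fit together, using only names
from the patches (the exact call sites are in the diffs above):

    /* Executor side (0011/0012): a scan node learns whether its relation
     * is modified anywhere in the query and passes a hint to the AM. */
    uint32 flags = 0;

    if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
                       estate->es_modified_relids))
        flags = SO_HINT_REL_READ_ONLY;

    scandesc = table_beginscan(node->ss.ss_currentRelation,
                               estate->es_snapshot, 0, NULL, flags);

    /* Heap AM side (0013): only read-only scans hand a VM buffer to
     * on-access pruning, which may then set the page all-visible. */
    Buffer *vmbuffer = NULL;

    if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
        vmbuffer = &scan->rs_vmbuffer;

    heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);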
v23-0014-Set-pd_prune_xid-on-insert.patchtext/x-patch; charset=UTF-8; name=v23-0014-Set-pd_prune_xid-on-insert.patchDownload
From 59d58156716426668022402d030cf8de7fcac928 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v23 14/14] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.
This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.
Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.
ci-os-only:
---
src/backend/access/heap/heapam.c | 25 +++++++++++++------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++-
.../modules/index/expected/killtuples.out | 6 ++---
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ecc04390ac7..d5f3f897dd3 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2119,6 +2119,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2178,15 +2179,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts, making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2196,7 +2201,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2560,8 +2564,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index f0de2c136a0..cf62a8df67c 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -465,6 +465,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -614,9 +620,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
--
2.43.0
On Mon, Nov 24, 2025 at 3:08 AM Chao Li <li.evan.chao@gmail.com> wrote:
On Nov 21, 2025, at 09:09, Chao Li <li.evan.chao@gmail.com> wrote:
I’d stop here today, and continue reviewing the rest of the commits next week.
I am continuing the review today.
I incorporated all your feedback in my recently posted v23. Thanks for
the review!
- Melanie
Hi Melanie,
I revisited this patch again today. I reviewed 0001-0004, and got a few more comments:
On Dec 4, 2025, at 07:07, Melanie Plageman <melanieplageman@gmail.com> wrote:
<v23-0001-Simplify-vacuum-visibility-assertion.patch><v23-0002-Refactor-lazy_scan_prune-VM-set-logic-into-helpe.patch><v23-0003-Set-the-VM-in-prune-code.patch><v23-0004-Move-VM-assert-into-prune-freeze-code.patch><v23-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch><v23-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch><v23-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch><v23-0008-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch><v23-0009-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch><v23-0010-Unset-all_visible-sooner-if-not-freezing.patch><v23-0011-Track-which-relations-are-modified-by-a-query.patch><v23-0012-Pass-down-information-on-table-modification-to-s.patch><v23-0013-Allow-on-access-pruning-to-set-pages-all-visible.patch><v23-0014-Set-pd_prune_xid-on-insert.patch>
1 - 0002
```
+static bool
+heap_page_will_set_vis(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buf,
+ Buffer vmbuffer,
+ bool all_visible_according_to_vm,
+ const PruneFreezeResult *presult,
+ uint8 *new_vmbits,
+ bool *do_set_pd_vis)
```
Actually, I wanted to comment on the new function name in the last round of review, but I guess I missed that.
I was very confused about what “set_vis” means, and finally figured out that “vis” should stand for “visibility”. Here “vis” actually means “visibility map bits”. There is the other “vis” in the last parameter’s name, “do_set_pd_vis”, where the “vis” should mean the “PD_ALL_VISIBLE” bit. Having two different meanings of “vis” makes things confusing.
How about renaming the function to “heap_page_will_set_vm_bits”, and renaming the last parameter to “do_set_all_visible”?
2 - 0002
```
+ * Decide whether to set the visibility map bits for heap_blk, using
+ * information from PruneFreezeResult and all_visible_according_to_vm. This
+ * function does not actually set the VM bit or page-level hint,
+ * PD_ALL_VISIBLE.
+ *
+ * If it finds that the page-level visibility hint or VM is corrupted, it will
+ * fix them by clearing the VM bit and page hint. This does not need to be
+ * done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with the desired
+ * flags in *new_vmbits. Also indicates via do_set_pd_vis whether
+ * PD_ALL_VISIBLE should be set on the heap page.
+ */
```
This function comment mentions PD_ALL_VISIBLE twice, but never mentions ALL_FROZEN. So “Returns true if one or both VM bits should be set” feels unclear. How about rephrasing it like “Returns true if the all-visible and/or all-frozen VM bits should be set.”
3 - 0002
```
+ /*
+ * Now handle two potential corruption cases:
+ *
+ * These do not need to happen in a critical section and are not
+ * WAL-logged.
+ *
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ else if (all_visible_according_to_vm && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
```
Here in the comment and error message, I guess “visibility map bit” refers to the “all visible bit”; can we be explicit?
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Thu, Dec 4, 2025 at 12:11 AM Chao Li <li.evan.chao@gmail.com> wrote:
I revisited this patch again today. I reviewed 0001-0004, and got a few more comments:
Thanks for the review! v24 is attached with the updates you suggested as
well as the bug fix described below.
I realized my code didn't mark the heap buffer dirty if we were not
modifying it (i.e., only setting the VM). This trips an assert in
XLogRegisterBuffer(), which requires that all buffers registered with
the WAL machinery be marked dirty unless REGBUF_NO_CHANGE is passed.
It wasn't possible to hit it in master because we unconditionally
dirtied the buffer if we found the VM not set in
find_next_unskippable_block() -- even if we made no changes to the
heap buffer. But my refactoring only dirtied the heap buffer if we
modified it (either pruning/freezing or setting PD_ALL_VISIBLE).
In the attached v24, I once again always dirty the heap buffer before
registering it. We can't skip adding the heap buffer to the WAL chain
even if we didn't modify it, because we use it to update the free space
map during recovery. We could pass REGBUF_NO_CHANGE when the heap
buffer is completely unmodified, but that delicate special-case logic
doesn't seem worth the effort to maintain, as the only time the heap
buffer should be unmodified is when the VM has been truncated or
removed. I added a test to the commit doing this refactoring that would
have caught my mistake (0003).
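To make the ordering concrete, here is a minimal sketch of the rule the
assert enforces (this is my illustration, not the patch code; function
and variable names are placeholders, and the record payload is elided):

```
/*
 * Minimal sketch (not the patch itself) of the ordering discussed above.
 * Assumes the heap buffer and VM buffer are already pinned and locked.
 */
static void
set_vm_sketch(Buffer heap_buf, Buffer vmbuffer)
{
	START_CRIT_SECTION();

	/* Keep the page-level hint in sync with the VM bits being set. */
	PageSetAllVisible(BufferGetPage(heap_buf));

	/*
	 * Dirty the heap buffer before registering it: XLogRegisterBuffer()
	 * asserts that a registered buffer is dirty unless REGBUF_NO_CHANGE
	 * is passed.
	 */
	MarkBufferDirty(heap_buf);

	/* The VM page itself is updated and dirtied as well. */
	MarkBufferDirty(vmbuffer);

	XLogBeginInsert();
	XLogRegisterBuffer(0, heap_buf, REGBUF_STANDARD);
	XLogRegisterBuffer(1, vmbuffer, 0);
	/* ... register record data and call XLogInsert() here ... */

	END_CRIT_SECTION();
}
```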
I also split the refactoring of the VM setting logic into more commits
to help make it clearer (0003-0004). We could technically commit the
refactoring commits to master. I had not originally intended to do so
since they do not have independent value beyond clarity for the
reviewer.
In this set, 0001 and 0002 are independent. 0003-0007 are all small
steps toward the single change in 0007, which combines the VM updates
into the same WAL record as pruning and freezing. 0008 and 0009 remove
the rest of XLOG_HEAP2_VISIBLE. 0010 - 0012 are the refactoring needed
to set the VM during on-access pruning. 0013 - 0015 are small steps
toward setting the VM on-access. And 0016 sets the prune xid on insert
so we may set the VM on-access for pages that contain only new data.
```
+static bool
+heap_page_will_set_vis(Relation relation,
```

Actually, I wanted to comment on the new function name in the last round of review, but I guess I missed that.
I was very confused about what “set_vis” means, and finally figured out that “vis” should stand for “visibility”. Here “vis” actually means “visibility map bits”. There is the other “vis” in the last parameter’s name, “do_set_pd_vis”, where the “vis” should mean the “PD_ALL_VISIBLE” bit. Having two different meanings of “vis” makes things confusing.
How about renaming the function to “heap_page_will_set_vm_bits”, and renaming the last parameter to “do_set_all_visible”?
I named it that way because it was responsible for telling us what we
should set the VM to _and_ whether we should set PD_ALL_VISIBLE. However,
once I corrected the bug mentioned above, we always set PD_ALL_VISIBLE
if setting the VM, so I was able to remove this ambiguity. As such,
I've renamed the function to heap_page_will_set_vm() (and removed the
last parameter).
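For readers skimming the patches, the resulting caller pattern looks
roughly like the sketch below. It is paraphrased from the v24-0004/0005
hunks further down; the exact signature and variable names live in those
patches:

```
uint8		new_vmbits;

if (heap_page_will_set_vm(rel, blkno, buf, vmbuffer,
						  all_visible_according_to_vm,
						  &presult, &new_vmbits))
{
	/* PD_ALL_VISIBLE is always set whenever any VM bit is set. */
	PageSetAllVisible(BufferGetPage(buf));
	MarkBufferDirty(buf);
	visibilitymap_set(rel, blkno, buf, InvalidXLogRecPtr,
					  vmbuffer, presult.vm_conflict_horizon,
					  new_vmbits);
}
```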
```
+ * Decide whether to set the visibility map bits for heap_blk, using
+ * information from PruneFreezeResult and all_visible_according_to_vm. This
+ * function does not actually set the VM bit or page-level hint,
+ * PD_ALL_VISIBLE.
+ *
+ * If it finds that the page-level visibility hint or VM is corrupted, it will
+ * fix them by clearing the VM bit and page hint. This does not need to be
+ * done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with the desired
+ * flags in *new_vmbits. Also indicates via do_set_pd_vis whether
+ * PD_ALL_VISIBLE should be set on the heap page.
+ */
```

This function comment mentions PD_ALL_VISIBLE twice, but never mentions ALL_FROZEN. So “Returns true if one or both VM bits should be set” feels unclear. How about rephrasing it like “Returns true if the all-visible and/or all-frozen VM bits should be set.”
PD_ALL_VISIBLE is the page-level visibility hint (not the VM bit), and
there is no page-level frozen hint. The comment doesn't mention that the
VM bits are all-visible and all-frozen, though, so I have modified it a
bit to make sure the all-frozen bit of the VM is mentioned.
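Put differently, the invariant being discussed is roughly the following
(a simplified illustration of my own, not code from the patches; variable
names are placeholders):

```
/*
 * If either VM bit is set for a block, the heap page's PD_ALL_VISIBLE
 * hint must also be set, and all-frozen additionally implies
 * all-visible. There is no page-level frozen hint.
 */
uint8		vmbits = visibilitymap_get_status(rel, blkno, &vmbuffer);

if (vmbits & VISIBILITYMAP_VALID_BITS)
	Assert(PageIsAllVisible(BufferGetPage(buf)));
if (vmbits & VISIBILITYMAP_ALL_FROZEN)
	Assert(vmbits & VISIBILITYMAP_ALL_VISIBLE);
```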
```
+ /*
+ * Now handle two potential corruption cases:
+ *
+ * These do not need to happen in a critical section and are not
+ * WAL-logged.
+ *
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ else if (all_visible_according_to_vm && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
```

Here in the comment and error message, I guess “visibility map bit” refers to the “all visible bit”; can we be explicit?
This is an existing comment in lazy_scan_prune() that I simply moved.
It isn't valid for the all-frozen bit to be set unless the all-visible
bit is set. I'm not sure whether specifying which bits were set in the
warning will help users debug the corruption they are seeing. But I
think it is a reasonable suggestion to make. Perhaps it is worth
suggesting this (adding the specific vmbits to the warning message) in
a separate thread since it is an independent improvement on master?
- Melanie
Attachments:
v24-0001-Simplify-vacuum-visibility-assertion.patchtext/x-patch; charset=US-ASCII; name=v24-0001-Simplify-vacuum-visibility-assertion.patchDownload
From 08652e26242aceb5048d384209b49ff6d4b287d3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 10:42:53 -0500
Subject: [PATCH v24 01/16] Simplify vacuum visibility assertion
Phase I vacuum gives the page a once-over after pruning and freezing to
check that the values of all_visible and all_frozen agree with the
result of heap_page_is_all_visible(). This is meant to keep the logic in
phase I for determining visibility in sync with the logic in phase III.
Rewrite the assertion to avoid an Assert(false).
Suggested by Andres Freund.
---
src/backend/access/heap/vacuumlazy.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 65bb0568a86..984d5879947 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2028,10 +2028,9 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(presult.lpdead_items == 0);
- if (!heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum))
- Assert(false);
+ Assert(heap_page_is_all_visible(vacrel->rel, buf,
+ vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+ &debug_cutoff, &vacrel->offnum));
Assert(presult.all_frozen == debug_all_frozen);
--
2.43.0
v24-0002-Add-comment-about-PD_ALL_VISIBLE-and-VM-sync.patchtext/x-patch; charset=US-ASCII; name=v24-0002-Add-comment-about-PD_ALL_VISIBLE-and-VM-sync.patchDownload
From 33e063761f30c23ce923ea485eb9cb86acee2d92 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 8 Dec 2025 17:32:49 -0500
Subject: [PATCH v24 02/16] Add comment about PD_ALL_VISIBLE and VM sync
The comment above heap_xlog_visible() about the critical integrity
requirement for PD_ALL_VISIBLE and the visibility map should also be in
heap_xlog_prune_freeze() where we set PD_ALL_VISIBLE.
---
src/backend/access/heap/heapam_xlog.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 11cb3f74da5..a09fb4b803a 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -157,6 +157,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* There should be no more data */
Assert((char *) frz_offsets == dataptr + datalen);
+ /*
+ * The critical integrity requirement here is that we must never end
+ * up with the visibility map bit set and the page-level
+ * PD_ALL_VISIBLE bit unset. If that were to occur, a subsequent page
+ * modification would fail to clear the visibility map bit.
+ */
if (vmflags & VISIBILITYMAP_VALID_BITS)
PageSetAllVisible(page);
--
2.43.0
v24-0003-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patchtext/x-patch; charset=US-ASCII; name=v24-0003-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patchDownload
From ed8807b5099a0066881c8b8e1690100fa71f2e90 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 8 Dec 2025 15:49:54 -0500
Subject: [PATCH v24 03/16] Combine visibilitymap_set() cases in
lazy_scan_prune()
The heap buffer is unconditionally added to the WAL chain when setting
the VM, so it must always be marked dirty.
In one of the cases in lazy_scan_prune(), we try to avoid setting
PD_ALL_VISIBLE and marking the buffer dirty again if PD_ALL_VISIBLE is
already set. There is little gain here, and if we eliminate that
condition, we can easily combine the two cases which set the VM in
lazy_scan_prune(). This is more straightforward and makes it clear that
the heap buffer must be marked dirty since it is added to the WAL chain.
In the previously separate second VM set case, the heap buffer would
always be dirty anyway -- either because we just froze a tuple and
marked the buffer dirty or because we modified the buffer between
find_next_unskippable_block() and heap_page_prune_and_freeze() and then
pruned it in heap_page_prune_and_freeze().
This commit also adds a test case to ensure we don't add code
resulting in the heap buffer not being marked dirty before being
added to the WAL chain.
XXX: is it okay to do a checkpoint in the pg_visibility test?
---
.../pg_visibility/expected/pg_visibility.out | 13 +++
contrib/pg_visibility/sql/pg_visibility.sql | 9 ++
src/backend/access/heap/vacuumlazy.c | 95 ++++---------------
3 files changed, 43 insertions(+), 74 deletions(-)
diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index 09fa5933a35..adc01162895 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -204,6 +204,19 @@ select pg_truncate_visibility_map('test_partition');
(1 row)
+-- the heap buffer must be marked dirty before adding it to the WAL chain when
+-- setting the VM
+create table test_heap_buffer_dirty(a int);
+insert into test_heap_buffer_dirty values (1);
+vacuum (freeze) test_heap_buffer_dirty;
+checkpoint;
+select pg_truncate_visibility_map('test_heap_buffer_dirty');
+ pg_truncate_visibility_map
+----------------------------
+
+(1 row)
+
+vacuum test_heap_buffer_dirty;
-- test copy freeze
create table copyfreeze (a int, b char(1500));
-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index 5af06ec5b76..0cdd087badb 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -94,6 +94,15 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
select * from pg_check_frozen('test_partition'); -- hopefully none
select pg_truncate_visibility_map('test_partition');
+-- the heap buffer must be marked dirty before adding it to the WAL chain when
+-- setting the VM
+create table test_heap_buffer_dirty(a int);
+insert into test_heap_buffer_dirty values (1);
+vacuum (freeze) test_heap_buffer_dirty;
+checkpoint;
+select pg_truncate_visibility_map('test_heap_buffer_dirty');
+vacuum test_heap_buffer_dirty;
+
-- test copy freeze
create table copyfreeze (a int, b char(1500));
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 984d5879947..14040552e48 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2080,15 +2080,21 @@ lazy_scan_prune(LVRelState *vacrel,
* of last heap_vac_scan_next_block() call), and from all_visible and
* all_frozen variables
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if ((presult.all_visible && !all_visible_according_to_vm) ||
+ (presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
{
uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
+ uint8 new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
if (presult.all_frozen)
{
+ /*
+ * We can pass InvalidTransactionId as our cutoff_xid, since a
+ * snapshotConflictHorizon sufficient to make everything safe for
+ * REDO was logged when the page's tuples were frozen.
+ */
Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- flags |= VISIBILITYMAP_ALL_FROZEN;
+ new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
}
/*
@@ -2097,36 +2103,36 @@ lazy_scan_prune(LVRelState *vacrel,
* checksums are not enabled). Regardless, set both bits so that we
* get back in sync.
*
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
+ * The heap page is added to the WAL chain even if it wasn't modified,
+ * so we still need to mark it dirty. The only scenario where it isn't
+ * modified in phase I is when the VM was truncated or removed, which
+ * isn't worth optimizing for.
*/
PageSetAllVisible(page);
MarkBufferDirty(buf);
old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
InvalidXLogRecPtr,
vmbuffer, presult.vm_conflict_horizon,
- flags);
+ new_vmbits);
/*
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
+ * For the purposes of logging, count whether or not the page was
+ * newly set all-visible and, potentially, all-frozen.
*/
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
+ if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
+ Assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
vacrel->vm_new_frozen_pages++;
*vm_page_frozen = true;
}
@@ -2177,65 +2183,6 @@ lazy_scan_prune(LVRelState *vacrel,
VISIBILITYMAP_VALID_BITS);
}
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen.
- */
- else if (all_visible_according_to_vm && presult.all_frozen &&
- !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
-
return presult.ndeleted;
}
--
2.43.0
v24-0004-Refactor-lazy_scan_prune-VM-set-logic-into-helpe.patchtext/x-patch; charset=US-ASCII; name=v24-0004-Refactor-lazy_scan_prune-VM-set-logic-into-helpe.patchDownload
From 842dd8da0c38440315de4e01bda026970b42d7eb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 13:36:39 -0500
Subject: [PATCH v24 04/16] Refactor lazy_scan_prune() VM set logic into helper
While this may not be an improvement on its own, encapsulating the logic
for determining what to set the VM bits to in a helper is one step
toward setting the VM in heap_page_prune_and_freeze().
---
src/backend/access/heap/vacuumlazy.c | 209 ++++++++++++++++-----------
1 file changed, 126 insertions(+), 83 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 14040552e48..577950c2f77 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1934,6 +1934,104 @@ cmpOffsetNumbers(const void *a, const void *b)
return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
}
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from PruneFreezeResult and
+ * all_visible_according_to_vm. This function does not actually set the VM
+ * bits or page-level visibility hint, PD_ALL_VISIBLE.
+ *
+ * If it finds that the page-level visibility hint or VM is corrupted, it will
+ * fix them by clearing the VM bits and visibility page hint. This does not
+ * need to be done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with returning
+ * what bits should be set in the VM in *new_vmbits.
+ */
+static bool
+heap_page_will_set_vm(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buf,
+ Buffer vmbuffer,
+ bool all_visible_according_to_vm,
+ const PruneFreezeResult *presult,
+ uint8 *new_vmbits)
+{
+ Page heap_page = BufferGetPage(heap_buf);
+
+ *new_vmbits = 0;
+
+ /*
+ * Determine what to set the visibility map bits to based on information
+ * from the VM (as of last heap_vac_scan_next_block() call), and from
+ * all_visible and all_frozen variables.
+ */
+ if ((presult->all_visible && !all_visible_according_to_vm) ||
+ (presult->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+ {
+ *new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+ if (presult->all_frozen)
+ {
+ Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+ *new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
+ return true;
+ }
+
+ /*
+ * Now handle two potential corruption cases:
+ *
+ * These do not need to happen in a critical section and are not
+ * WAL-logged.
+ *
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ else if (all_visible_according_to_vm && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (presult->lpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buf);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ return false;
+}
+
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -1964,6 +2062,9 @@ lazy_scan_prune(LVRelState *vacrel,
bool *vm_page_frozen)
{
Relation rel = vacrel->rel;
+ bool do_set_vm = false;
+ uint8 new_vmbits = 0;
+ uint8 old_vmbits = 0;
PruneFreezeResult presult;
PruneFreezeParams params = {
.relation = rel,
@@ -2075,33 +2176,20 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(!presult.all_visible || !(*has_lpdead_items));
Assert(!presult.all_frozen || presult.all_visible);
- /*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
- */
- if ((presult.all_visible && !all_visible_according_to_vm) ||
- (presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
- {
- uint8 old_vmbits;
- uint8 new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
- if (presult.all_frozen)
- {
- /*
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for
- * REDO was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
- }
+ do_set_vm = heap_page_will_set_vm(rel,
+ blkno,
+ buf,
+ vmbuffer,
+ all_visible_according_to_vm,
+ &presult,
+ &new_vmbits);
+ if (do_set_vm)
+ {
/*
* It should never be the case that the visibility map page is set
* while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
+ * checksums are not enabled).
*
* The heap page is added to the WAL chain even if it wasn't modified,
* so we still need to mark it dirty. The only scenario where it isn't
@@ -2114,73 +2202,28 @@ lazy_scan_prune(LVRelState *vacrel,
InvalidXLogRecPtr,
vmbuffer, presult.vm_conflict_horizon,
new_vmbits);
-
- /*
- * For the purposes of logging, count whether or not the page was
- * newly set all-visible and, potentially, all-frozen.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
- (new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
- {
- vacrel->vm_new_visible_pages++;
- if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
- {
- Assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
}
/*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
+ * For the purposes of logging, count whether or not the page was newly
+ * set all-visible and, potentially, all-frozen.
*/
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+ if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
+ vacrel->vm_new_visible_pages++;
+ if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ {
+ vacrel->vm_new_visible_frozen_pages++;
+ *vm_page_frozen = true;
+ }
}
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
+ else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
+ Assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
}
return presult.ndeleted;
--
2.43.0
v24-0005-Set-the-VM-in-heap_page_prune_and_freeze.patchtext/x-patch; charset=UTF-8; name=v24-0005-Set-the-VM-in-heap_page_prune_and_freeze.patchDownload
From 0d49cdd7813e02979b9e1b72eb344a93688c5d6e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v24 05/16] Set the VM in heap_page_prune_and_freeze()
This has no independent benefit. It is meant for ease of review. As of
this commit, there is still a separate WAL record emitted for setting
the VM after pruning and freezing. But it is easier to review if moving
the logic into pruneheap.c is separate from setting the VM in the same
WAL record.
---
src/backend/access/heap/pruneheap.c | 263 ++++++++++++++++++++++-----
src/backend/access/heap/vacuumlazy.c | 147 +--------------
src/include/access/heapam.h | 27 +++
3 files changed, 254 insertions(+), 183 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ca44225a10e..d7f36e2764f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
#include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool attempt_freeze;
+ /* whether or not to attempt updating the VM */
+ bool attempt_update_vm;
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -140,16 +142,17 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon when setting the VM or when
+ * freezing all the tuples on the page. It is only valid when all the live
+ * tuples on the page are all-visible.
*
* NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
* That's convenient for heap_page_prune_and_freeze() to use them to
- * decide whether to freeze the page or not. The all_visible and
- * all_frozen values returned to the caller are adjusted to include
- * LP_DEAD items after we determine whether to opportunistically freeze.
+ * decide whether to opportunistically freeze the page or not. The
+ * all_visible and all_frozen values ultimately used to set the VM are
+ * adjusted to include LP_DEAD items after we determine whether or not to
+ * opportunistically freeze.
*/
bool all_visible;
bool all_frozen;
@@ -191,6 +194,13 @@ static void page_verify_redirects(Page page);
static bool heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
PruneState *prstate);
+static bool heap_page_will_set_vm(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buf,
+ Buffer vmbuffer,
+ bool blk_known_av,
+ const PruneFreezeResult *presult,
+ uint8 *new_vmbits);
/*
@@ -280,6 +290,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
PruneFreezeParams params = {
.relation = relation,
.buffer = buffer,
+ .vmbuffer = InvalidBuffer,
+ .blk_known_av = false,
.reason = PRUNE_ON_ACCESS,
.options = 0,
.vistest = vistest,
@@ -338,6 +350,8 @@ prune_freeze_setup(PruneFreezeParams *params,
/* cutoffs must be provided if we will attempt freezing */
Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate->attempt_update_vm =
+ (params->options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate->cutoffs = params->cutoffs;
/*
@@ -386,51 +400,54 @@ prune_freeze_setup(PruneFreezeParams *params,
prstate->frz_conflict_horizon = InvalidTransactionId;
/*
- * Vacuum may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Track whether the page could be marked all-visible and/or all-frozen.
+ * This information is used for opportunistic freezing and for updating
+ * the visibility map (VM) if requested by the caller.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * Currently, only VACUUM performs freezing, but other callers may in the
+ * future. Visibility bookkeeping is required not just for setting the VM
+ * bits, but also for opportunistic freezing: we only consider freezing if
+ * the page would become all-frozen, or if it would be all-frozen except
+ * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+ * we will not set the VM bit even if the page is found to be all-visible.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not immediately
- * clear all_visible and all_frozen when we see LP_DEAD items. We fix
- * that after scanning the line pointers. We must correct all_visible and
- * all_frozen before we return them to the caller, so that the caller
- * doesn't set the VM bits incorrectly.
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is passed without HEAP_PAGE_PRUNE_FREEZE,
+ * prstate.all_frozen must be initialized to false, since we will not call
+ * heap_prepare_freeze_tuple() for each tuple.
+ *
+ * Dead tuples that will be removed by the end of vacuum should not
+ * prevent opportunistic freezing. Therefore, we do not clear all_visible
+ * and all_frozen when we encounter LP_DEAD items. Instead, we correct
+ * them after deciding whether to freeze, but before updating the VM, to
+ * avoid setting the VM bits incorrectly.
+ *
+ * If neither freezing nor VM updates are requested, we skip the extra
+ * bookkeeping. In this case, initializing all_visible to false allows
+ * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
*/
if (prstate->attempt_freeze)
{
prstate->all_visible = true;
prstate->all_frozen = true;
}
+ else if (prstate->attempt_update_vm)
+ {
+ prstate->all_visible = true;
+ prstate->all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate->all_visible = false;
prstate->all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is unmaintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon
+ * when updating the VM and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -765,10 +782,118 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from PruneFreezeResult and blk_known_av.
+ * Some callers may already have examined this page’s VM bits (e.g., VACUUM in
+ * the previous heap_vac_scan_next_block() call) and can pass that along as
+ * blk_known_av. Callers that have not previously checked the page's status in
+ * the VM should pass false for blk_known_av.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * However, if it finds that the page-level visibility hint or VM is
+ * corrupted, it will fix them by clearing the VM bits and page visibility
+ * hint. This does not need to be done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with returning the
+ * desired bits to set in the VM in *new_vmbits.
+ */
+static bool
+heap_page_will_set_vm(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buf,
+ Buffer vmbuffer,
+ bool blk_known_av,
+ const PruneFreezeResult *presult,
+ uint8 *new_vmbits)
+{
+ Page heap_page = BufferGetPage(heap_buf);
+
+ *new_vmbits = 0;
+
+ /*
+ * Determine what the visibility map bits should be set to using the
+ * values of all_visible and all_frozen determined during
+ * pruning/freezing.
+ */
+ if ((presult->all_visible && !blk_known_av) ||
+ (presult->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+ {
+ *new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+ if (presult->all_frozen)
+ {
+ Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+ *new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
+ return true;
+ }
+
+ /*
+ * Now handle two potential corruption cases:
+ *
+ * These do not need to happen in a critical section and are not
+ * WAL-logged.
+ *
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ *
+ * Callers which did not check the visibility map and determine
+ * blk_known_av will not be eligible for this, however the cost of
+ * potentially needing to read the visibility map for pages that are not
+ * all-visible is too high to justify generalizing the check.
+ */
+ else if (blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (presult->lpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buf);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ return false;
+}
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -783,12 +908,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* tuples if it's required in order to advance relfrozenxid / relminmxid, or
* if it's considered advantageous for overall system performance to do so
* now. The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing. When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opportunistically freeze, to indicate if the VM bits can be set. They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
@@ -813,11 +939,14 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
MultiXactId *new_relmin_mxid)
{
Buffer buffer = params->buffer;
+ Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
+ BlockNumber blockno = BufferGetBlockNumber(buffer);
PruneState prstate;
bool do_freeze;
bool do_prune;
bool do_hint_prune;
+ bool do_set_vm;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
@@ -1001,6 +1130,48 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
}
}
+
+ presult->new_vmbits = 0;
+ presult->old_vmbits = 0;
+
+ /* Now update the visibility map and PD_ALL_VISIBLE hint */
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ do_set_vm = false;
+ if (prstate.attempt_update_vm)
+ do_set_vm = heap_page_will_set_vm(params->relation,
+ blockno,
+ buffer,
+ vmbuffer,
+ params->blk_known_av,
+ presult,
+ &presult->new_vmbits);
+
+ /*
+ * new_vmbits should be 0 regardless of whether or not the page is
+ * all-visible if we do not intend to set the VM.
+ */
+ Assert(do_set_vm || presult->new_vmbits == 0);
+
+ if (do_set_vm)
+ {
+ /*
+ * It should never be the case that the visibility map page is set
+ * while the page-level bit is clear, but the reverse is allowed (if
+ * checksums are not enabled).
+ *
+ * The heap page is added to the WAL chain even if it wasn't modified,
+ * so we still need to mark it dirty. The only scenario where it isn't
+ * modified in phase I is when the VM was truncated or removed, which
+ * isn't worth optimizing for.
+ */
+ PageSetAllVisible(page);
+ MarkBufferDirty(buffer);
+ presult->old_vmbits = visibilitymap_set(params->relation, blockno, buffer,
+ InvalidXLogRecPtr,
+ vmbuffer, presult->vm_conflict_horizon,
+ presult->new_vmbits);
+ }
}
@@ -1475,6 +1646,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
{
TransactionId xmin;
+ Assert(prstate->attempt_update_vm);
+
if (!HeapTupleHeaderXminCommitted(htup))
{
prstate->all_visible = false;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 577950c2f77..86822778abc 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1935,103 +1935,6 @@ cmpOffsetNumbers(const void *a, const void *b)
}
-/*
- * Decide whether to set the visibility map bits (all-visible and all-frozen)
- * for heap_blk using information from PruneFreezeResult and
- * all_visible_according_to_vm. This function does not actually set the VM
- * bits or page-level visibility hint, PD_ALL_VISIBLE.
- *
- * If it finds that the page-level visibility hint or VM is corrupted, it will
- * fix them by clearing the VM bits and visibility page hint. This does not
- * need to be done in a critical section.
- *
- * Returns true if one or both VM bits should be set, along with returning
- * what bits should be set in the VM in *new_vmbits.
- */
-static bool
-heap_page_will_set_vm(Relation relation,
- BlockNumber heap_blk,
- Buffer heap_buf,
- Buffer vmbuffer,
- bool all_visible_according_to_vm,
- const PruneFreezeResult *presult,
- uint8 *new_vmbits)
-{
- Page heap_page = BufferGetPage(heap_buf);
-
- *new_vmbits = 0;
-
- /*
- * Determine what to set the visibility map bits to based on information
- * from the VM (as of last heap_vac_scan_next_block() call), and from
- * all_visible and all_frozen variables.
- */
- if ((presult->all_visible && !all_visible_according_to_vm) ||
- (presult->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
- {
- *new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
- if (presult->all_frozen)
- {
- Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
- *new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- return true;
- }
-
- /*
- * Now handle two potential corruption cases:
- *
- * These do not need to happen in a critical section and are not
- * WAL-logged.
- *
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- else if (all_visible_according_to_vm && !PageIsAllVisible(heap_page) &&
- visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk)));
-
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult->lpdead_items > 0 && PageIsAllVisible(heap_page))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk)));
-
- PageClearAllVisible(heap_page);
- MarkBufferDirty(heap_buf);
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- return false;
-}
-
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -2062,15 +1965,14 @@ lazy_scan_prune(LVRelState *vacrel,
bool *vm_page_frozen)
{
Relation rel = vacrel->rel;
- bool do_set_vm = false;
- uint8 new_vmbits = 0;
- uint8 old_vmbits = 0;
PruneFreezeResult presult;
PruneFreezeParams params = {
.relation = rel,
.buffer = buf,
+ .vmbuffer = vmbuffer,
+ .blk_known_av = all_visible_according_to_vm,
.reason = PRUNE_VACUUM_SCAN,
- .options = HEAP_PAGE_PRUNE_FREEZE,
+ .options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM,
.vistest = vacrel->vistest,
.cutoffs = &vacrel->cutoffs,
};
@@ -2173,55 +2075,24 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
- Assert(!presult.all_frozen || presult.all_visible);
-
- do_set_vm = heap_page_will_set_vm(rel,
- blkno,
- buf,
- vmbuffer,
- all_visible_according_to_vm,
- &presult,
- &new_vmbits);
-
- if (do_set_vm)
- {
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled).
- *
- * The heap page is added to the WAL chain even if it wasn't modified,
- * so we still need to mark it dirty. The only scenario where it isn't
- * modified in phase I is when the VM was truncated or removed, which
- * isn't worth optimizing for.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- new_vmbits);
- }
-
/*
* For the purposes of logging, count whether or not the page was newly
* set all-visible and, potentially, all-frozen.
*/
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
- (new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
vacrel->vm_new_visible_pages++;
- if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- Assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
vacrel->vm_new_frozen_pages++;
*vm_page_frozen = true;
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 632c4332a8c..bb712c5b29f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,18 @@ typedef struct PruneFreezeParams
Relation relation; /* relation containing buffer to be pruned */
Buffer buffer; /* buffer to be pruned */
+ /*
+ * vmbuffer is the buffer that must already contain the required block of
+ * the visibility map if we are to update it. blk_known_av is the
+ * visibility status of the heap block as of the last call to
+ * find_next_unskippable_block(). Callers which did not check the
+ * visibility map already should pass false for blk_known_av. This is only
+ * an optimization for callers that did check the VM and won't affect
+ * correctness.
+ */
+ Buffer vmbuffer;
+ bool blk_known_av;
+
/*
* The reason pruning was performed. It is used to set the WAL record
* opcode which is used for debugging and analysis purposes.
@@ -252,6 +265,9 @@ typedef struct PruneFreezeParams
*
* HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
* will return 'all_visible', 'all_frozen' flags to the caller.
+ *
+ * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
+ * in the VM.
*/
int options;
@@ -299,6 +315,17 @@ typedef struct PruneFreezeResult
bool all_frozen;
TransactionId vm_conflict_horizon;
+ /*
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
+ *
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set and
+ * we have attempted to update the VM.
+ */
+ uint8 new_vmbits;
+ uint8 old_vmbits;
+
/*
* Whether or not the page makes rel truncation unsafe. This is set to
* 'true', even if the page contains LP_DEAD items. VACUUM will remove
--
2.43.0
v24-0006-Move-VM-assert-into-prune-freeze-code.patchtext/x-patch; charset=UTF-8; name=v24-0006-Move-VM-assert-into-prune-freeze-code.patchDownload
From 30e6b5420669389b0b0e6169905d344442d17266 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 15:57:34 -0500
Subject: [PATCH v24 06/16] Move VM assert into prune/freeze code
This is a step toward setting the VM in the same WAL record as pruning
and freezing. It moves the check of the heap page into prune/freeze code
before setting the VM. This allows us to remove some fields of the
PruneFreezeResult.
---
src/backend/access/heap/pruneheap.c | 138 +++++++++++++++++++--------
src/backend/access/heap/vacuumlazy.c | 68 +------------
src/include/access/heapam.h | 25 ++---
3 files changed, 109 insertions(+), 122 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d7f36e2764f..96dc902ec12 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -199,7 +199,7 @@ static bool heap_page_will_set_vm(Relation relation,
Buffer heap_buf,
Buffer vmbuffer,
bool blk_known_av,
- const PruneFreezeResult *presult,
+ const PruneState *prstate,
uint8 *new_vmbits);
@@ -784,9 +784,9 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
/*
* Decide whether to set the visibility map bits (all-visible and all-frozen)
- * for heap_blk using information from PruneFreezeResult and blk_known_av.
- * Some callers may already have examined this page’s VM bits (e.g., VACUUM in
- * the previous heap_vac_scan_next_block() call) and can pass that along as
+ * for heap_blk using information from PruneState and blk_known_av. Some
+ * callers may already have examined this page’s VM bits (e.g., VACUUM in the
+ * previous heap_vac_scan_next_block() call) and can pass that along as
* blk_known_av. Callers that have not previously checked the page's status in
* the VM should pass false for blk_known_av.
*
@@ -806,27 +806,30 @@ heap_page_will_set_vm(Relation relation,
Buffer heap_buf,
Buffer vmbuffer,
bool blk_known_av,
- const PruneFreezeResult *presult,
+ const PruneState *prstate,
uint8 *new_vmbits)
{
Page heap_page = BufferGetPage(heap_buf);
*new_vmbits = 0;
+ if (!prstate->attempt_update_vm)
+ {
+ Assert(!prstate->all_visible && !prstate->all_frozen);
+ return false;
+ }
+
/*
* Determine what the visibility map bits should be set to using the
* values of all_visible and all_frozen determined during
* pruning/freezing.
*/
- if ((presult->all_visible && !blk_known_av) ||
- (presult->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+ if ((prstate->all_visible && !blk_known_av) ||
+ (prstate->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
{
*new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
- if (presult->all_frozen)
- {
- Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+ if (prstate->all_frozen)
*new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
- }
return true;
}
@@ -873,7 +876,7 @@ heap_page_will_set_vm(Relation relation,
* There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
* however.
*/
- else if (presult->lpdead_items > 0 && PageIsAllVisible(heap_page))
+ else if (prstate->lpdead_items > 0 && PageIsAllVisible(heap_page))
{
ereport(WARNING,
(errcode(ERRCODE_DATA_CORRUPTED),
@@ -889,6 +892,30 @@ heap_page_will_set_vm(Relation relation,
return false;
}
+#ifdef USE_ASSERT_CHECKING
+
+/*
+ * Wrapper for heap_page_would_be_all_visible() which can be used for callers
+ * that expect no LP_DEAD on the page. Currently assert-only, but there is no
+ * reason not to use it outside of asserts.
+ */
+static bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
+{
+
+ return heap_page_would_be_all_visible(rel, buf,
+ OldestXmin,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid,
+ logging_offnum);
+}
+#endif
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
@@ -942,6 +969,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
+ TransactionId vm_conflict_horizon = InvalidTransactionId;
PruneState prstate;
bool do_freeze;
bool do_prune;
@@ -1097,23 +1125,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
- /*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
-
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
@@ -1134,18 +1147,67 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
presult->new_vmbits = 0;
presult->old_vmbits = 0;
- /* Now update the visibility map and PD_ALL_VISIBLE hint */
+ /*
+ * If updating the visibility map, the conflict horizon for that record
+ * must be the newest xmin on the page. However, if the page is
+ * completely frozen, there can be no conflict and the vm_conflict_horizon
+ * should remain InvalidTransactionId. This includes the case that we
+ * just froze all the tuples; the prune-freeze record included the
+ * conflict XID already so we don't need to include it again.
+ */
+ if (prstate.all_frozen)
+ vm_conflict_horizon = InvalidTransactionId;
+ else
+ vm_conflict_horizon = prstate.visibility_cutoff_xid;
+
+ /*
+ * During its second pass over the heap, VACUUM calls
+ * heap_page_would_be_all_visible() to determine whether a page is
+ * all-visible and all-frozen. The logic here is similar. After completing
+ * pruning and freezing, use an assertion to verify that our results
+ * remain consistent with heap_page_would_be_all_visible().
+ */
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
+ {
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
+
+ Assert(presult->lpdead_items == 0);
+
+ Assert(heap_page_is_all_visible(params->relation, buffer,
+ prstate.cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc));
+
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == vm_conflict_horizon);
+ }
+#endif
+
+ Assert(!prstate.all_frozen || prstate.all_visible);
Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
- do_set_vm = false;
- if (prstate.attempt_update_vm)
- do_set_vm = heap_page_will_set_vm(params->relation,
- blockno,
- buffer,
- vmbuffer,
- params->blk_known_av,
- presult,
- &presult->new_vmbits);
+ /*
+ * Decide whether to set the page-level PD_ALL_VISIBLE bit and the VM bits
+ * based on information from the VM and the all_visible/all_frozen flags.
+ *
+ * While it is valid for PD_ALL_VISIBLE to be set when the corresponding
+ * VM bit is clear, we strongly prefer to keep them in sync.
+ *
+ * Accordingly, we also allow updating only the VM when PD_ALL_VISIBLE has
+ * already been set. Setting only the VM is most common when setting an
+ * already all-visible page all-frozen.
+ */
+ do_set_vm = heap_page_will_set_vm(params->relation,
+ blockno,
+ buffer,
+ vmbuffer,
+ params->blk_known_av,
+ &prstate,
+ &presult->new_vmbits);
/*
* new_vmbits should be 0 regardless of whether or not the page is
@@ -1169,7 +1231,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
MarkBufferDirty(buffer);
presult->old_vmbits = visibilitymap_set(params->relation, blockno, buffer,
InvalidXLogRecPtr,
- vmbuffer, presult->vm_conflict_horizon,
+ vmbuffer, vm_conflict_horizon,
presult->new_vmbits);
}
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 86822778abc..9f404e03869 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,20 +464,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
-#endif
-static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- OffsetNumber *deadoffsets,
- int ndeadoffsets,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2016,32 +2002,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- Assert(heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum));
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -3496,29 +3456,6 @@ dead_items_cleanup(LVRelState *vacrel)
vacrel->pvs = NULL;
}
-#ifdef USE_ASSERT_CHECKING
-
-/*
- * Wrapper for heap_page_would_be_all_visible() which can be used for callers
- * that expect no LP_DEAD on the page. Currently assert-only, but there is no
- * reason not to use it outside of asserts.
- */
-static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum)
-{
-
- return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
- NULL, 0,
- all_frozen,
- visibility_cutoff_xid,
- logging_offnum);
-}
-#endif
/*
* Check whether the heap page in buf is all-visible except for the dead
@@ -3542,15 +3479,12 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
* - *logging_offnum: OffsetNumber of current tuple being processed;
* used by vacuum's error callback system.
*
- * Callers looking to verify that the page is already all-visible can call
- * heap_page_is_all_visible().
- *
* This logic is closely related to heap_prune_record_unchanged_lp_normal().
* If you modify this function, ensure consistency with that code. An
* assertion cross-checks that both remain in agreement. Do not introduce new
* side-effects.
*/
-static bool
+bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
TransactionId OldestXmin,
OffsetNumber *deadoffsets,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index bb712c5b29f..392af6503da 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -263,8 +263,7 @@ typedef struct PruneFreezeParams
* HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
* LP_UNUSED during pruning.
*
- * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
- * will return 'all_visible', 'all_frozen' flags to the caller.
+ * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples
*
* HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
* in the VM.
@@ -300,21 +299,6 @@ typedef struct PruneFreezeResult
int live_tuples;
int recently_dead_tuples;
- /*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
- *
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
- *
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
- */
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
-
/*
* old_vmbits are the state of the all-visible and all-frozen bits in the
* visibility map before updating it during phase I of vacuuming.
@@ -460,6 +444,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
/* in heap/vacuumlazy.c */
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
--
2.43.0
v24-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patchtext/x-patch; charset=US-ASCII; name=v24-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patchDownload
From b191695afcba438ae8c5d1c3b4d5939c76d22a4f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 16:16:22 -0500
Subject: [PATCH v24 07/16] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
prune/freeze
Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.
Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.
This change applies only to vacuum phase I, not to pruning performed
during normal page access.
NOTE: This commit is the main commit and all review-only commits
preceding it will be squashed into it.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/pruneheap.c | 255 ++++++++++++++++------------
1 file changed, 144 insertions(+), 111 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 96dc902ec12..489b8487599 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -194,6 +194,12 @@ static void page_verify_redirects(Page page);
static bool heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
PruneState *prstate);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ uint8 new_vmbits,
+ TransactionId latest_xid_removed,
+ TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid,
+ bool blk_already_av);
static bool heap_page_will_set_vm(Relation relation,
BlockNumber heap_blk,
Buffer heap_buf,
@@ -782,6 +788,64 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm, uint8 new_vmbits,
+ TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid, bool blk_already_av)
+{
+ TransactionId conflict_xid;
+
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning or
+ * freezing any tuples and are setting an already all-visible page
+ * all-frozen in the VM. In this case, all of the tuples on the page must
+ * already be visible to all MVCC snapshots on the standby.
+ */
+ if (!do_prune && !do_freeze &&
+ do_set_vm && blk_already_av && (new_vmbits & VISIBILITYMAP_ALL_FROZEN))
+ return InvalidTransactionId;
+
+ /*
+ * The snapshotConflictHorizon for the whole record should be the most
+ * conservative of all the horizons calculated for any of the possible
+ * modifications. If this record will prune tuples, any transactions on
+ * the standby older than the youngest xmax of the most recently removed
+ * tuple this record will prune will conflict. If this record will freeze
+ * tuples, any transactions on the standby with xids older than the
+ * youngest tuple this record will freeze will conflict.
+ */
+ conflict_xid = InvalidTransactionId;
+
+ /*
+ * If we are updating the VM, the conflict horizon is almost always the
+ * visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization, we can
+ * use the visibility_cutoff_xid as the conflict horizon if the page will
+ * be all-frozen. This is true even if there are LP_DEAD line pointers
+ * because we ignored those when maintaining the visibility_cutoff_xid.
+ * This will have been calculated earlier as the frz_conflict_horizon when
+ * we determined we would freeze.
+ */
+ if (do_set_vm)
+ conflict_xid = visibility_cutoff_xid;
+ else if (do_freeze)
+ conflict_xid = frz_conflict_horizon;
+
+ /*
+ * If we are removing tuples whose xmax is newer than the conflict_xid
+ * calculated so far, we must use that xmax as our horizon.
+ */
+ if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+ conflict_xid = latest_xid_removed;
+
+ return conflict_xid;
+}
+
/*
* Decide whether to set the visibility map bits (all-visible and all-frozen)
* for heap_blk using information from PruneState and blk_known_av. Some
@@ -969,7 +1033,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
- TransactionId vm_conflict_horizon = InvalidTransactionId;
PruneState prstate;
bool do_freeze;
bool do_prune;
@@ -977,6 +1040,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool do_set_vm;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ TransactionId conflict_xid = InvalidTransactionId;
+ uint8 new_vmbits = 0;
+ uint8 old_vmbits = 0;
/* Initialize prstate */
prune_freeze_setup(params,
@@ -1038,6 +1104,36 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.all_visible = prstate.all_frozen = false;
Assert(!prstate.all_frozen || prstate.all_visible);
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ /*
+ * Decide whether to set the page-level PD_ALL_VISIBLE bit and the VM bits
+ * based on information from the VM and the all_visible/all_frozen flags.
+ *
+ * While it is valid for PD_ALL_VISIBLE to be set when the corresponding
+ * VM bit is clear, we strongly prefer to keep them in sync.
+ *
+ * Accordingly, we also allow updating only the VM when PD_ALL_VISIBLE has
+ * already been set. Setting only the VM is most common when setting an
+ * already all-visible page all-frozen.
+ */
+ do_set_vm = heap_page_will_set_vm(params->relation,
+ blockno, buffer, vmbuffer, params->blk_known_av,
+ &prstate, &new_vmbits);
+
+ /*
+ * new_vmbits should be 0 regardless of whether or not the page is
+ * all-visible if we do not intend to set the VM.
+ */
+ Assert(do_set_vm || new_vmbits == 0);
+
+ conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm, new_vmbits,
+ prstate.latest_xid_removed, prstate.frz_conflict_horizon,
+ prstate.visibility_cutoff_xid, params->blk_known_av);
+
+ /* Lock vmbuffer before entering a critical section */
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -1059,14 +1155,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
/*
* If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * hint. If we are going to freeze or prune the page or set
+ * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+ *
+ * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+ * for the VM to be set and PD_ALL_VISIBLE to be clear.
*/
- if (!do_freeze && !do_prune)
+ if (!do_freeze && !do_prune && !do_set_vm)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -1080,6 +1179,15 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
+ if (do_set_vm)
+ {
+ PageSetAllVisible(page);
+ old_vmbits = visibilitymap_set_vmbits(blockno,
+ vmbuffer, new_vmbits,
+ params->relation->rd_locator);
+ Assert(old_vmbits != new_vmbits);
+ }
+
MarkBufferDirty(buffer);
/*
@@ -1087,29 +1195,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
if (RelationNeedsWAL(params->relation))
{
- /*
- * The snapshotConflictHorizon for the whole record should be the
- * most conservative of all the horizons calculated for any of the
- * possible modifications. If this record will prune tuples, any
- * transactions on the standby older than the youngest xmax of the
- * most recently removed tuple this record will prune will
- * conflict. If this record will freeze tuples, any transactions
- * on the standby with xids older than the youngest tuple this
- * record will freeze will conflict.
- */
- TransactionId conflict_xid;
-
- if (TransactionIdFollows(prstate.frz_conflict_horizon,
- prstate.latest_xid_removed))
- conflict_xid = prstate.frz_conflict_horizon;
- else
- conflict_xid = prstate.latest_xid_removed;
-
log_heap_prune_and_freeze(params->relation, buffer,
- InvalidBuffer, /* vmbuffer */
- 0, /* vmflags */
+ do_set_vm ? vmbuffer : InvalidBuffer,
+ do_set_vm ? new_vmbits : 0,
conflict_xid,
- true, params->reason,
+ true, /* cleanup lock */
+ params->reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
@@ -1119,46 +1210,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
END_CRIT_SECTION();
- /* Copy information back for caller */
- presult->ndeleted = prstate.ndeleted;
- presult->nnewlpdead = prstate.ndead;
- presult->nfrozen = prstate.nfrozen;
- presult->live_tuples = prstate.live_tuples;
- presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->hastup = prstate.hastup;
-
- presult->lpdead_items = prstate.lpdead_items;
- /* the presult->deadoffsets array was already filled in */
-
- if (prstate.attempt_freeze)
- {
- if (presult->nfrozen > 0)
- {
- *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
- *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
- }
- else
- {
- *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
- *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
- }
- }
-
- presult->new_vmbits = 0;
- presult->old_vmbits = 0;
-
- /*
- * If updating the visibility map, the conflict horizon for that record
- * must be the newest xmin on the page. However, if the page is
- * completely frozen, there can be no conflict and the vm_conflict_horizon
- * should remain InvalidTransactionId. This includes the case that we
- * just froze all the tuples; the prune-freeze record included the
- * conflict XID already so we don't need to include it again.
- */
- if (prstate.all_frozen)
- vm_conflict_horizon = InvalidTransactionId;
- else
- vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
/*
* During its second pass over the heap, VACUUM calls
@@ -1173,7 +1226,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(presult->lpdead_items == 0);
+ Assert(prstate.lpdead_items == 0);
+ Assert(prstate.cutoffs);
Assert(heap_page_is_all_visible(params->relation, buffer,
prstate.cutoffs->OldestXmin,
@@ -1183,56 +1237,35 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Assert(prstate.all_frozen == debug_all_frozen);
Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == vm_conflict_horizon);
+ debug_cutoff == prstate.visibility_cutoff_xid);
}
#endif
- Assert(!prstate.all_frozen || prstate.all_visible);
- Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
-
- /*
- * Decide whether to set the page-level PD_ALL_VISIBLE bit and the VM bits
- * based on information from the VM and the all_visible/all_frozen flags.
- *
- * While it is valid for PD_ALL_VISIBLE to be set when the corresponding
- * VM bit is clear, we strongly prefer to keep them in sync.
- *
- * Accordingly, we also allow updating only the VM when PD_ALL_VISIBLE has
- * already been set. Setting only the VM is most common when setting an
- * already all-visible page all-frozen.
- */
- do_set_vm = heap_page_will_set_vm(params->relation,
- blockno,
- buffer,
- vmbuffer,
- params->blk_known_av,
- &prstate,
- &presult->new_vmbits);
+ /* Copy information back for caller */
+ presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+ presult->hastup = prstate.hastup;
+ presult->new_vmbits = new_vmbits;
+ presult->old_vmbits = old_vmbits;
- /*
- * new_vmbits should be 0 regardless of whether or not the page is
- * all-visible if we do not intend to set the VM.
- */
- Assert(do_set_vm || presult->new_vmbits == 0);
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
- if (do_set_vm)
+ if (prstate.attempt_freeze)
{
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled).
- *
- * The heap page is added to the WAL chain even if it wasn't modified,
- * so we still need to mark it dirty. The only scenario where it isn't
- * modified in phase I is when the VM was truncated or removed, which
- * isn't worth optimizing for.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buffer);
- presult->old_vmbits = visibilitymap_set(params->relation, blockno, buffer,
- InvalidXLogRecPtr,
- vmbuffer, vm_conflict_horizon,
- presult->new_vmbits);
+ if (presult->nfrozen > 0)
+ {
+ *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
+ }
+ else
+ {
+ *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
+ }
}
}
--
2.43.0
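For readers following the conflict-horizon reasoning in get_conflict_xid() above, here is a small standalone sketch (not part of the patch set) of that decision tree. The typedef, macros, and the simplified TransactionIdFollows() are stand-ins for the real server definitions; only the selection logic mirrors the patch:

/*
 * Standalone illustration of the conflict-horizon selection in
 * get_conflict_xid(). Compile with any C compiler; nothing here
 * touches real PostgreSQL headers.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId        ((TransactionId) 0)
#define VISIBILITYMAP_ALL_VISIBLE   0x01
#define VISIBILITYMAP_ALL_FROZEN    0x02

/* Simplified: the real TransactionIdFollows() handles special XIDs and wraparound */
static bool
TransactionIdFollows(TransactionId id1, TransactionId id2)
{
    return id1 > id2;
}

static TransactionId
sketch_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm, uint8_t new_vmbits,
                    TransactionId latest_xid_removed,
                    TransactionId frz_conflict_horizon,
                    TransactionId visibility_cutoff_xid,
                    bool blk_already_av)
{
    TransactionId conflict_xid = InvalidTransactionId;

    /* Setting an already all-visible page all-frozen needs no conflict */
    if (!do_prune && !do_freeze && do_set_vm && blk_already_av &&
        (new_vmbits & VISIBILITYMAP_ALL_FROZEN))
        return InvalidTransactionId;

    if (do_set_vm)
        conflict_xid = visibility_cutoff_xid;
    else if (do_freeze)
        conflict_xid = frz_conflict_horizon;

    /* Pruning may remove tuples with a newer xmax; keep the most conservative value */
    if (TransactionIdFollows(latest_xid_removed, conflict_xid))
        conflict_xid = latest_xid_removed;

    return conflict_xid;
}

int
main(void)
{
    /* Prune + freeze + set VM: the visibility cutoff wins over the other horizons */
    printf("%u\n", sketch_conflict_xid(true, true, true,
                                       VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN,
                                       950, 900, 1000, false));   /* prints 1000 */

    /* VM-only: marking an already all-visible page all-frozen needs no conflict */
    printf("%u\n", sketch_conflict_xid(false, false, true,
                                       VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN,
                                       0, 0, 1000, true));        /* prints 0 */
    return 0;
}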
v24-0008-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patchtext/x-patch; charset=US-ASCII; name=v24-0008-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patchDownload
From 9f13a78c6bf5d6deda758b623b7790c45317ad6f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v24 08/16] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum
As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
1 file changed, 29 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9f404e03869..6107777097d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1872,9 +1872,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ /* Lock vmbuffer before entering critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ /* Mark buffer dirty before writing any WAL records */
MarkBufferDirty(buf);
/*
@@ -1891,13 +1894,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set_vmbits(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
+
+ /*
+ * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+ * setting the VM.
+ */
+ if (RelationNeedsWAL(vacrel->rel))
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ InvalidTransactionId, /* conflict xid */
+ false, /* cleanup lock */
+ PRUNE_VACUUM_SCAN, /* reason */
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+
END_CRIT_SECTION();
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
/* Count the newly all-frozen pages for logging */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
--
2.43.0
v24-0009-Remove-XLOG_HEAP2_VISIBLE-entirely.patchtext/x-patch; charset=US-ASCII; name=v24-0009-Remove-XLOG_HEAP2_VISIBLE-entirely.patchDownload
From f21dc2adabecf6404a7cf96c1e9254dbe77fa613 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v24 09/16] Remove XLOG_HEAP2_VISIBLE entirely
No remaining code emits XLOG_HEAP2_VISIBLE records, so the record type can be removed entirely.
This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.
This changes the visibility map API, so any external users/consumers of
the VM-only WAL record will need to change.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/common/bufmask.c | 4 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 155 ++---------------------
src/backend/access/heap/pruneheap.c | 6 +-
src/backend/access/heap/vacuumlazy.c | 16 +--
src/backend/access/heap/visibilitymap.c | 112 +---------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +-
src/include/access/heapam_xlog.h | 28 +---
src/include/access/visibilitymap.h | 13 +-
src/include/access/visibilitymapdefs.h | 9 --
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 46 insertions(+), 375 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible. See heap_xlog_prune_freeze()
+ * for more details.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4d382a04338..3ad78ba4694 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2539,11 +2539,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
else if (all_frozen_set)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- relation->rd_locator);
+ visibilitymap_set(BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ relation->rd_locator);
}
/*
@@ -8812,50 +8812,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
-
/*
* Perform XLogInsert for a heap-update operation. Caller must already
* have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index a09fb4b803a..b66736ea282 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -236,7 +236,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+ visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -249,142 +249,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -762,8 +626,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
*
* During recovery, however, no concurrent writers exist. Therefore,
* updating the VM without holding the heap page lock is safe enough. This
- * same approach is taken when replaying xl_heap_visible records (see
- * heap_xlog_visible()).
+ * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+ * heap_xlog_prune_freeze()).
*/
if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -775,11 +639,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- rlocator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -1360,9 +1224,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 489b8487599..ab354add711 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1182,9 +1182,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
if (do_set_vm)
{
PageSetAllVisible(page);
- old_vmbits = visibilitymap_set_vmbits(blockno,
- vmbuffer, new_vmbits,
- params->relation->rd_locator);
+ old_vmbits = visibilitymap_set(blockno,
+ vmbuffer, new_vmbits,
+ params->relation->rd_locator);
Assert(old_vmbits != new_vmbits);
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6107777097d..b73dbdbe4ed 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1894,11 +1894,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
/*
* Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2780,9 +2780,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* set PD_ALL_VISIBLE.
*/
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer, vmflags,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer, vmflags,
+ vacrel->rel->rd_locator);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index d14588e92ae..7997e926872 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set bit(s) in a previously pinned page and log
- * visibilitymap_set_vmbits - set bit(s) in a pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,109 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
- flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || !XLogRecPtrIsValid(recptr));
- Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- Assert(!BufferIsValid(heapBuf) ||
- BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (!XLogRecPtrIsValid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
-}
-
/*
* Set VM (visibility map) flags in the VM block in vmBuf.
*
@@ -344,9 +240,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* rlocator is used only for debugging messages.
*/
uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ca26d1f0ed1..08461fdf593 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index fc45d72c79b..3655358ed6b 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..69678187832 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -294,7 +293,13 @@ typedef struct xl_heap_prune
#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflicts during logical decoding on a standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILIYTMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
#define XLHP_IS_CATALOG_REL (1 << 1)
/*
@@ -443,20 +448,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -500,11 +491,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index c6fa37be968..05ba6786b47 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator);
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9dd65b10254..1819f3dbb77 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4306,7 +4306,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
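As a side note on what the surviving, simplified visibilitymap_set() actually touches, here is a minimal standalone sketch (not part of the patch set) of the heap-block-to-VM-bit addressing behind the map[mapByte] |= (flags << mapOffset) line. The constants assume the default 8 kB block size and 24-byte page header and should match the macros in visibilitymap.c for a default build, but treat the exact numbers as illustrative:

/*
 * Which VM page, byte, and bit pair a heap block's all-visible/all-frozen
 * bits live in. Two bits per heap block, packed four blocks per byte.
 */
#include <stdint.h>
#include <stdio.h>

#define BLCKSZ                8192
#define PAGE_HEADER_SIZE      24                        /* MAXALIGN'd page header */
#define MAPSIZE               (BLCKSZ - PAGE_HEADER_SIZE)
#define BITS_PER_HEAPBLOCK    2                         /* all-visible + all-frozen */
#define HEAPBLOCKS_PER_BYTE   (8 / BITS_PER_HEAPBLOCK)
#define HEAPBLOCKS_PER_PAGE   (MAPSIZE * HEAPBLOCKS_PER_BYTE)

#define HEAPBLK_TO_MAPBLOCK(x)  ((x) / HEAPBLOCKS_PER_PAGE)
#define HEAPBLK_TO_MAPBYTE(x)   (((x) % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE)
#define HEAPBLK_TO_OFFSET(x)    (((x) % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK)

int
main(void)
{
    uint32_t heapBlk = 100000;   /* arbitrary heap block number */
    uint8_t  flags = 0x03;       /* ALL_VISIBLE | ALL_FROZEN */
    uint8_t  map_byte = 0;       /* stand-in for one byte of the pinned VM page */

    printf("heap block %u -> VM block %u, byte %u, bit offset %u\n",
           heapBlk,
           HEAPBLK_TO_MAPBLOCK(heapBlk),
           HEAPBLK_TO_MAPBYTE(heapBlk),
           HEAPBLK_TO_OFFSET(heapBlk));

    /* The bit-setting step visibilitymap_set() performs on the locked VM page */
    map_byte |= (uint8_t) (flags << HEAPBLK_TO_OFFSET(heapBlk));
    printf("byte after setting bits: 0x%02X\n", map_byte);
    return 0;
}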
v24-0010-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patchtext/x-patch; charset=UTF-8; name=v24-0010-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patchDownload
From 0e0143836cef4f45e89a29ebcceed4a94fbff1d9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v24 10/16] Rename GlobalVisTestIsRemovableXid() to
GlobalVisXidVisibleToAll()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The function is currently only used to check whether a tuple’s xmax is
visible to all transactions (and thus removable). Upcoming changes will
also use it to test whether a tuple’s xmin is visible to all to
decide if a page can be marked all-visible in the visibility map.
The new name, GlobalVisXidVisibleToAll(), better reflects this broader
purpose.
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/heapam_visibility.c | 6 +++---
src/backend/access/heap/pruneheap.c | 12 ++++++------
src/backend/access/spgist/spgvacuum.c | 2 +-
src/backend/storage/ipc/procarray.c | 17 ++++++++---------
src/include/utils/snapmgr.h | 4 ++--
5 files changed, 20 insertions(+), 21 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
{
Assert(TransactionIdIsValid(dead_after));
- if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
res = HEAPTUPLE_DEAD;
}
else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
return false;
/* Deleter committed, so tuple is dead if the XID is old enough. */
- return GlobalVisTestIsRemovableXid(vistest,
- HeapTupleHeaderGetRawXmax(tuple));
+ return GlobalVisXidVisibleToAll(vistest,
+ HeapTupleHeaderGetRawXmax(tuple));
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ab354add711..00016a0c1dd 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -252,7 +252,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
vistest = GlobalVisTestFor(relation);
- if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+ if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
return;
/*
@@ -486,7 +486,7 @@ prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
* Determining HTSV only once for each tuple is required for correctness,
* to deal with cases where running HTSV twice could result in different
* results. For example, RECENTLY_DEAD can turn to DEAD if another
- * checked item causes GlobalVisTestIsRemovableFullXid() to update the
+ * checked item causes GlobalVisFullXidVisibleToAll() to update the
* horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
* transaction aborts.
*
@@ -1299,11 +1299,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* Determine whether or not the tuple is considered dead when compared
* with the provided GlobalVisState. On-access pruning does not provide
* VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
- * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
- * if the GlobalVisState has been updated since the beginning of vacuuming
+ * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+ * the GlobalVisState has been updated since the beginning of vacuuming
* the relation.
*/
- if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
return HEAPTUPLE_DEAD;
return res;
@@ -1762,7 +1762,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
/*
* For now always use prstate->cutoffs for this test, because
* we only update 'all_visible' and 'all_frozen' when freezing
- * is requested. We could use GlobalVisTestIsRemovableXid
+ * is requested. We could use GlobalVisXidVisibleToAll()
* instead, if a non-freezing caller wanted to set the VM bit.
*/
Assert(prstate->cutoffs);
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 71ef2e5036f..1c0eb425ee9 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
*/
if (dt->tupstate == SPGIST_REDIRECT &&
(!TransactionIdIsValid(dt->xid) ||
- GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+ GlobalVisXidVisibleToAll(vistest, dt->xid)))
{
dt->tupstate = SPGIST_PLACEHOLDER;
Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..235c3b584f6 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
* See comment for GlobalVisState for details.
*/
bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
- FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
{
/*
* If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4216,14 +4215,14 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
}
/*
- * Wrapper around GlobalVisTestIsRemovableFullXid() for 32bit xids.
+ * Wrapper around GlobalVisFullXidVisibleToAll() for 32bit xids.
*
* It is crucial that this only gets called for xids from a source that
* protects against xid wraparounds (e.g. from a table and thus protected by
* relfrozenxid).
*/
bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
{
FullTransactionId fxid;
@@ -4237,12 +4236,12 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
*/
fxid = FullXidRelativeTo(state->definitely_needed, xid);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableFullXid(), see their comments.
+ * GlobalVisFullXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableXid(state, xid);
+ return GlobalVisXidVisibleToAll(state, xid);
}
/*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
*/
typedef struct GlobalVisState GlobalVisState;
extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
--
2.43.0
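A note for anyone reviewing 0010 in isolation: only the name changes; the
signature and semantics stay the same. The point of the new name is that the
same test now reads naturally in both directions. A rough sketch of the two
call patterns (illustrative only -- vistest, htup, res, and all_visible are
placeholders here, not code from the patch):

    GlobalVisState *vistest = GlobalVisTestFor(rel);

    /* removal side: the deleting xid is visible to all, so the tuple is dead */
    if (GlobalVisXidVisibleToAll(vistest, HeapTupleHeaderGetRawXmax(htup)))
        res = HEAPTUPLE_DEAD;

    /*
     * visibility side (used by later patches): the inserting xid is visible
     * to all, so this tuple does not prevent marking the page all-visible
     */
    if (!GlobalVisXidVisibleToAll(vistest, HeapTupleHeaderGetXmin(htup)))
        all_visible = false;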
v24-0011-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (text/x-patch)
From ff5e29b04980eb5ae8c96f32423683dfc26cebd7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v24 11/16] Use GlobalVisState in vacuum to determine page
level visibility
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.
Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.
This also benefits vacuum directly: in some cases, GlobalVisState may
advance during a vacuum, allowing more pages to be considered
all-visible. And, in the future, we could easily add a heuristic to
update GlobalVisState more frequently during vacuums of large tables. In
the rare case that the GlobalVisState moves backward, vacuum falls back
to OldestXmin to ensure we don’t attempt to freeze a dead tuple that
wasn’t yet prunable according to the GlobalVisState.
Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.
This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++++
src/backend/access/heap/pruneheap.c | 43 +++++++++------------
src/backend/access/heap/vacuumlazy.c | 10 ++---
src/include/access/heapam.h | 11 +++---
4 files changed, 58 insertions(+), 34 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+ Buffer buffer)
+{
+ TransactionId dead_after = InvalidTransactionId;
+ HTSV_Result res;
+
+ res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+ if (res == HEAPTUPLE_RECENTLY_DEAD)
+ {
+ Assert(TransactionIdIsValid(dead_after));
+
+ if (GlobalVisXidVisibleToAll(vistest, dead_after))
+ res = HEAPTUPLE_DEAD;
+ }
+ else
+ Assert(!TransactionIdIsValid(dead_after));
+
+ return res;
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 00016a0c1dd..7e628d4ad59 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -449,11 +449,12 @@ prune_freeze_setup(PruneFreezeParams *params,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon
- * when updating the VM and/or freezing all the tuples on the page.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState. This field is only kept up-to-date if the page is
+ * all-visible. As soon as a tuple is encountered that is not visible to
+ * all, this field is unmaintained. As long as it is maintained, it can be
+ * used to calculate the snapshot conflict horizon when updating the VM
+ * and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -965,14 +966,13 @@ heap_page_will_set_vm(Relation relation,
*/
static bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
- return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
+ return heap_page_would_be_all_visible(rel, buf, vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -1058,6 +1058,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prune_freeze_plan(RelationGetRelid(params->relation),
buffer, &prstate, off_loc);
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them is not visible to everyone, the page cannot be
+ * all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* If checksums are enabled, calling heap_prune_satisfies_vacuum() while
* checking tuple visibility information in prune_freeze_plan() may have
@@ -1227,10 +1237,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool debug_all_frozen;
Assert(prstate.lpdead_items == 0);
- Assert(prstate.cutoffs);
Assert(heap_page_is_all_visible(params->relation, buffer,
- prstate.cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc));
@@ -1759,20 +1768,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' and 'all_frozen' when freezing
- * is requested. We could use GlobalVisXidVisibleToAll()
- * instead, if a non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = false;
- prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b73dbdbe4ed..587cf906fe6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2734,7 +2734,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* done outside the critical section.
*/
if (heap_page_would_be_all_visible(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3493,7 +3493,7 @@ dead_items_cleanup(LVRelState *vacrel)
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
*
* Output parameters:
*
@@ -3509,7 +3509,7 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -3583,7 +3583,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
/* Visibility checks may do IO or allocate memory */
Assert(CritSectionCount == 0);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+ switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest, buf))
{
case HEAPTUPLE_LIVE:
{
@@ -3602,7 +3602,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin, OldestXmin))
+ if (!GlobalVisXidVisibleToAll(vistest, xmin))
{
all_visible = false;
*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 392af6503da..b6f1b3fb448 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -278,10 +278,9 @@ typedef struct PruneFreezeParams
/*
* Contains the cutoffs used for freezing. They are required if the
- * HEAP_PAGE_PRUNE_FREEZE option is set. cutoffs->OldestXmin is also used
- * to determine if dead tuples are HEAPTUPLE_RECENTLY_DEAD or
- * HEAPTUPLE_DEAD. Currently only vacuum passes in cutoffs. Vacuum
- * calculates them once, at the beginning of vacuuming the relation.
+ * HEAP_PAGE_PRUNE_FREEZE option is set. Currently only vacuum passes in
+ * cutoffs. Vacuum calculates them once, at the beginning of vacuuming the
+ * relation.
*/
struct VacuumCutoffs *cutoffs;
} PruneFreezeParams;
@@ -445,7 +444,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -459,6 +458,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+ GlobalVisState *vistest, Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
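The heart of 0011 is the deferred, once-per-page visibility test added to
heap_page_prune_and_freeze() (excerpted from the hunk above): because
visibility_cutoff_xid tracks the newest xmin among the tuples examined, a
single GlobalVisState check against it is enough to decide whether the whole
page is all-visible.

    if (prstate.all_visible &&
        TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
        !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
        prstate.all_visible = prstate.all_frozen = false;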
v24-0012-Unset-all_visible-sooner-if-not-freezing.patch (text/x-patch)
From 4869ad945d582cbe7dd57b5bb5ef458187ba4f64 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v24 12/16] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.
However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/pruneheap.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7e628d4ad59..4ed2eff5e05 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1625,8 +1625,13 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible and all_frozen until later
* during pruning. Removable dead tuples shouldn't preclude freezing the
- * page.
+ * page. If we won't attempt freezing, just unset all-visible now, though.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1885,8 +1890,14 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible and all_frozen until later, at the
* end of heap_page_prune_and_freeze(). This will allow us to attempt to
* freeze the page after pruning. As long as we unset it before updating
- * the visibility map, this will be correct.
+ * the visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all-visible now.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
--
2.43.0
v24-0013-Track-which-relations-are-modified-by-a-query.patch (text/x-patch)
From 031b80ff688f33c0be14037d7b5e7a06cc9d6aef Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v24 13/16] Track which relations are modified by a query
Save the relids in a bitmapset in the EState. A later commit will pass this
information down to scan nodes to control whether the scan allows setting
the visibility map during on-access pruning. We don't want to set the
visibility map if the query is just going to modify the page immediately
afterward.
---
src/backend/executor/execMain.c | 4 ++++
src/backend/executor/execUtils.c | 2 ++
src/include/nodes/execnodes.h | 6 ++++++
3 files changed, 12 insertions(+)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 27c9eec697b..0630a5af79e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation is modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 64ff6996431..7f6522cea8e 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query either through
+ * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
--
2.43.0
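0013 only records which range table entries the query writes; nothing
consumes es_modified_relids yet. The intended consumer is the scan-node
change in the next patch, which boils down to this pattern (taken from the
sequential scan hunk in 0014):

    uint32  flags = 0;

    /* only hint read-only if the scanned rel is not a target of the query */
    if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
                       estate->es_modified_relids))
        flags = SO_HINT_REL_READ_ONLY;

    scandesc = table_beginscan(node->ss.ss_currentRelation,
                               estate->es_snapshot,
                               0, NULL, flags);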
v24-0014-Pass-down-information-on-table-modification-to-s.patch (text/x-patch)
From 00c65fad7e817a2a10ec47272bbe990c50502078 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:12:18 -0500
Subject: [PATCH v24 14/16] Pass down information on table modification to scan
node
Pass down information to sequential scan, index scan, and bitmap table
scan nodes on whether or not the query modifies the relation being
scanned. A later commit will use this information to update the VM
during on-access pruning only if the relation is not modified by the
query.
---
contrib/pgrowlocks/pgrowlocks.c | 2 +-
src/backend/access/brin/brin.c | 3 ++-
src/backend/access/gin/gininsert.c | 3 ++-
src/backend/access/heap/heapam_handler.c | 7 +++---
src/backend/access/index/genam.c | 4 ++--
src/backend/access/index/indexam.c | 6 +++---
src/backend/access/nbtree/nbtsort.c | 2 +-
src/backend/access/table/tableam.c | 7 +++---
src/backend/commands/constraint.c | 2 +-
src/backend/commands/copyto.c | 2 +-
src/backend/commands/tablecmds.c | 4 ++--
src/backend/commands/typecmds.c | 4 ++--
src/backend/executor/execIndexing.c | 2 +-
src/backend/executor/execReplication.c | 8 +++----
src/backend/executor/nodeBitmapHeapscan.c | 9 +++++++-
src/backend/executor/nodeIndexonlyscan.c | 2 +-
src/backend/executor/nodeIndexscan.c | 11 ++++++++--
src/backend/executor/nodeSeqscan.c | 26 ++++++++++++++++++++---
src/backend/partitioning/partbounds.c | 2 +-
src/backend/utils/adt/selfuncs.c | 2 +-
src/include/access/genam.h | 2 +-
src/include/access/heapam.h | 6 ++++++
src/include/access/tableam.h | 19 ++++++++++-------
23 files changed, 91 insertions(+), 44 deletions(-)
diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
RelationGetRelationName(rel));
/* Scan the relation */
- scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
hscan = (HeapScanDesc) scan;
attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index cb3331921cb..b9613787b85 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2842,7 +2842,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
indexInfo->ii_Concurrent = brinshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromBrinShared(brinshared));
+ ParallelTableScanFromBrinShared(brinshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index f87c60a230c..645688f9241 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2058,7 +2058,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
indexInfo->ii_Concurrent = ginshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromGinBuildShared(ginshared));
+ ParallelTableScanFromGinBuildShared(ginshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..d7fac94826d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,12 +79,13 @@ heapam_slot_callbacks(Relation relation)
*/
static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
{
IndexFetchHeapData *hscan = palloc0(sizeof(IndexFetchHeapData));
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
return &hscan->xs_base;
}
@@ -753,7 +754,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
tableScan = NULL;
heapScan = NULL;
- indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+ indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
index_rescan(indexScan, NULL, 0, NULL, 0);
}
else
@@ -762,7 +763,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
- tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+ tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
heapScan = (HeapScanDesc) tableScan;
indexScan = NULL;
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index c96917085c2..9d425504e1b 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -447,7 +447,7 @@ systable_beginscan(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, irel,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
@@ -708,7 +708,7 @@ systable_beginscan_ordered(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, indexRelation,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..b5523cf2ab1 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys)
+ int nkeys, int norderbys, uint32 flags)
{
IndexScanDesc scan;
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+ scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
return scan;
}
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+ scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
return scan;
}
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 454adaee7dc..02ab0233e59 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1925,7 +1925,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
indexInfo = BuildIndexInfo(btspool->index);
indexInfo->ii_Concurrent = btshared->isconcurrent;
scan = table_beginscan_parallel(btspool->heap,
- ParallelTableScanFromBTShared(btshared));
+ ParallelTableScanFromBTShared(btshared), 0);
reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
true, progress, _bt_build_callback,
&buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 1e099febdc8..db2a302a486 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
+
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
bool found;
slot = table_slot_create(rel, NULL);
- scan = table_index_fetch_begin(rel);
+ scan = table_index_fetch_begin(rel, 0);
found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
all_dead);
table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index 3497a8221f2..97c8278e36d 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
*/
tmptid = checktid;
{
- IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+ IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
bool call_again = false;
if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index cef452584e5..22b453dc617 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1156,7 +1156,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
AttrMap *map = NULL;
TupleTableSlot *root_slot = NULL;
- scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
slot = table_slot_create(rel, NULL);
/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 07e5b95782e..58dbbf4d851 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6345,7 +6345,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
* checking all the constraints.
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(oldrel, snapshot, 0, NULL);
+ scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -13730,7 +13730,7 @@ validateForeignKeyConstraint(char *conname,
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
slot = table_slot_create(rel, NULL);
- scan = table_beginscan(rel, snapshot, 0, NULL);
+ scan = table_beginscan(rel, snapshot, 0, NULL, 0);
perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
"validateForeignKeyConstraint",
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 47d5047fe8b..055759cd343 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3157,7 +3157,7 @@ validateDomainNotNullConstraint(Oid domainoid)
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
@@ -3238,7 +3238,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index dd323c9b9fd..b41bfeca244 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -816,7 +816,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
retry:
conflict = false;
found_self = false;
- index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+ index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index def32774c90..473d236e551 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -204,7 +204,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
/* Start an index scan. */
- scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
retry:
found = false;
@@ -382,7 +382,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
/* Start a heap scan. */
InitDirtySnapshot(snap);
- scan = table_beginscan(rel, &snap, 0, NULL);
+ scan = table_beginscan(rel, &snap, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
retry:
@@ -601,7 +601,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+ scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
table_rescan(scan, NULL);
@@ -665,7 +665,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
index_rescan(scan, skey, skey_attoff, NULL, 0);
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..0d854db51a1 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ flags);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index f464cca9507..87b04b1b88e 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
estate->es_snapshot,
&node->ioss_Instrument,
node->ioss_NumScanKeys,
- node->ioss_NumOrderByKeys);
+ node->ioss_NumOrderByKeys, 0);
node->ioss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index f36929deec3..90f929ce741 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys,
+ flags);
node->iss_ScanDesc = scandesc;
@@ -207,7 +214,7 @@ IndexNextWithReorder(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys, 0);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..4d0cbb9dee4 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
scandesc = table_beginscan(node->ss.ss_currentRelation,
estate->es_snapshot,
- 0, NULL);
+ 0, NULL, flags);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
{
EState *estate = node->ss.ps.state;
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
table_parallelscan_initialize(node->ss.ss_currentRelation,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+ flags);
}
/* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation,
+ pscan,
+ flags);
}
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 8ba038c5ef4..d3b340ee2a7 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3370,7 +3370,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
econtext = GetPerTupleExprContext(estate);
snapshot = RegisterSnapshot(GetLatestSnapshot());
tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
- scan = table_beginscan(part_rel, snapshot, 0, NULL);
+ scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 540aa9628d7..28434146eba 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7100,7 +7100,7 @@ get_actual_variable_endpoint(Relation heapRel,
index_scan = index_beginscan(heapRel, indexRel,
&SnapshotNonVacuumable, NULL,
- 1, 0);
+ 1, 0, 0);
/* Set it up for index-only scan */
index_scan->xs_want_itup = true;
index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..d29d9e905fc 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -177,7 +177,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys);
+ int nkeys, int norderbys, uint32 flags);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b6f1b3fb448..480a1bd654f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -119,6 +119,12 @@ typedef struct IndexFetchHeapData
Buffer xs_cbuf; /* current heap buffer in scan, if any */
/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 2fa790b6bf5..d10b1b03cdb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* set if the query doesn't modify the rel */
+ SO_HINT_REL_READ_ONLY = 1 << 10,
} ScanOptions;
/*
@@ -420,7 +422,7 @@ typedef struct TableAmRoutine
*
* Tuples for an index scan can then be fetched via index_fetch_tuple.
*/
- struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+ struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
/*
* Reset index fetch. Typically this will release cross index fetch
@@ -874,9 +876,9 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
*/
static inline TableScanDesc
table_beginscan(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_SEQSCAN |
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
@@ -919,9 +921,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
@@ -1128,7 +1130,8 @@ extern void table_parallelscan_initialize(Relation rel,
* Caller must hold a suitable lock on the relation.
*/
extern TableScanDesc table_beginscan_parallel(Relation relation,
- ParallelTableScanDesc pscan);
+ ParallelTableScanDesc pscan,
+ uint32 flags);
/*
* Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1164,9 +1167,9 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
* Tuples for an index scan can then be fetched via table_index_fetch_tuple().
*/
static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
{
- return rel->rd_tableam->index_fetch_begin(rel);
+ return rel->rd_tableam->index_fetch_begin(rel, flags);
}
/*
--
2.43.0
v24-0015-Allow-on-access-pruning-to-set-pages-all-visible.patch (text/x-patch)
From 352e4756fb7641a0142a7e2a1a0826d81427b935 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:24:08 -0500
Subject: [PATCH v24 15/16] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam.c | 15 +++-
src/backend/access/heap/heapam_handler.c | 15 +++-
src/backend/access/heap/pruneheap.c | 74 ++++++++++++++++---
src/include/access/heapam.h | 24 +++++-
.../t/035_standby_logical_decoding.pl | 3 +-
5 files changed, 114 insertions(+), 17 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 3ad78ba4694..ecc04390ac7 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -570,6 +570,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -584,7 +585,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1261,6 +1264,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1299,6 +1303,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1331,6 +1341,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index d7fac94826d..27e3498f5f4 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
return &hscan->xs_base;
@@ -100,6 +101,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -139,7 +146,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ hscan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2472,6 +2480,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2518,7 +2527,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 4ed2eff5e05..15239e0cbbd 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -205,7 +205,9 @@ static bool heap_page_will_set_vm(Relation relation,
Buffer heap_buf,
Buffer vmbuffer,
bool blk_known_av,
- const PruneState *prstate,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ PruneState *prstate,
uint8 *new_vmbits);
@@ -220,9 +222,13 @@ static bool heap_page_will_set_vm(Relation relation,
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -304,6 +310,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
.cutoffs = NULL,
};
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ params.options |= HEAP_PAGE_PRUNE_UPDATE_VM;
+ params.vmbuffer = *vmbuffer;
+ }
+
heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
NULL, NULL);
@@ -862,6 +875,9 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm, uint8 new_vmbits
* corrupted, it will fix them by clearing the VM bits and page visibility
* hint. This does not need to be done in a critical section.
*
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
+ * Returns true if one or both VM bits should be set, along with returning
+ * which bits should be set in the VM in *new_vmbits.
*/
@@ -871,7 +887,9 @@ heap_page_will_set_vm(Relation relation,
Buffer heap_buf,
Buffer vmbuffer,
bool blk_known_av,
- const PruneState *prstate,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ PruneState *prstate,
uint8 *new_vmbits)
{
Page heap_page = BufferGetPage(heap_buf);
@@ -884,6 +902,24 @@ heap_page_will_set_vm(Relation relation,
return false;
}
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(heap_buf) || XLogCheckBufferNeedsBackup(heap_buf)))
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ return false;
+ }
+
/*
* Determine what the visibility map bits should be set to using the
* values of all_visible and all_frozen determined during
@@ -906,14 +942,15 @@ heap_page_will_set_vm(Relation relation,
* WAL-logged.
*
* As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
+ * page-level bit is clear. However, it's possible that in vacuum the bit
+ * got cleared after heap_vac_scan_next_block() was called, so we must
+ * recheck with buffer lock before concluding that the VM is corrupt.
*
* Callers which did not check the visibility map and determine
* blk_known_av will not be eligible for this, however the cost of
* potentially needing to read the visibility map for pages that are not
- * all-visible is too high to justify generalizing the check.
+ * all-visible is too high to justify generalizing the check. A future
+ * vacuum will have to take care of fixing the corruption.
*/
else if (blk_known_av && !PageIsAllVisible(heap_page) &&
visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
@@ -1129,6 +1166,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
do_set_vm = heap_page_will_set_vm(params->relation,
blockno, buffer, vmbuffer, params->blk_known_av,
+ params->reason, do_prune, do_freeze,
&prstate, &new_vmbits);
/*
@@ -1195,15 +1233,31 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
old_vmbits = visibilitymap_set(blockno,
vmbuffer, new_vmbits,
params->relation->rd_locator);
- Assert(old_vmbits != new_vmbits);
+
+ /*
+ * If on-access pruning set the VM in between when vacuum first
+ * checked the visibility map and determined blk_known_av and when
+ * we actually prune the page, we could end up trying to set the
+ * VM only to find it is already set.
+ */
+ if (old_vmbits == new_vmbits)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+ /* Unset so we don't emit WAL since no change occurred */
+ do_set_vm = false;
+ }
}
MarkBufferDirty(buffer);
/*
- * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
+ * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
+ * only planning to update the VM, and it turns out that it was
+ * already set, there is no need to emit WAL. As such, recheck here that
+ * some change is actually required.
*/
- if (RelationNeedsWAL(params->relation))
+ if (RelationNeedsWAL(params->relation) &&
+ (do_prune || do_freeze || do_set_vm))
{
log_heap_prune_and_freeze(params->relation, buffer,
do_set_vm ? vmbuffer : InvalidBuffer,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 480a1bd654f..89538652566 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read in the current heap page's
+ * corresponding VM block to this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read in the current heap page's corresponding VM block to
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
/*
* Some optimizations can only be performed if the query does not modify
@@ -425,7 +442,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
TM_IndexDeleteOp *delstate);
/* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index ebe2fae1789..bdd9f0a62cd 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
--
2.43.0
v24-0016-Set-pd_prune_xid-on-insert.patchtext/x-patch; charset=UTF-8; name=v24-0016-Set-pd_prune_xid-on-insert.patchDownload
From 44126b8cbf2f4a353a82d380f1c636db72087db4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v24 16/16] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.
This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.
Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.
ci-os-only:
---
src/backend/access/heap/heapam.c | 25 +++++++++++++------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++-
.../modules/index/expected/killtuples.out | 6 ++---
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ecc04390ac7..d5f3f897dd3 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2119,6 +2119,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2178,15 +2179,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2196,7 +2201,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2560,8 +2564,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index b66736ea282..5c8dc2718ce 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -447,6 +447,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -596,9 +602,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
--
2.43.0
On Tue, Dec 9, 2025 at 12:48 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
In this set 0001 and 0002 are independent. 0003-0007 are all small
steps toward the single change in 0007 which combines the VM updates
into the same WAL record as pruning and freezing. 0008 and 0009 are
removing the rest of XLOG_HEAP2_VISIBLE. 0010 - 0012 are refactoring
needed to set the VM during on-access pruning. 0013 - 0015 are small
steps toward setting the VM on-access. And 0016 sets the prune xid on
insert so we may set the VM on-access for pages that have only new
data.
I committed 0001 and 0002. Attached v25 reflects that.
0001-0004: refactoring steps to eliminate the visible record from phase I
(probably not independent commits in the end)
0005: eliminate XLOG_HEAP2_VISIBLE from phase I of vacuum
0006-0007: removing the rest of XLOG_HEAP2_VISIBLE
0008-0010: refactoring needed to set the VM on-access
0011-0013: setting the VM on-access (a quick way to observe this from SQL
is sketched below)
0014: setting pd_prune_xid on insert
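For anyone playing with the on-access pieces, a low-tech way to check
whether a plain read ended up setting the VM is to compare
pg_visibility_map() output before and after a sequential scan. A minimal
sketch, assuming the pg_visibility extension is installed and the later
patches are applied (the table name and row count here are made up):
--
create extension pg_visibility;
create table vm_demo(a int) with (autovacuum_enabled = false);
insert into vm_demo select generate_series(1, 10000);
-- before the scan, block 0 should not yet be all-visible
select * from pg_visibility_map('vm_demo', 0);
-- a read-only scan may prune on access and set the VM for full pages
select count(*) from vm_demo;
select * from pg_visibility_map('vm_demo', 0);
--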
- Melanie
Attachments:
v25-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patchapplication/x-patch; name=v25-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patchDownload
From 0863c5db56d8f62cf525e8b98ab71245e27c17a6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 8 Dec 2025 15:49:54 -0500
Subject: [PATCH v25 01/14] Combine visibilitymap_set() cases in
lazy_scan_prune()
The heap buffer is unconditionally added to the WAL chain when setting
the VM, so it must always be marked dirty.
In one of the cases in lazy_scan_prune(), we try to avoid setting
PD_ALL_VISIBLE and marking the buffer dirty again if PD_ALL_VISIBLE is
already set. There is little gain here, and if we eliminate that
condition, we can easily combine the two cases which set the VM in
lazy_scan_prune(). This is more straightforward and makes it clear that
the heap buffer must be marked dirty since it is added to the WAL chain.
In the previously separate second VM set case, the heap buffer would
always be dirty anyway -- either because we just froze a tuple and
marked the buffer dirty or because we modified the buffer between
find_next_unskippable_block() and heap_page_prune_and_freeze() and then
pruned it in heap_page_prune_and_freeze().
This commit also adds a test case for vacuum when it does not need
to modify the heap page. Currently that would ensure the heap buffer is
marked dirty before adding it to the WAL chain, but if we ever remove it
from the VM set WAL chain or pass it with REGBUF_NO_CHANGES, it would
also serve as coverage of that.
---
.../pg_visibility/expected/pg_visibility.out | 17 ++++
contrib/pg_visibility/sql/pg_visibility.sql | 13 +++
src/backend/access/heap/vacuumlazy.c | 91 ++++---------------
3 files changed, 48 insertions(+), 73 deletions(-)
diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index 09fa5933a35..3608f801eee 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -204,6 +204,23 @@ select pg_truncate_visibility_map('test_partition');
(1 row)
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+ pg_truncate_visibility_map
+----------------------------
+
+(1 row)
+
+-- vacuum sets the VM but does not need to set PD_ALL_VISIBLE so no heap page
+-- modification
+vacuum test_vac_unmodified_heap;
-- test copy freeze
create table copyfreeze (a int, b char(1500));
-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index 5af06ec5b76..6af7c179df0 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -94,6 +94,19 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
select * from pg_check_frozen('test_partition'); -- hopefully none
select pg_truncate_visibility_map('test_partition');
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+-- vacuum sets the VM but does not need to set PD_ALL_VISIBLE so no heap page
+-- modification
+vacuum test_vac_unmodified_heap;
+
-- test copy freeze
create table copyfreeze (a int, b char(1500));
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e8c99c3773d..38a1268b004 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2094,15 +2094,21 @@ lazy_scan_prune(LVRelState *vacrel,
* of last heap_vac_scan_next_block() call), and from all_visible and
* all_frozen variables
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if ((presult.all_visible && !all_visible_according_to_vm) ||
+ (presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
{
uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
+ uint8 new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
if (presult.all_frozen)
{
+ /*
+ * We can pass InvalidTransactionId as our cutoff_xid, since a
+ * snapshotConflictHorizon sufficient to make everything safe for
+ * REDO was logged when the page's tuples were frozen.
+ */
Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- flags |= VISIBILITYMAP_ALL_FROZEN;
+ new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
}
/*
@@ -2111,35 +2117,33 @@ lazy_scan_prune(LVRelState *vacrel,
* checksums are not enabled). Regardless, set both bits so that we
* get back in sync.
*
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
+ * The heap page is added to the WAL chain even if it wasn't modified,
+ * so we still need to mark it dirty. The only scenario where it isn't
+ * modified in phase I is when the VM was truncated or removed, which
+ * isn't worth optimizing for.
*/
PageSetAllVisible(page);
MarkBufferDirty(buf);
old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
InvalidXLogRecPtr,
vmbuffer, presult.vm_conflict_horizon,
- flags);
+ new_vmbits);
/*
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
+ * For the purposes of logging, count whether or not the page was
+ * newly set all-visible and, potentially, all-frozen.
*/
if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
{
vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
+ if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
vacrel->vm_new_frozen_pages++;
*vm_page_frozen = true;
@@ -2191,65 +2195,6 @@ lazy_scan_prune(LVRelState *vacrel,
VISIBILITYMAP_VALID_BITS);
}
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen.
- */
- else if (all_visible_according_to_vm && presult.all_frozen &&
- !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
-
return presult.ndeleted;
}
--
2.43.0
v25-0002-Refactor-lazy_scan_prune-VM-set-logic-into-helpe.patchapplication/x-patch; name=v25-0002-Refactor-lazy_scan_prune-VM-set-logic-into-helpe.patchDownload
From e697a3c859e18365d14d7c754c47ecccfb0f8e9a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 13:36:39 -0500
Subject: [PATCH v25 02/14] Refactor lazy_scan_prune() VM set logic into helper
While this may not be an improvement on its own, encapsulating the logic
for determining what to set the VM bits to in a helper is one step
toward setting the VM in heap_page_prune_and_freeze().
---
src/backend/access/heap/vacuumlazy.c | 207 ++++++++++++++++-----------
1 file changed, 126 insertions(+), 81 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 38a1268b004..6d5d708352e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1948,6 +1948,104 @@ cmpOffsetNumbers(const void *a, const void *b)
return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
}
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from PruneFreezeResult and
+ * all_visible_according_to_vm. This function does not actually set the VM
+ * bits or page-level visibility hint, PD_ALL_VISIBLE.
+ *
+ * If it finds that the page-level visibility hint or VM is corrupted, it will
+ * fix them by clearing the VM bits and visibility page hint. This does not
+ * need to be done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with returning
+ * what bits should be set in the VM in *new_vmbits.
+ */
+static bool
+heap_page_will_set_vm(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buf,
+ Buffer vmbuffer,
+ bool all_visible_according_to_vm,
+ const PruneFreezeResult *presult,
+ uint8 *new_vmbits)
+{
+ Page heap_page = BufferGetPage(heap_buf);
+
+ *new_vmbits = 0;
+
+ /*
+ * Determine what to set the visibility map bits to based on information
+ * from the VM (as of last heap_vac_scan_next_block() call), and from
+ * all_visible and all_frozen variables.
+ */
+ if ((presult->all_visible && !all_visible_according_to_vm) ||
+ (presult->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+ {
+ *new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+ if (presult->all_frozen)
+ {
+ Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+ *new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
+ return true;
+ }
+
+ /*
+ * Now handle two potential corruption cases:
+ *
+ * These do not need to happen in a critical section and are not
+ * WAL-logged.
+ *
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ else if (all_visible_according_to_vm && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (presult->lpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buf);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ return false;
+}
+
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -1978,6 +2076,9 @@ lazy_scan_prune(LVRelState *vacrel,
bool *vm_page_frozen)
{
Relation rel = vacrel->rel;
+ bool do_set_vm = false;
+ uint8 new_vmbits = 0;
+ uint8 old_vmbits = 0;
PruneFreezeResult presult;
PruneFreezeParams params = {
.relation = rel,
@@ -2089,33 +2190,20 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(!presult.all_visible || !(*has_lpdead_items));
Assert(!presult.all_frozen || presult.all_visible);
- /*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
- */
- if ((presult.all_visible && !all_visible_according_to_vm) ||
- (presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
- {
- uint8 old_vmbits;
- uint8 new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
- if (presult.all_frozen)
- {
- /*
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for
- * REDO was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
- }
+ do_set_vm = heap_page_will_set_vm(rel,
+ blkno,
+ buf,
+ vmbuffer,
+ all_visible_according_to_vm,
+ &presult,
+ &new_vmbits);
+ if (do_set_vm)
+ {
/*
* It should never be the case that the visibility map page is set
* while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
+ * checksums are not enabled).
*
* The heap page is added to the WAL chain even if it wasn't modified,
* so we still need to mark it dirty. The only scenario where it isn't
@@ -2128,71 +2216,28 @@ lazy_scan_prune(LVRelState *vacrel,
InvalidXLogRecPtr,
vmbuffer, presult.vm_conflict_horizon,
new_vmbits);
-
- /*
- * For the purposes of logging, count whether or not the page was
- * newly set all-visible and, potentially, all-frozen.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
}
/*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
+ * For the purposes of logging, count whether or not the page was newly
+ * set all-visible and, potentially, all-frozen.
*/
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+ if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
+ vacrel->vm_new_visible_pages++;
+ if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ {
+ vacrel->vm_new_visible_frozen_pages++;
+ *vm_page_frozen = true;
+ }
}
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
+ else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
+ Assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
}
return presult.ndeleted;
--
2.43.0
v25-0003-Set-the-VM-in-heap_page_prune_and_freeze.patchapplication/x-patch; name=v25-0003-Set-the-VM-in-heap_page_prune_and_freeze.patchDownload
From 2253e4f0982072516ce5da65e9cbadff818836e5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v25 03/14] Set the VM in heap_page_prune_and_freeze()
This has no independent benefit. It is meant for ease of review. As of
this commit, there is still a separate WAL record emitted for setting
the VM after pruning and freezing. But it is easier to review if moving
the logic into pruneheap.c is separate from setting the VM in the same
WAL record.
---
src/backend/access/heap/pruneheap.c | 263 ++++++++++++++++++++++-----
src/backend/access/heap/vacuumlazy.c | 147 +--------------
src/include/access/heapam.h | 27 +++
3 files changed, 254 insertions(+), 183 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ca44225a10e..d7f36e2764f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
#include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool attempt_freeze;
+ /* whether or not to attempt updating the VM */
+ bool attempt_update_vm;
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -140,16 +142,17 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon when setting the VM or when
+ * freezing all the tuples on the page. It is only valid when all the live
+ * tuples on the page are all-visible.
*
* NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
* That's convenient for heap_page_prune_and_freeze() to use them to
- * decide whether to freeze the page or not. The all_visible and
- * all_frozen values returned to the caller are adjusted to include
- * LP_DEAD items after we determine whether to opportunistically freeze.
+ * decide whether to opportunistically freeze the page or not. The
+ * all_visible and all_frozen values ultimately used to set the VM are
+ * adjusted to include LP_DEAD items after we determine whether or not to
+ * opportunistically freeze.
*/
bool all_visible;
bool all_frozen;
@@ -191,6 +194,13 @@ static void page_verify_redirects(Page page);
static bool heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
PruneState *prstate);
+static bool heap_page_will_set_vm(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buf,
+ Buffer vmbuffer,
+ bool blk_known_av,
+ const PruneFreezeResult *presult,
+ uint8 *new_vmbits);
/*
@@ -280,6 +290,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
PruneFreezeParams params = {
.relation = relation,
.buffer = buffer,
+ .vmbuffer = InvalidBuffer,
+ .blk_known_av = false,
.reason = PRUNE_ON_ACCESS,
.options = 0,
.vistest = vistest,
@@ -338,6 +350,8 @@ prune_freeze_setup(PruneFreezeParams *params,
/* cutoffs must be provided if we will attempt freezing */
Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate->attempt_update_vm =
+ (params->options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate->cutoffs = params->cutoffs;
/*
@@ -386,51 +400,54 @@ prune_freeze_setup(PruneFreezeParams *params,
prstate->frz_conflict_horizon = InvalidTransactionId;
/*
- * Vacuum may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Track whether the page could be marked all-visible and/or all-frozen.
+ * This information is used for opportunistic freezing and for updating
+ * the visibility map (VM) if requested by the caller.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * Currently, only VACUUM performs freezing, but other callers may in the
+ * future. Visibility bookkeeping is required not just for setting the VM
+ * bits, but also for opportunistic freezing: we only consider freezing if
+ * the page would become all-frozen, or if it would be all-frozen except
+ * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+ * we will not set the VM bit even if the page is found to be all-visible.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not immediately
- * clear all_visible and all_frozen when we see LP_DEAD items. We fix
- * that after scanning the line pointers. We must correct all_visible and
- * all_frozen before we return them to the caller, so that the caller
- * doesn't set the VM bits incorrectly.
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is passed without HEAP_PAGE_PRUNE_FREEZE,
+ * prstate.all_frozen must be initialized to false, since we will not call
+ * heap_prepare_freeze_tuple() for each tuple.
+ *
+ * Dead tuples that will be removed by the end of vacuum should not
+ * prevent opportunistic freezing. Therefore, we do not clear all_visible
+ * and all_frozen when we encounter LP_DEAD items. Instead, we correct
+ * them after deciding whether to freeze, but before updating the VM, to
+ * avoid setting the VM bits incorrectly.
+ *
+ * If neither freezing nor VM updates are requested, we skip the extra
+ * bookkeeping. In this case, initializing all_visible to false allows
+ * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
*/
if (prstate->attempt_freeze)
{
prstate->all_visible = true;
prstate->all_frozen = true;
}
+ else if (prstate->attempt_update_vm)
+ {
+ prstate->all_visible = true;
+ prstate->all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate->all_visible = false;
prstate->all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field stops being maintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon
+ * when updating the VM and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -765,10 +782,118 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from PruneFreezeResult and blk_known_av.
+ * Some callers may already have examined this page’s VM bits (e.g., VACUUM in
+ * the previous heap_vac_scan_next_block() call) and can pass that along as
+ * blk_known_av. Callers that have not previously checked the page's status in
+ * the VM should pass false for blk_known_av.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * However, if it finds that the page-level visibility hint or VM is
+ * corrupted, it will fix them by clearing the VM bits and page visibility
+ * hint. This does not need to be done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with returning
+ * what bits should be set in the VM in *new_vmbits.
+ */
+static bool
+heap_page_will_set_vm(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buf,
+ Buffer vmbuffer,
+ bool blk_known_av,
+ const PruneFreezeResult *presult,
+ uint8 *new_vmbits)
+{
+ Page heap_page = BufferGetPage(heap_buf);
+
+ *new_vmbits = 0;
+
+ /*
+ * Determine what the visibility map bits should be set to using the
+ * values of all_visible and all_frozen determined during
+ * pruning/freezing.
+ */
+ if ((presult->all_visible && !blk_known_av) ||
+ (presult->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+ {
+ *new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+ if (presult->all_frozen)
+ {
+ Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+ *new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
+ return true;
+ }
+
+ /*
+ * Now handle two potential corruption cases:
+ *
+ * These do not need to happen in a critical section and are not
+ * WAL-logged.
+ *
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ *
+ * Callers that did not check the visibility map to determine
+ * blk_known_av are not eligible for this check; however, the cost of
+ * potentially needing to read the visibility map for pages that are not
+ * all-visible is too high to justify generalizing the check.
+ */
+ else if (blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (presult->lpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buf);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ return false;
+}
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -783,12 +908,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* tuples if it's required in order to advance relfrozenxid / relminmxid, or
* if it's considered advantageous for overall system performance to do so
* now. The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing. When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opportunistically freeze, to indicate if the VM bits can be set. They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
@@ -813,11 +939,14 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
MultiXactId *new_relmin_mxid)
{
Buffer buffer = params->buffer;
+ Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
+ BlockNumber blockno = BufferGetBlockNumber(buffer);
PruneState prstate;
bool do_freeze;
bool do_prune;
bool do_hint_prune;
+ bool do_set_vm;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
@@ -1001,6 +1130,48 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
}
}
+
+ presult->new_vmbits = 0;
+ presult->old_vmbits = 0;
+
+ /* Now update the visibility map and PD_ALL_VISIBLE hint */
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ do_set_vm = false;
+ if (prstate.attempt_update_vm)
+ do_set_vm = heap_page_will_set_vm(params->relation,
+ blockno,
+ buffer,
+ vmbuffer,
+ params->blk_known_av,
+ presult,
+ &presult->new_vmbits);
+
+ /*
+ * new_vmbits should be 0 regardless of whether or not the page is
+ * all-visible if we do not intend to set the VM.
+ */
+ Assert(do_set_vm || presult->new_vmbits == 0);
+
+ if (do_set_vm)
+ {
+ /*
+ * It should never be the case that the visibility map page is set
+ * while the page-level bit is clear, but the reverse is allowed (if
+ * checksums are not enabled).
+ *
+ * The heap page is added to the WAL chain even if it wasn't modified,
+ * so we still need to mark it dirty. The only scenario where it isn't
+ * modified in phase I is when the VM was truncated or removed, which
+ * isn't worth optimizing for.
+ */
+ PageSetAllVisible(page);
+ MarkBufferDirty(buffer);
+ presult->old_vmbits = visibilitymap_set(params->relation, blockno, buffer,
+ InvalidXLogRecPtr,
+ vmbuffer, presult->vm_conflict_horizon,
+ presult->new_vmbits);
+ }
}
@@ -1475,6 +1646,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
{
TransactionId xmin;
+ Assert(prstate->attempt_update_vm);
+
if (!HeapTupleHeaderXminCommitted(htup))
{
prstate->all_visible = false;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6d5d708352e..81ef81cb8f3 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1949,103 +1949,6 @@ cmpOffsetNumbers(const void *a, const void *b)
}
-/*
- * Decide whether to set the visibility map bits (all-visible and all-frozen)
- * for heap_blk using information from PruneFreezeResult and
- * all_visible_according_to_vm. This function does not actually set the VM
- * bits or page-level visibility hint, PD_ALL_VISIBLE.
- *
- * If it finds that the page-level visibility hint or VM is corrupted, it will
- * fix them by clearing the VM bits and visibility page hint. This does not
- * need to be done in a critical section.
- *
- * Returns true if one or both VM bits should be set, along with returning
- * what bits should be set in the VM in *new_vmbits.
- */
-static bool
-heap_page_will_set_vm(Relation relation,
- BlockNumber heap_blk,
- Buffer heap_buf,
- Buffer vmbuffer,
- bool all_visible_according_to_vm,
- const PruneFreezeResult *presult,
- uint8 *new_vmbits)
-{
- Page heap_page = BufferGetPage(heap_buf);
-
- *new_vmbits = 0;
-
- /*
- * Determine what to set the visibility map bits to based on information
- * from the VM (as of last heap_vac_scan_next_block() call), and from
- * all_visible and all_frozen variables.
- */
- if ((presult->all_visible && !all_visible_according_to_vm) ||
- (presult->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
- {
- *new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
- if (presult->all_frozen)
- {
- Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
- *new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- return true;
- }
-
- /*
- * Now handle two potential corruption cases:
- *
- * These do not need to happen in a critical section and are not
- * WAL-logged.
- *
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- else if (all_visible_according_to_vm && !PageIsAllVisible(heap_page) &&
- visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk)));
-
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult->lpdead_items > 0 && PageIsAllVisible(heap_page))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- RelationGetRelationName(relation), heap_blk)));
-
- PageClearAllVisible(heap_page);
- MarkBufferDirty(heap_buf);
- visibilitymap_clear(relation, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- return false;
-}
-
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -2076,15 +1979,14 @@ lazy_scan_prune(LVRelState *vacrel,
bool *vm_page_frozen)
{
Relation rel = vacrel->rel;
- bool do_set_vm = false;
- uint8 new_vmbits = 0;
- uint8 old_vmbits = 0;
PruneFreezeResult presult;
PruneFreezeParams params = {
.relation = rel,
.buffer = buf,
+ .vmbuffer = vmbuffer,
+ .blk_known_av = all_visible_according_to_vm,
.reason = PRUNE_VACUUM_SCAN,
- .options = HEAP_PAGE_PRUNE_FREEZE,
+ .options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM,
.vistest = vacrel->vistest,
.cutoffs = &vacrel->cutoffs,
};
@@ -2187,55 +2089,24 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
- Assert(!presult.all_frozen || presult.all_visible);
-
- do_set_vm = heap_page_will_set_vm(rel,
- blkno,
- buf,
- vmbuffer,
- all_visible_according_to_vm,
- &presult,
- &new_vmbits);
-
- if (do_set_vm)
- {
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled).
- *
- * The heap page is added to the WAL chain even if it wasn't modified,
- * so we still need to mark it dirty. The only scenario where it isn't
- * modified in phase I is when the VM was truncated or removed, which
- * isn't worth optimizing for.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- new_vmbits);
- }
-
/*
* For the purposes of logging, count whether or not the page was newly
* set all-visible and, potentially, all-frozen.
*/
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
- (new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
vacrel->vm_new_visible_pages++;
- if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- Assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
vacrel->vm_new_frozen_pages++;
*vm_page_frozen = true;
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 632c4332a8c..bb712c5b29f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,18 @@ typedef struct PruneFreezeParams
Relation relation; /* relation containing buffer to be pruned */
Buffer buffer; /* buffer to be pruned */
+ /*
+ * vmbuffer is the buffer that must already contain the required block of
+ * the visibility map if we are to update it. blk_known_av is the
+ * visibility status of the heap block as of the last call to
+ * find_next_unskippable_block(). Callers which did not check the
+ * visibility map already should pass false for blk_known_av. This is only
+ * an optimization for callers that did check the VM and won't affect
+ * correctness.
+ */
+ Buffer vmbuffer;
+ bool blk_known_av;
+
/*
* The reason pruning was performed. It is used to set the WAL record
* opcode which is used for debugging and analysis purposes.
@@ -252,6 +265,9 @@ typedef struct PruneFreezeParams
*
* HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
* will return 'all_visible', 'all_frozen' flags to the caller.
+ *
+ * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
+ * in the VM.
*/
int options;
@@ -299,6 +315,17 @@ typedef struct PruneFreezeResult
bool all_frozen;
TransactionId vm_conflict_horizon;
+ /*
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
+ *
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set and
+ * we have attempted to update the VM.
+ */
+ uint8 new_vmbits;
+ uint8 old_vmbits;
+
/*
* Whether or not the page makes rel truncation unsafe. This is set to
* 'true', even if the page contains LP_DEAD items. VACUUM will remove
--
2.43.0
v25-0004-Move-VM-assert-into-prune-freeze-code.patchapplication/x-patch; name=v25-0004-Move-VM-assert-into-prune-freeze-code.patchDownload
From 0076da14668b81986c7db9b6eeb464f70fc3870d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 15:57:34 -0500
Subject: [PATCH v25 04/14] Move VM assert into prune/freeze code
This is a step toward setting the VM in the same WAL record as pruning
and freezing. It moves the check of the heap page into prune/freeze code
before setting the VM. This allows us to remove some fields of the
PruneFreezeResult.
---
src/backend/access/heap/pruneheap.c | 134 +++++++++++++++++++--------
src/backend/access/heap/vacuumlazy.c | 68 +-------------
src/include/access/heapam.h | 25 ++---
3 files changed, 104 insertions(+), 123 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d7f36e2764f..1b0273c02c9 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -199,7 +199,7 @@ static bool heap_page_will_set_vm(Relation relation,
Buffer heap_buf,
Buffer vmbuffer,
bool blk_known_av,
- const PruneFreezeResult *presult,
+ const PruneState *prstate,
uint8 *new_vmbits);
@@ -784,9 +784,9 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
/*
* Decide whether to set the visibility map bits (all-visible and all-frozen)
- * for heap_blk using information from PruneFreezeResult and blk_known_av.
- * Some callers may already have examined this page’s VM bits (e.g., VACUUM in
- * the previous heap_vac_scan_next_block() call) and can pass that along as
+ * for heap_blk using information from PruneState and blk_known_av. Some
+ * callers may already have examined this page’s VM bits (e.g., VACUUM in the
+ * previous heap_vac_scan_next_block() call) and can pass that along as
* blk_known_av. Callers that have not previously checked the page's status in
* the VM should pass false for blk_known_av.
*
@@ -806,27 +806,30 @@ heap_page_will_set_vm(Relation relation,
Buffer heap_buf,
Buffer vmbuffer,
bool blk_known_av,
- const PruneFreezeResult *presult,
+ const PruneState *prstate,
uint8 *new_vmbits)
{
Page heap_page = BufferGetPage(heap_buf);
*new_vmbits = 0;
+ if (!prstate->attempt_update_vm)
+ {
+ Assert(!prstate->all_visible && !prstate->all_frozen);
+ return false;
+ }
+
/*
* Determine what the visibility map bits should be set to using the
* values of all_visible and all_frozen determined during
* pruning/freezing.
*/
- if ((presult->all_visible && !blk_known_av) ||
- (presult->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+ if ((prstate->all_visible && !blk_known_av) ||
+ (prstate->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
{
*new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
- if (presult->all_frozen)
- {
- Assert(!TransactionIdIsValid(presult->vm_conflict_horizon));
+ if (prstate->all_frozen)
*new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
- }
return true;
}
@@ -873,7 +876,7 @@ heap_page_will_set_vm(Relation relation,
* There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
* however.
*/
- else if (presult->lpdead_items > 0 && PageIsAllVisible(heap_page))
+ else if (prstate->lpdead_items > 0 && PageIsAllVisible(heap_page))
{
ereport(WARNING,
(errcode(ERRCODE_DATA_CORRUPTED),
@@ -889,6 +892,30 @@ heap_page_will_set_vm(Relation relation,
return false;
}
+#ifdef USE_ASSERT_CHECKING
+
+/*
+ * Wrapper for heap_page_would_be_all_visible() which can be used for callers
+ * that expect no LP_DEAD on the page. Currently assert-only, but there is no
+ * reason not to use it outside of asserts.
+ */
+static bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
+{
+
+ return heap_page_would_be_all_visible(rel, buf,
+ OldestXmin,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid,
+ logging_offnum);
+}
+#endif
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
@@ -942,6 +969,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
+ TransactionId vm_conflict_horizon = InvalidTransactionId;
PruneState prstate;
bool do_freeze;
bool do_prune;
@@ -1097,23 +1125,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
- /*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
-
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
@@ -1134,18 +1147,60 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
presult->new_vmbits = 0;
presult->old_vmbits = 0;
- /* Now update the visibility map and PD_ALL_VISIBLE hint */
+ /*
+ * If updating the visibility map, the conflict horizon for that record
+ * must be the newest xmin on the page. However, if the page is
+ * completely frozen, there can be no conflict and the vm_conflict_horizon
+ * should remain InvalidTransactionId. This includes the case that we
+ * just froze all the tuples; the prune-freeze record included the
+ * conflict XID already so we don't need to again.
+ */
+ if (prstate.all_frozen)
+ vm_conflict_horizon = InvalidTransactionId;
+ else
+ vm_conflict_horizon = prstate.visibility_cutoff_xid;
+
+ /*
+ * During its second pass over the heap, VACUUM calls
+ * heap_page_would_be_all_visible() to determine whether a page is
+ * all-visible and all-frozen. The logic here is similar. After completing
+ * pruning and freezing, use an assertion to verify that our results
+ * remain consistent with heap_page_would_be_all_visible().
+ */
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
+ {
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
+
+ Assert(presult->lpdead_items == 0);
+
+ Assert(heap_page_is_all_visible(params->relation, buffer,
+ prstate.cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc));
+
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == vm_conflict_horizon);
+ }
+#endif
+
+ Assert(!prstate.all_frozen || prstate.all_visible);
Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
- do_set_vm = false;
- if (prstate.attempt_update_vm)
- do_set_vm = heap_page_will_set_vm(params->relation,
- blockno,
- buffer,
- vmbuffer,
- params->blk_known_av,
- presult,
- &presult->new_vmbits);
+ /*
+ * Decide whether to set the VM bits based on information from the VM and
+ * the all_visible/all_frozen flags.
+ */
+ do_set_vm = heap_page_will_set_vm(params->relation,
+ blockno,
+ buffer,
+ vmbuffer,
+ params->blk_known_av,
+ &prstate,
+ &presult->new_vmbits);
/*
* new_vmbits should be 0 regardless of whether or not the page is
@@ -1158,7 +1213,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
/*
* It should never be the case that the visibility map page is set
* while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled).
+ * checksums are not enabled). However, we strongly prefer to keep
+ * them in sync.
*
* The heap page is added to the WAL chain even if it wasn't modified,
* so we still need to mark it dirty. The only scenario where it isn't
@@ -1169,7 +1225,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
MarkBufferDirty(buffer);
presult->old_vmbits = visibilitymap_set(params->relation, blockno, buffer,
InvalidXLogRecPtr,
- vmbuffer, presult->vm_conflict_horizon,
+ vmbuffer, vm_conflict_horizon,
presult->new_vmbits);
}
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 81ef81cb8f3..29382550c03 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,20 +464,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
-#endif
-static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- OffsetNumber *deadoffsets,
- int ndeadoffsets,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2030,32 +2016,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- Assert(heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum));
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -3511,29 +3471,6 @@ dead_items_cleanup(LVRelState *vacrel)
vacrel->pvs = NULL;
}
-#ifdef USE_ASSERT_CHECKING
-
-/*
- * Wrapper for heap_page_would_be_all_visible() which can be used for callers
- * that expect no LP_DEAD on the page. Currently assert-only, but there is no
- * reason not to use it outside of asserts.
- */
-static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum)
-{
-
- return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
- NULL, 0,
- all_frozen,
- visibility_cutoff_xid,
- logging_offnum);
-}
-#endif
/*
* Check whether the heap page in buf is all-visible except for the dead
@@ -3557,15 +3494,12 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
* - *logging_offnum: OffsetNumber of current tuple being processed;
* used by vacuum's error callback system.
*
- * Callers looking to verify that the page is already all-visible can call
- * heap_page_is_all_visible().
- *
* This logic is closely related to heap_prune_record_unchanged_lp_normal().
* If you modify this function, ensure consistency with that code. An
* assertion cross-checks that both remain in agreement. Do not introduce new
* side-effects.
*/
-static bool
+bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
TransactionId OldestXmin,
OffsetNumber *deadoffsets,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index bb712c5b29f..392af6503da 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -263,8 +263,7 @@ typedef struct PruneFreezeParams
* HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
* LP_UNUSED during pruning.
*
- * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
- * will return 'all_visible', 'all_frozen' flags to the caller.
+ * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples
*
* HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
* in the VM.
@@ -300,21 +299,6 @@ typedef struct PruneFreezeResult
int live_tuples;
int recently_dead_tuples;
- /*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
- *
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
- *
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
- */
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
-
/*
* old_vmbits are the state of the all-visible and all-frozen bits in the
* visibility map before updating it during phase I of vacuuming.
@@ -460,6 +444,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
/* in heap/vacuumlazy.c */
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
--
2.43.0
v25-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (application/x-patch)
From 4ea8150c7cd3a2edb487f3bac1f86e574416ee67 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 16:16:22 -0500
Subject: [PATCH v25 05/14] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
prune/freeze
Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.
Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.
This change applies only to vacuum phase I, not to pruning performed
during normal page access.
NOTE: This commit is the main commit and all review-only commits
preceding it will be squashed into it.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/pruneheap.c | 247 ++++++++++++++++------------
1 file changed, 142 insertions(+), 105 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 1b0273c02c9..fb82b0c0f86 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -194,6 +194,12 @@ static void page_verify_redirects(Page page);
static bool heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
PruneState *prstate);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ uint8 new_vmbits,
+ TransactionId latest_xid_removed,
+ TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid,
+ bool blk_already_av);
static bool heap_page_will_set_vm(Relation relation,
BlockNumber heap_blk,
Buffer heap_buf,
@@ -782,6 +788,64 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm, uint8 new_vmbits,
+ TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid, bool blk_already_av)
+{
+ TransactionId conflict_xid;
+
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning or
+ * freezing any tuples and are setting an already all-visible page
+ * all-frozen in the VM. In this case, all of the tuples on the page must
+ * already be visible to all MVCC snapshots on the standby.
+ */
+ if (!do_prune && !do_freeze &&
+ do_set_vm && blk_already_av && (new_vmbits & VISIBILITYMAP_ALL_FROZEN))
+ return InvalidTransactionId;
+
+ /*
+ * The snapshotConflictHorizon for the whole record should be the most
+ * conservative of all the horizons calculated for any of the possible
+ * modifications. If this record will prune tuples, any transactions on
+ * the standby older than the youngest xmax of the most recently removed
+ * tuple this record will prune will conflict. If this record will freeze
+ * tuples, any transactions on the standby with xids older than the
+ * youngest tuple this record will freeze will conflict.
+ */
+ conflict_xid = InvalidTransactionId;
+
+ /*
+ * If we are updating the VM, the conflict horizon is almost always the
+ * visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization, we can
+ * use the visibility_cutoff_xid as the conflict horizon if the page will
+ * be all-frozen. This is true even if there are LP_DEAD line pointers
+ * because we ignored those when maintaining the visibility_cutoff_xid.
+ * This will have been calculated earlier as the frz_conflict_horizon when
+ * we determined we would freeze.
+ */
+ if (do_set_vm)
+ conflict_xid = visibility_cutoff_xid;
+ else if (do_freeze)
+ conflict_xid = frz_conflict_horizon;
+
+ /*
+ * If we are removing tuples with a younger xmax than our so far
+ * calculated conflict_xid, we must use this as our horizon.
+ */
+ if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+ conflict_xid = latest_xid_removed;
+
+ return conflict_xid;
+}
+
/*
* Decide whether to set the visibility map bits (all-visible and all-frozen)
* for heap_blk using information from PruneState and blk_known_av. Some
@@ -969,7 +1033,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
- TransactionId vm_conflict_horizon = InvalidTransactionId;
PruneState prstate;
bool do_freeze;
bool do_prune;
@@ -977,6 +1040,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool do_set_vm;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ TransactionId conflict_xid = InvalidTransactionId;
+ uint8 new_vmbits = 0;
+ uint8 old_vmbits = 0;
/* Initialize prstate */
prune_freeze_setup(params,
@@ -1038,6 +1104,29 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.all_visible = prstate.all_frozen = false;
Assert(!prstate.all_frozen || prstate.all_visible);
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ /*
+ * Decide whether to set the VM bits based on information from the VM and
+ * the all_visible/all_frozen flags.
+ */
+ do_set_vm = heap_page_will_set_vm(params->relation,
+ blockno, buffer, vmbuffer, params->blk_known_av,
+ &prstate, &new_vmbits);
+
+ /*
+ * new_vmbits should be 0 regardless of whether or not the page is
+ * all-visible if we do not intend to set the VM.
+ */
+ Assert(do_set_vm || new_vmbits == 0);
+
+ conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm, new_vmbits,
+ prstate.latest_xid_removed, prstate.frz_conflict_horizon,
+ prstate.visibility_cutoff_xid, params->blk_known_av);
+
+ /* Lock vmbuffer before entering a critical section */
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -1059,14 +1148,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
/*
* If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * hint. If we are going to freeze or prune the page or set
+ * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+ *
+ * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+ * for the VM to be set and PD_ALL_VISIBLE to be clear.
*/
- if (!do_freeze && !do_prune)
+ if (!do_freeze && !do_prune && !do_set_vm)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -1080,6 +1172,20 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
+ if (do_set_vm)
+ {
+ /*
+ * While it is valid for PD_ALL_VISIBLE to be set when the
+ * corresponding VM bit is clear, we strongly prefer to keep them
+ * in sync.
+ */
+ PageSetAllVisible(page);
+ old_vmbits = visibilitymap_set_vmbits(blockno,
+ vmbuffer, new_vmbits,
+ params->relation->rd_locator);
+ Assert(old_vmbits != new_vmbits);
+ }
+
MarkBufferDirty(buffer);
/*
@@ -1087,29 +1193,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
if (RelationNeedsWAL(params->relation))
{
- /*
- * The snapshotConflictHorizon for the whole record should be the
- * most conservative of all the horizons calculated for any of the
- * possible modifications. If this record will prune tuples, any
- * transactions on the standby older than the youngest xmax of the
- * most recently removed tuple this record will prune will
- * conflict. If this record will freeze tuples, any transactions
- * on the standby with xids older than the youngest tuple this
- * record will freeze will conflict.
- */
- TransactionId conflict_xid;
-
- if (TransactionIdFollows(prstate.frz_conflict_horizon,
- prstate.latest_xid_removed))
- conflict_xid = prstate.frz_conflict_horizon;
- else
- conflict_xid = prstate.latest_xid_removed;
-
log_heap_prune_and_freeze(params->relation, buffer,
- InvalidBuffer, /* vmbuffer */
- 0, /* vmflags */
+ do_set_vm ? vmbuffer : InvalidBuffer,
+ do_set_vm ? new_vmbits : 0,
conflict_xid,
- true, params->reason,
+ true, /* cleanup lock */
+ params->reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
@@ -1119,46 +1208,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
END_CRIT_SECTION();
- /* Copy information back for caller */
- presult->ndeleted = prstate.ndeleted;
- presult->nnewlpdead = prstate.ndead;
- presult->nfrozen = prstate.nfrozen;
- presult->live_tuples = prstate.live_tuples;
- presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->hastup = prstate.hastup;
-
- presult->lpdead_items = prstate.lpdead_items;
- /* the presult->deadoffsets array was already filled in */
-
- if (prstate.attempt_freeze)
- {
- if (presult->nfrozen > 0)
- {
- *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
- *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
- }
- else
- {
- *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
- *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
- }
- }
-
- presult->new_vmbits = 0;
- presult->old_vmbits = 0;
-
- /*
- * If updating the visibility map, the conflict horizon for that record
- * must be the newest xmin on the page. However, if the page is
- * completely frozen, there can be no conflict and the vm_conflict_horizon
- * should remain InvalidTransactionId. This includes the case that we
- * just froze all the tuples; the prune-freeze record included the
- * conflict XID already so we don't need to again.
- */
- if (prstate.all_frozen)
- vm_conflict_horizon = InvalidTransactionId;
- else
- vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
/*
* During its second pass over the heap, VACUUM calls
@@ -1173,7 +1224,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(presult->lpdead_items == 0);
+ Assert(prstate.lpdead_items == 0);
+ Assert(prstate.cutoffs);
Assert(heap_page_is_all_visible(params->relation, buffer,
prstate.cutoffs->OldestXmin,
@@ -1183,50 +1235,35 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Assert(prstate.all_frozen == debug_all_frozen);
Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == vm_conflict_horizon);
+ debug_cutoff == prstate.visibility_cutoff_xid);
}
#endif
- Assert(!prstate.all_frozen || prstate.all_visible);
- Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
-
- /*
- * Decide whether to set the VM bits based on information from the VM and
- * the all_visible/all_frozen flags.
- */
- do_set_vm = heap_page_will_set_vm(params->relation,
- blockno,
- buffer,
- vmbuffer,
- params->blk_known_av,
- &prstate,
- &presult->new_vmbits);
+ /* Copy information back for caller */
+ presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+ presult->hastup = prstate.hastup;
+ presult->new_vmbits = new_vmbits;
+ presult->old_vmbits = old_vmbits;
- /*
- * new_vmbits should be 0 regardless of whether or not the page is
- * all-visible if we do not intend to set the VM.
- */
- Assert(do_set_vm || presult->new_vmbits == 0);
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
- if (do_set_vm)
+ if (prstate.attempt_freeze)
{
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). However, we strongly prefer to keep
- * them in sync.
- *
- * The heap page is added to the WAL chain even if it wasn't modified,
- * so we still need to mark it dirty. The only scenario where it isn't
- * modified in phase I is when the VM was truncated or removed, which
- * isn't worth optimizing for.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buffer);
- presult->old_vmbits = visibilitymap_set(params->relation, blockno, buffer,
- InvalidXLogRecPtr,
- vmbuffer, vm_conflict_horizon,
- presult->new_vmbits);
+ if (presult->nfrozen > 0)
+ {
+ *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
+ }
+ else
+ {
+ *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
+ }
}
}
--
2.43.0
v25-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (application/x-patch)
From 80189d9a76a8a993d390fc3372c1b4d866cc4fb4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v25 06/14] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum
As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
1 file changed, 29 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 29382550c03..b51112a71a7 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1886,9 +1886,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ /* Lock vmbuffer before entering critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ /* Mark buffer dirty before writing any WAL records */
MarkBufferDirty(buf);
/*
@@ -1905,13 +1908,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set_vmbits(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
+
+ /*
+ * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+ * setting the VM.
+ */
+ if (RelationNeedsWAL(vacrel->rel))
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ InvalidTransactionId, /* conflict xid */
+ false, /* cleanup lock */
+ PRUNE_VACUUM_SCAN, /* reason */
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+
END_CRIT_SECTION();
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
/* Count the newly all-frozen pages for logging */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
--
2.43.0
v25-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (application/x-patch)
From 41dfb68868816c178bc2809144cb4fe6cbef8b37 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v25 07/14] Remove XLOG_HEAP2_VISIBLE entirely
No remaining code emits XLOG_HEAP2_VISIBLE records, so remove the record type entirely.
This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.
This changes the visibility map API, so any external users/consumers of
the VM-only WAL record will need to change.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/common/bufmask.c | 4 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 155 ++---------------------
src/backend/access/heap/pruneheap.c | 6 +-
src/backend/access/heap/vacuumlazy.c | 16 +--
src/backend/access/heap/visibilitymap.c | 112 +---------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +-
src/include/access/heapam_xlog.h | 28 +---
src/include/access/visibilitymap.h | 13 +-
src/include/access/visibilitymapdefs.h | 9 --
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 46 insertions(+), 375 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible. See heap_xlog_prune_freeze()
+ * for more details.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 225f9829f22..60cc6ba998d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2539,11 +2539,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
else if (all_frozen_set)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- relation->rd_locator);
+ visibilitymap_set(BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ relation->rd_locator);
}
/*
@@ -8812,50 +8812,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
-
/*
* Perform XLogInsert for a heap-update operation. Caller must already
* have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index a09fb4b803a..b66736ea282 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -236,7 +236,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+ visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -249,142 +249,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -762,8 +626,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
*
* During recovery, however, no concurrent writers exist. Therefore,
* updating the VM without holding the heap page lock is safe enough. This
- * same approach is taken when replaying xl_heap_visible records (see
- * heap_xlog_visible()).
+ * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+ * heap_xlog_prune_freeze()).
*/
if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -775,11 +639,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- rlocator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -1360,9 +1224,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index fb82b0c0f86..7c36b89324e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1180,9 +1180,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* in sync.
*/
PageSetAllVisible(page);
- old_vmbits = visibilitymap_set_vmbits(blockno,
- vmbuffer, new_vmbits,
- params->relation->rd_locator);
+ old_vmbits = visibilitymap_set(blockno,
+ vmbuffer, new_vmbits,
+ params->relation->rd_locator);
Assert(old_vmbits != new_vmbits);
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b51112a71a7..c18030087c1 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1908,11 +1908,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
/*
* Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2794,9 +2794,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* set PD_ALL_VISIBLE.
*/
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer, vmflags,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer, vmflags,
+ vacrel->rel->rd_locator);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index d14588e92ae..7997e926872 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set bit(s) in a previously pinned page and log
- * visibilitymap_set_vmbits - set bit(s) in a pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,109 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
- flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || !XLogRecPtrIsValid(recptr));
- Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- Assert(!BufferIsValid(heapBuf) ||
- BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (!XLogRecPtrIsValid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
-}
-
/*
* Set VM (visibility map) flags in the VM block in vmBuf.
*
@@ -344,9 +240,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* rlocator is used only for debugging messages.
*/
uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ca26d1f0ed1..08461fdf593 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index fc45d72c79b..3655358ed6b 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..69678187832 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -294,7 +293,13 @@ typedef struct xl_heap_prune
#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
#define XLHP_IS_CATALOG_REL (1 << 1)
/*
@@ -443,20 +448,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -500,11 +491,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index c6fa37be968..05ba6786b47 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator);
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9dd65b10254..1819f3dbb77 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4306,7 +4306,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
v25-0008-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (application/x-patch)
From 66f6c73a5ab743db859e5a91790c1148ef2ff3e6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v25 08/14] Rename GlobalVisTestIsRemovableXid() to
GlobalVisXidVisibleToAll()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The function is currently only used to check whether a tuple’s xmax is
visible to all transactions (and thus removable). Upcoming changes will
also use it to test whether a tuple’s xmin is visible to all, in order to
decide whether a page can be marked all-visible in the visibility map.
The new name, GlobalVisXidVisibleToAll(), better reflects this broader
purpose.
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/heapam_visibility.c | 6 +++---
src/backend/access/heap/pruneheap.c | 12 ++++++------
src/backend/access/spgist/spgvacuum.c | 2 +-
src/backend/storage/ipc/procarray.c | 17 ++++++++---------
src/include/utils/snapmgr.h | 4 ++--
5 files changed, 20 insertions(+), 21 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
{
Assert(TransactionIdIsValid(dead_after));
- if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
res = HEAPTUPLE_DEAD;
}
else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
return false;
/* Deleter committed, so tuple is dead if the XID is old enough. */
- return GlobalVisTestIsRemovableXid(vistest,
- HeapTupleHeaderGetRawXmax(tuple));
+ return GlobalVisXidVisibleToAll(vistest,
+ HeapTupleHeaderGetRawXmax(tuple));
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7c36b89324e..b7ccef1c084 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -252,7 +252,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
vistest = GlobalVisTestFor(relation);
- if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+ if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
return;
/*
@@ -486,7 +486,7 @@ prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
* Determining HTSV only once for each tuple is required for correctness,
* to deal with cases where running HTSV twice could result in different
* results. For example, RECENTLY_DEAD can turn to DEAD if another
- * checked item causes GlobalVisTestIsRemovableFullXid() to update the
+ * checked item causes GlobalVisFullXidVisibleToAll() to update the
* horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
* transaction aborts.
*
@@ -1297,11 +1297,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* Determine whether or not the tuple is considered dead when compared
* with the provided GlobalVisState. On-access pruning does not provide
* VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
- * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
- * if the GlobalVisState has been updated since the beginning of vacuuming
+ * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+ * the GlobalVisState has been updated since the beginning of vacuuming
* the relation.
*/
- if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
return HEAPTUPLE_DEAD;
return res;
@@ -1760,7 +1760,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
/*
* For now always use prstate->cutoffs for this test, because
* we only update 'all_visible' and 'all_frozen' when freezing
- * is requested. We could use GlobalVisTestIsRemovableXid
+ * is requested. We could use GlobalVisXidVisibleToAll()
* instead, if a non-freezing caller wanted to set the VM bit.
*/
Assert(prstate->cutoffs);
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index cb5671c1a4e..3a68757c09a 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
*/
if (dt->tupstate == SPGIST_REDIRECT &&
(!TransactionIdIsValid(dt->xid) ||
- GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+ GlobalVisXidVisibleToAll(vistest, dt->xid)))
{
dt->tupstate = SPGIST_PLACEHOLDER;
Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index f3a1603204e..67da6737496 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4179,8 +4179,7 @@ GlobalVisUpdate(void)
* See comment for GlobalVisState for details.
*/
bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
- FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
{
/*
* If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4214,14 +4213,14 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
}
/*
- * Wrapper around GlobalVisTestIsRemovableFullXid() for 32bit xids.
+ * Wrapper around GlobalVisFullXidVisibleToAll() for 32bit xids.
*
* It is crucial that this only gets called for xids from a source that
* protects against xid wraparounds (e.g. from a table and thus protected by
* relfrozenxid).
*/
bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
{
FullTransactionId fxid;
@@ -4235,12 +4234,12 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
*/
fxid = FullXidRelativeTo(state->definitely_needed, xid);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableFullXid(), see their comments.
+ * GlobalVisFullXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
@@ -4249,12 +4248,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4263,7 +4262,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableXid(state, xid);
+ return GlobalVisXidVisibleToAll(state, xid);
}
/*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
*/
typedef struct GlobalVisState GlobalVisState;
extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
--
2.43.0
v25-0009-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (application/x-patch)
From ebf2e3ddd0d222991bf089ddc8ac784e43dfa140 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v25 09/14] Use GlobalVisState in vacuum to determine page
level visibility
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.
Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.
This also benefits vacuum directly: in some cases, GlobalVisState may
advance during a vacuum, allowing more pages to become considered
all-visible. And, in the future, we could easily add a heuristic to
update GlobalVisState more frequently during vacuums of large tables. In
the rare case that the GlobalVisState moves backward, vacuum falls back
to OldestXmin to ensure we don’t attempt to freeze a dead tuple that
wasn’t yet prunable according to the GlobalVisState.
Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.
This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++++
src/backend/access/heap/pruneheap.c | 43 +++++++++------------
src/backend/access/heap/vacuumlazy.c | 10 ++---
src/include/access/heapam.h | 11 +++---
4 files changed, 58 insertions(+), 34 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+ Buffer buffer)
+{
+ TransactionId dead_after = InvalidTransactionId;
+ HTSV_Result res;
+
+ res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+ if (res == HEAPTUPLE_RECENTLY_DEAD)
+ {
+ Assert(TransactionIdIsValid(dead_after));
+
+ if (GlobalVisXidVisibleToAll(vistest, dead_after))
+ res = HEAPTUPLE_DEAD;
+ }
+ else
+ Assert(!TransactionIdIsValid(dead_after));
+
+ return res;
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b7ccef1c084..a5eab2b41a0 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -449,11 +449,12 @@ prune_freeze_setup(PruneFreezeParams *params,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon
- * when updating the VM and/or freezing all the tuples on the page.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState. This field is only kept up-to-date if the page is
+ * all-visible. As soon as a tuple is encountered that is not visible to
+ * all, this field is unmaintained. As long as it is maintained, it can be
+ * used to calculate the snapshot conflict horizon when updating the VM
+ * and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -965,14 +966,13 @@ heap_page_will_set_vm(Relation relation,
*/
static bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
- return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
+ return heap_page_would_be_all_visible(rel, buf, vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -1058,6 +1058,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prune_freeze_plan(RelationGetRelid(params->relation),
buffer, &prstate, off_loc);
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them is not visible to everyone, the page cannot be
+ * all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* If checksums are enabled, calling heap_prune_satisfies_vacuum() while
* checking tuple visibility information in prune_freeze_plan() may have
@@ -1225,10 +1235,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool debug_all_frozen;
Assert(prstate.lpdead_items == 0);
- Assert(prstate.cutoffs);
Assert(heap_page_is_all_visible(params->relation, buffer,
- prstate.cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc));
@@ -1757,20 +1766,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' and 'all_frozen' when freezing
- * is requested. We could use GlobalVisXidVisibleToAll()
- * instead, if a non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = false;
- prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c18030087c1..10543eca065 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2748,7 +2748,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* done outside the critical section.
*/
if (heap_page_would_be_all_visible(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3508,7 +3508,7 @@ dead_items_cleanup(LVRelState *vacrel)
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
*
* Output parameters:
*
@@ -3524,7 +3524,7 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -3598,7 +3598,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
/* Visibility checks may do IO or allocate memory */
Assert(CritSectionCount == 0);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+ switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest, buf))
{
case HEAPTUPLE_LIVE:
{
@@ -3617,7 +3617,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin, OldestXmin))
+ if (!GlobalVisXidVisibleToAll(vistest, xmin))
{
all_visible = false;
*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 392af6503da..b6f1b3fb448 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -278,10 +278,9 @@ typedef struct PruneFreezeParams
/*
* Contains the cutoffs used for freezing. They are required if the
- * HEAP_PAGE_PRUNE_FREEZE option is set. cutoffs->OldestXmin is also used
- * to determine if dead tuples are HEAPTUPLE_RECENTLY_DEAD or
- * HEAPTUPLE_DEAD. Currently only vacuum passes in cutoffs. Vacuum
- * calculates them once, at the beginning of vacuuming the relation.
+ * HEAP_PAGE_PRUNE_FREEZE option is set. Currently only vacuum passes in
+ * cutoffs. Vacuum calculates them once, at the beginning of vacuuming the
+ * relation.
*/
struct VacuumCutoffs *cutoffs;
} PruneFreezeParams;
@@ -445,7 +444,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -459,6 +458,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+ GlobalVisState *vistest, Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
v25-0011-Track-which-relations-are-modified-by-a-query.patchapplication/x-patch; name=v25-0011-Track-which-relations-are-modified-by-a-query.patchDownload
From 15eeb3a01ced0214bb2b189b9e273936b25d523f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v25 11/14] Track which relations are modified by a query
Save the relids in a bitmap in the estate. A later commit will pass this
information down to scan nodes to control whether or not the scan allows
setting the visibility map during on-access pruning. We don't want to set
the visibility map if the query is just going to modify the page
immediately after.
---
src/backend/executor/execMain.c | 4 ++++
src/backend/executor/execUtils.c | 2 ++
src/include/nodes/execnodes.h | 6 ++++++
3 files changed, 12 insertions(+)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 797d8b1ca1c..5b299ef81aa 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation is modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 64ff6996431..7f6522cea8e 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query either through
+ * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
--
2.43.0
v25-0012-Pass-down-information-on-table-modification-to-s.patchapplication/x-patch; name=v25-0012-Pass-down-information-on-table-modification-to-s.patchDownload
From 7d352b125b3239a3a3cc030b6a58d5fcae43c139 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:12:18 -0500
Subject: [PATCH v25 12/14] Pass down information on table modification to scan
node
Pass down information to sequential scan, index scan, and bitmap table
scan nodes on whether or not the query modifies the relation being
scanned. A later commit will use this information to update the VM
during on-access pruning only if the relation is not modified by the
query.
---
contrib/pgrowlocks/pgrowlocks.c | 2 +-
src/backend/access/brin/brin.c | 3 ++-
src/backend/access/gin/gininsert.c | 3 ++-
src/backend/access/heap/heapam_handler.c | 7 +++---
src/backend/access/index/genam.c | 4 ++--
src/backend/access/index/indexam.c | 6 +++---
src/backend/access/nbtree/nbtsort.c | 2 +-
src/backend/access/table/tableam.c | 7 +++---
src/backend/commands/constraint.c | 2 +-
src/backend/commands/copyto.c | 2 +-
src/backend/commands/tablecmds.c | 4 ++--
src/backend/commands/typecmds.c | 4 ++--
src/backend/executor/execIndexing.c | 2 +-
src/backend/executor/execReplication.c | 8 +++----
src/backend/executor/nodeBitmapHeapscan.c | 9 +++++++-
src/backend/executor/nodeIndexonlyscan.c | 2 +-
src/backend/executor/nodeIndexscan.c | 11 ++++++++--
src/backend/executor/nodeSeqscan.c | 26 ++++++++++++++++++++---
src/backend/partitioning/partbounds.c | 2 +-
src/backend/utils/adt/selfuncs.c | 2 +-
src/include/access/genam.h | 2 +-
src/include/access/heapam.h | 6 ++++++
src/include/access/tableam.h | 19 ++++++++++-------
23 files changed, 91 insertions(+), 44 deletions(-)
diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
RelationGetRelationName(rel));
/* Scan the relation */
- scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
hscan = (HeapScanDesc) scan;
attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 26cb75058d1..4ad8941c60a 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2842,7 +2842,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
indexInfo->ii_Concurrent = brinshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromBrinShared(brinshared));
+ ParallelTableScanFromBrinShared(brinshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index df30dcc0228..aaa5401b731 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2058,7 +2058,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
indexInfo->ii_Concurrent = ginshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromGinBuildShared(ginshared));
+ ParallelTableScanFromGinBuildShared(ginshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index dd4fe6bf62f..6c2e4e08b16 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,12 +79,13 @@ heapam_slot_callbacks(Relation relation)
*/
static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
{
IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
return &hscan->xs_base;
}
@@ -753,7 +754,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
tableScan = NULL;
heapScan = NULL;
- indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+ indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
index_rescan(indexScan, NULL, 0, NULL, 0);
}
else
@@ -762,7 +763,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
- tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+ tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
heapScan = (HeapScanDesc) tableScan;
indexScan = NULL;
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 707c25289cd..468830cc0b8 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -447,7 +447,7 @@ systable_beginscan(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, irel,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
@@ -708,7 +708,7 @@ systable_beginscan_ordered(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, indexRelation,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..b5523cf2ab1 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys)
+ int nkeys, int norderbys, uint32 flags)
{
IndexScanDesc scan;
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+ scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
return scan;
}
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+ scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
return scan;
}
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index d7695dc1108..7bdbc7e5fa7 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1925,7 +1925,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
indexInfo = BuildIndexInfo(btspool->index);
indexInfo->ii_Concurrent = btshared->isconcurrent;
scan = table_beginscan_parallel(btspool->heap,
- ParallelTableScanFromBTShared(btshared));
+ ParallelTableScanFromBTShared(btshared), 0);
reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
true, progress, _bt_build_callback,
&buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 1e099febdc8..db2a302a486 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
+
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
bool found;
slot = table_slot_create(rel, NULL);
- scan = table_index_fetch_begin(rel);
+ scan = table_index_fetch_begin(rel, 0);
found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
all_dead);
table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index 3497a8221f2..97c8278e36d 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
*/
tmptid = checktid;
{
- IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+ IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
bool call_again = false;
if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index dae91630ac3..1957bb0f1a2 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1156,7 +1156,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
AttrMap *map = NULL;
TupleTableSlot *root_slot = NULL;
- scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
slot = table_slot_create(rel, NULL);
/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 1c9ef53be20..1c00e053e05 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6345,7 +6345,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
* checking all the constraints.
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(oldrel, snapshot, 0, NULL);
+ scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -13730,7 +13730,7 @@ validateForeignKeyConstraint(char *conname,
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
slot = table_slot_create(rel, NULL);
- scan = table_beginscan(rel, snapshot, 0, NULL);
+ scan = table_beginscan(rel, snapshot, 0, NULL, 0);
perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
"validateForeignKeyConstraint",
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 0eb8e0a2bb0..6319db488fc 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3157,7 +3157,7 @@ validateDomainNotNullConstraint(Oid domainoid)
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
@@ -3238,7 +3238,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 0b3a31f1703..74262a34819 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -816,7 +816,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
retry:
conflict = false;
found_self = false;
- index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+ index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 860f79f9cc1..6e49ea5c5d8 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -204,7 +204,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
/* Start an index scan. */
- scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
retry:
found = false;
@@ -382,7 +382,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
/* Start a heap scan. */
InitDirtySnapshot(snap);
- scan = table_beginscan(rel, &snap, 0, NULL);
+ scan = table_beginscan(rel, &snap, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
retry:
@@ -601,7 +601,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+ scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
table_rescan(scan, NULL);
@@ -665,7 +665,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
index_rescan(scan, skey, skey_attoff, NULL, 0);
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..0d854db51a1 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ flags);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 6bea42f128f..2c87ba5f767 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
estate->es_snapshot,
&node->ioss_Instrument,
node->ioss_NumScanKeys,
- node->ioss_NumOrderByKeys);
+ node->ioss_NumOrderByKeys, 0);
node->ioss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 72b135e5dcf..92674441c6d 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys,
+ flags);
node->iss_ScanDesc = scandesc;
@@ -207,7 +214,7 @@ IndexNextWithReorder(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys, 0);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..4d0cbb9dee4 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
scandesc = table_beginscan(node->ss.ss_currentRelation,
estate->es_snapshot,
- 0, NULL);
+ 0, NULL, flags);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
{
EState *estate = node->ss.ps.state;
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
table_parallelscan_initialize(node->ss.ss_currentRelation,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+ flags);
}
/* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation,
+ pscan,
+ flags);
}
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 40ac700d529..a6626325296 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3361,7 +3361,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
econtext = GetPerTupleExprContext(estate);
snapshot = RegisterSnapshot(GetLatestSnapshot());
tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
- scan = table_beginscan(part_rel, snapshot, 0, NULL);
+ scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index c760b19db55..ec0def0d1e2 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7100,7 +7100,7 @@ get_actual_variable_endpoint(Relation heapRel,
index_scan = index_beginscan(heapRel, indexRel,
&SnapshotNonVacuumable, NULL,
- 1, 0);
+ 1, 0, 0);
/* Set it up for index-only scan */
index_scan->xs_want_itup = true;
index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..d29d9e905fc 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -177,7 +177,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys);
+ int nkeys, int norderbys, uint32 flags);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b6f1b3fb448..480a1bd654f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -119,6 +119,12 @@ typedef struct IndexFetchHeapData
Buffer xs_cbuf; /* current heap buffer in scan, if any */
/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 2fa790b6bf5..d10b1b03cdb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* set if the query doesn't modify the rel */
+ SO_HINT_REL_READ_ONLY = 1 << 10,
} ScanOptions;
/*
@@ -420,7 +422,7 @@ typedef struct TableAmRoutine
*
* Tuples for an index scan can then be fetched via index_fetch_tuple.
*/
- struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+ struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
/*
* Reset index fetch. Typically this will release cross index fetch
@@ -874,9 +876,9 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
*/
static inline TableScanDesc
table_beginscan(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_SEQSCAN |
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
@@ -919,9 +921,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
@@ -1128,7 +1130,8 @@ extern void table_parallelscan_initialize(Relation rel,
* Caller must hold a suitable lock on the relation.
*/
extern TableScanDesc table_beginscan_parallel(Relation relation,
- ParallelTableScanDesc pscan);
+ ParallelTableScanDesc pscan,
+ uint32 flags);
/*
* Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1164,9 +1167,9 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
* Tuples for an index scan can then be fetched via table_index_fetch_tuple().
*/
static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
{
- return rel->rd_tableam->index_fetch_begin(rel);
+ return rel->rd_tableam->index_fetch_begin(rel, flags);
}
/*
--
2.43.0
v25-0013-Allow-on-access-pruning-to-set-pages-all-visible.patchapplication/x-patch; name=v25-0013-Allow-on-access-pruning-to-set-pages-all-visible.patchDownload
From 63e4ad0f2cba76b874e8915da8a5b92e2ec00fb6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:24:08 -0500
Subject: [PATCH v25 13/14] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam.c | 15 ++++-
src/backend/access/heap/heapam_handler.c | 15 ++++-
src/backend/access/heap/pruneheap.c | 61 ++++++++++++++++---
src/include/access/heapam.h | 24 +++++++-
.../t/035_standby_logical_decoding.pl | 3 +-
5 files changed, 101 insertions(+), 17 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 60cc6ba998d..deb64e19ae8 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -570,6 +570,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -584,7 +585,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1261,6 +1264,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1299,6 +1303,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1331,6 +1341,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 6c2e4e08b16..2cb98e58956 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
return &hscan->xs_base;
@@ -100,6 +101,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -139,7 +146,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ hscan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2472,6 +2480,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2518,7 +2527,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 4fe869fea99..912684ead63 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -205,7 +205,9 @@ static bool heap_page_will_set_vm(Relation relation,
Buffer heap_buf,
Buffer vmbuffer,
bool blk_known_av,
- const PruneState *prstate,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ PruneState *prstate,
uint8 *new_vmbits);
@@ -220,9 +222,13 @@ static bool heap_page_will_set_vm(Relation relation,
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -304,6 +310,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
.cutoffs = NULL,
};
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ params.options |= HEAP_PAGE_PRUNE_UPDATE_VM;
+ params.vmbuffer = *vmbuffer;
+ }
+
heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
NULL, NULL);
@@ -862,6 +875,9 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm, uint8 new_vmbits
* corrupted, it will fix them by clearing the VM bits and page visibility
* hint. This does not need to be done in a critical section.
*
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
* Returns true if one or both VM bits should be set, along with returning the
* desired what bits should be set in the VM in *new_vmbits.
*/
@@ -871,7 +887,9 @@ heap_page_will_set_vm(Relation relation,
Buffer heap_buf,
Buffer vmbuffer,
bool blk_known_av,
- const PruneState *prstate,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
+ PruneState *prstate,
uint8 *new_vmbits)
{
Page heap_page = BufferGetPage(heap_buf);
@@ -884,6 +902,24 @@ heap_page_will_set_vm(Relation relation,
return false;
}
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(heap_buf) || XLogCheckBufferNeedsBackup(heap_buf)))
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ return false;
+ }
+
/*
* Determine what the visibility map bits should be set to using the
* values of all_visible and all_frozen determined during
@@ -906,14 +942,15 @@ heap_page_will_set_vm(Relation relation,
* WAL-logged.
*
* As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
+ * page-level bit is clear. However, it's possible that in vacuum the bit
+ * got cleared after heap_vac_scan_next_block() was called, so we must
+ * recheck with buffer lock before concluding that the VM is corrupt.
*
* Callers which did not check the visibility map and determine
* blk_known_av will not be eligible for this, however the cost of
* potentially needing to read the visibility map for pages that are not
- * all-visible is too high to justify generalizing the check.
+ * all-visible is too high to justify generalizing the check. A future
+ * vacuum will have to take care of fixing the corruption.
*/
else if (blk_known_av && !PageIsAllVisible(heap_page) &&
visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
@@ -1122,6 +1159,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
do_set_vm = heap_page_will_set_vm(params->relation,
blockno, buffer, vmbuffer, params->blk_known_av,
+ params->reason, do_prune, do_freeze,
&prstate, &new_vmbits);
/*
@@ -1193,15 +1231,18 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
old_vmbits = visibilitymap_set(blockno,
vmbuffer, new_vmbits,
params->relation->rd_locator);
- Assert(old_vmbits != new_vmbits);
}
MarkBufferDirty(buffer);
/*
- * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
+ * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
+ * only planning to update the VM, and it turns out that it was
+ * already set, there is no need to emit WAL. As such, we must check
+ * again that there is some change to emit WAL for.
*/
- if (RelationNeedsWAL(params->relation))
+ if (RelationNeedsWAL(params->relation) &&
+ (do_prune || do_freeze || (do_set_vm && old_vmbits != new_vmbits)))
{
log_heap_prune_and_freeze(params->relation, buffer,
do_set_vm ? vmbuffer : InvalidBuffer,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 480a1bd654f..89538652566 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read in the current heap page's
+ * corresponding VM block to this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read in the current heap page's corresponding VM block to
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
/*
* Some optimizations can only be performed if the query does not modify
@@ -425,7 +442,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
TM_IndexDeleteOp *delstate);
/* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index ebe2fae1789..bdd9f0a62cd 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
--
2.43.0
v25-0014-Set-pd_prune_xid-on-insert.patchtext/x-patch; charset=UTF-8; name=v25-0014-Set-pd_prune_xid-on-insert.patchDownload
From 60bcb22b7f964fdfeffd285ca5dae84d663987e5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v25 14/14] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.
This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.
Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.
---
src/backend/access/heap/heapam.c | 25 +++++++++++++------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++-
.../modules/index/expected/killtuples.out | 6 ++---
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index deb64e19ae8..ee95df919c7 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2119,6 +2119,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2178,15 +2179,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2196,7 +2201,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2560,8 +2564,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index b66736ea282..5c8dc2718ce 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -447,6 +447,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -596,9 +602,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
--
2.43.0
On Dec 11, 2025, at 07:35, Melanie Plageman <melanieplageman@gmail.com> wrote:
On Tue, Dec 9, 2025 at 12:48 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
In this set 0001 and 0002 are independent. 0003-0007 are all small
steps toward the single change in 0007 which combines the VM updates
into the same WAL record as pruning and freezing. 0008 and 0009 are
removing the rest of XLOG_HEAP2_VISIBLE. 0010 - 0012 are refactoring
needed to set the VM during on-access pruning. 0013 - 0015 are small
steps toward setting the VM on-access. And 0016 sets the prune xid on
insert so we may set the VM on-access for pages that have only new
data.
I committed 0001 and 0002. Attached v25 reflects that.
0001-0004 refactoring steps for eliminating the visible record from phase I
(probably not independent commits in the end)
0005 eliminate XLOG_HEAP2_VISIBLE from phase I vac
0006-0007 removing the rest of XLOG_HEAP2_VISIBLE
0008-0010 refactoring for setting VM on-access
0011-0013 setting the VM on-access
0014 - setting pd_prune_xid on insert
- Melanie
<v25-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patch><v25-0002-Refactor-lazy_scan_prune-VM-set-logic-into-helpe.patch><v25-0003-Set-the-VM-in-heap_page_prune_and_freeze.patch><v25-0004-Move-VM-assert-into-prune-freeze-code.patch><v25-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch><v25-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch><v25-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch><v25-0008-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch><v25-0009-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch><v25-0011-Track-which-relations-are-modified-by-a-query.patch><v25-0012-Pass-down-information-on-table-modification-to-s.patch><v25-0013-Allow-on-access-pruning-to-set-pages-all-visible.patch><v25-0014-Set-pd_prune_xid-on-insert.patch>
A few more small comments. Sorry to keep coming up with new comments. Actually, I learned a lot about vacuum from reviewing this patch.
1 - 0001
```
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+-- vacuum sets the VM but does not need to set PD_ALL_VISIBLE so no heap page
+-- modification
+vacuum test_vac_unmodified_heap;
```
The last vacuum is expected to set the VM bits, but the test doesn't verify that. Should we verify it, like:
```
evantest=# SELECT blkno, all_visible, all_frozen FROM pg_visibility_map('test_vac_unmodified_heap');
blkno | all_visible | all_frozen
-------+-------------+------------
0 | t | t
(1 row)
```
As you have been using the extension pg_visibility, adding the verification with pg_visibility_map() should not be a burden.
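In the test file's own style, that could be something like the following (just a sketch; the table name is the one from the quoted test, and the expected output file would need to be regenerated):
```
-- hypothetical addition: confirm the vacuum set both VM bits for the table's only block
SELECT blkno, all_visible, all_frozen FROM pg_visibility_map('test_vac_unmodified_heap');
```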
2 - 0001
```
if (presult.all_frozen)
{
+ /*
+ * We can pass InvalidTransactionId as our cutoff_xid, since a
+ * snapshotConflictHorizon sufficient to make everything safe for
+ * REDO was logged when the page's tuples were frozen.
+ */
Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- flags |= VISIBILITYMAP_ALL_FROZEN;
+ new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
}
```
The comment here is a little confusing. In the old code, the Assert() was immediately above the call to visibilitymap_set(), and cutoff_xid is a parameter to that call. But the new code moves the Assert() as well as the comment far away from the visibilitymap_set() call, so I think the comment should stay together with the call to visibilitymap_set().
3 - 0002
```
* If it finds that the page-level visibility hint or VM is corrupted, it will
* fix them by clearing the VM bits and visibility page hint. This does not
```
In the second line, "visibility page hint" is understandable but doesn't read quite right. I know it actually means "page-level visibility hint", so how about just "visibility hint"?
4 - 0002
```
/*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
+ * For the purposes of logging, count whether or not the page was newly
+ * set all-visible and, potentially, all-frozen.
*/
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+ if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
```
Unless do_set_vm is true, old_vmbits will always be 0, so the “if/else if” that uses old_vmbits should be moved inside “if (do_set_vm)”. From this perspective, when do_set_vm is false, the function could return early, like:
```
do_set_vm = heap_page_will_set_vm(&new_vmbits);
if (!do_set_vm)
    return presult.ndeleted;

PageSetAllVisible(page);
MarkBufferDirty(buf);
old_vmbits = visibilitymap_set(new_vmbits);
if (old_vmbits ...)
{
    ...
}
else if (old_vmbits ...)
{
    ...
}
return presult.ndeleted;
```
5 - 0003
```
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -2076,15 +1979,14 @@ lazy_scan_prune(LVRelState *vacrel,
bool *vm_page_frozen)
{
Relation rel = vacrel->rel;
- bool do_set_vm = false;
- uint8 new_vmbits = 0;
- uint8 old_vmbits = 0;
PruneFreezeResult presult;
PruneFreezeParams params = {
.relation = rel,
.buffer = buf,
+ .vmbuffer = vmbuffer,
+ .blk_known_av = all_visible_according_to_vm,
.reason = PRUNE_VACUUM_SCAN,
- .options = HEAP_PAGE_PRUNE_FREEZE,
+ .options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM,
.vistest = vacrel->vistest,
.cutoffs = &vacrel->cutoffs,
};
```
This may be a legacy bug. Here presult is not initialized, and it is immediately passed to heap_page_prune_and_freeze():
```
heap_page_prune_and_freeze(¶ms,
&presult, <=== here
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
```
Then heap_page_prune_and_freeze() immediately calls prune_freeze_setup():
```
/* Initialize prstate */
prune_freeze_setup(params,
new_relfrozen_xid, new_relmin_mxid,
presult, &prstate);
```
And prune_freeze_setup() takes presult as a const pointer:
```
static void
prune_freeze_setup(PruneFreezeParams *params,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid,
const PruneFreezeResult *presult, <=== here
PruneState *prstate)
{
prstate->deadoffsets = (OffsetNumber *) presult->deadoffsets; <== here, presult->deadoffsets could be a random value
}
```
As this is a separate issue from the current patch, I just filed a new patch to fix it. Please take a look at:
/messages/by-id/CAEoWx2=jiD1nqch4JQN+odAxZSD7mRvdoHUGJYN2r6tQG_66yQ@mail.gmail.com
6 - 0003
```
+ * Returns true if one or both VM bits should be set, along with returning the
+ * desired what bits should be set in the VM in *new_vmbits.
```
Looks like a typo: “returning the desired what bits should be set”, maybe change to “returning the desired bits to be set”.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On 20.11.25 18:19, Melanie Plageman wrote:
+ prstate->deadoffsets = (OffsetNumber *) presult->deadoffsets;
In your patch
v22-0001-Split-heap_page_prune_and_freeze-into-helpers.patch, the
assignment above casts away the const qualification of the function
argument presult:
+static void
+prune_freeze_setup(PruneFreezeParams *params,
+ TransactionId new_relfrozen_xid,
+ MultiXactId new_relmin_mxid,
+ const PruneFreezeResult *presult,
+ PruneState *prstate)
(The cast is otherwise unnecessary, since the underlying type is the
same on both sides.)
Since prstate->deadoffsets is in fact later modified, this makes the
original const qualification invalid.
I suggest the attached patch to remove the faulty const qualification
and the then-unnecessary cast.
Attachments:
0001-Fix-const-qualification-in-prune_freeze_setup.patch.nocfbottext/plain; charset=UTF-8; name=0001-Fix-const-qualification-in-prune_freeze_setup.patch.nocfbotDownload
From 336aa87add1a85aca84d8ca751c4187a08aa9d7f Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Sat, 13 Dec 2025 14:45:08 +0100
Subject: [PATCH] Fix const qualification in prune_freeze_setup()
The const qualification of the presult argument is later cast away, so
it was not correct.
---
src/backend/access/heap/pruneheap.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ca44225a10e..4eb49380b92 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -160,7 +160,7 @@ typedef struct
static void prune_freeze_setup(PruneFreezeParams *params,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid,
- const PruneFreezeResult *presult,
+ PruneFreezeResult *presult,
PruneState *prstate);
static void prune_freeze_plan(Oid reloid, Buffer buffer,
PruneState *prstate,
@@ -327,7 +327,7 @@ static void
prune_freeze_setup(PruneFreezeParams *params,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid,
- const PruneFreezeResult *presult,
+ PruneFreezeResult *presult,
PruneState *prstate)
{
/* Copy parameters to prstate */
@@ -382,7 +382,7 @@ prune_freeze_setup(PruneFreezeParams *params,
prstate->recently_dead_tuples = 0;
prstate->hastup = false;
prstate->lpdead_items = 0;
- prstate->deadoffsets = (OffsetNumber *) presult->deadoffsets;
+ prstate->deadoffsets = presult->deadoffsets;
prstate->frz_conflict_horizon = InvalidTransactionId;
/*
--
2.52.0
On Sat, Dec 13, 2025 at 8:59 AM Peter Eisentraut <peter@eisentraut.org> wrote:
On 20.11.25 18:19, Melanie Plageman wrote:
+ prstate->deadoffsets = (OffsetNumber *) presult->deadoffsets;
In your patch
v22-0001-Split-heap_page_prune_and_freeze-into-helpers.patch, the
assignment above casts away the const qualification of the function
argument presult:
Yea, this code (prune_freeze_setup() with a const-qualified
PruneFreezeResult parameter) is actually already in master -- not just
in this patchset.
+static void
+prune_freeze_setup(PruneFreezeParams *params,
+                   TransactionId new_relfrozen_xid,
+                   MultiXactId new_relmin_mxid,
+                   const PruneFreezeResult *presult,
+                   PruneState *prstate)

(The cast is otherwise unnecessary, since the underlying type is the
same on both sides.)

Since prstate->deadoffsets is in fact later modified, this makes the
original const qualification invalid.
I didn't realize I was misusing const here. What I meant to indicate
by defining the prune_freeze_setup() parameter as const is that the
PruneFreezeResult wouldn't be modified by prune_freeze_setup(). I did
not mean to indicate that no members of PruneFreezeResult would ever
be modified. deadoffsets is not modified in prune_freeze_setup(). So,
are you saying that I can't define a parameter as const if the caller
modifies it later?
I'm fine with committing a change, I just want to understand.
- Melanie
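As an aside on the const question, here is a minimal, self-contained sketch of the pattern in question, with hypothetical names rather than the actual PostgreSQL structs. The const qualifier on the parameter only promises that the setup function itself won't write through that pointer; once a non-const alias to a member is stashed away via the cast, later writes through that alias still modify the caller's supposedly-const argument:
```
#include <stdio.h>

#define MAX_ITEMS 4

/* hypothetical stand-ins for PruneFreezeResult and PruneState */
typedef struct
{
    short   offsets[MAX_ITEMS];
} Result;

typedef struct
{
    short  *offsets;            /* non-const alias into the caller's Result */
} State;

static void
setup(const Result *result, State *state)
{
    /* the cast discards the const qualifier, as in prune_freeze_setup() */
    state->offsets = (short *) result->offsets;
}

int
main(void)
{
    Result  result = {{0}};
    State   state;

    setup(&result, &state);

    /* later code writes through the alias, mutating the "const" argument */
    state.offsets[0] = 42;

    printf("result.offsets[0] = %d\n", result.offsets[0]);  /* prints 42 */
    return 0;
}
```
That is roughly why the const qualification reads as misleading here, even though prune_freeze_setup() itself never writes through the pointer.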
Hi,
Attached v26 includes a new patch, 0002, which gets rid of
all_visible_according_to_vm in lazy_scan_prune(). We've kept this
cached copy of the all-visible bit since the VM was added way back in
608195a3a365. Back then, the VM wasn't pinned unless
all_visible_according_to_vm was false. Now that we unconditionally
have the VM page pinned, there isn't much performance benefit to using
that cached value. I did some testing of the worst possible case and
saw no difference in timing. By removing it, we simplify the heap
vacuum code now. And we improve clarity once the VM update is combined
into the prune/freeze WAL record and once the VM is set on-access.
I think 0001 and 0002 (and maybe 0003) are worthwhile clarity
improvements on their own.
On Wed, Dec 10, 2025 at 11:07 PM Chao Li <li.evan.chao@gmail.com> wrote:
A few more small comments. Sorry for continuing to come up with new comments. Actually, I learned a lot about vacuum from reviewing this patch.
Thanks for the continued review. Your feedback is improving the patchset.
The last vacuum is expected to set the VM bits, but the test doesn't verify that. Should we verify it with something like:
```
evantest=# SELECT blkno, all_visible, all_frozen FROM pg_visibility_map('test_vac_unmodified_heap');
blkno | all_visible | all_frozen
-------+-------------+------------
0 | t | t
(1 row)
```
I've done this. I've actually added three such verifications -- one
after each step where the VM is expected to change. It shouldn't be
very expensive, so I think it is okay. The way the test would fail if
the buffer wasn't correctly dirtied is that it would assert out -- so
the visibility map test wouldn't even have a chance to fail. But, I
think it is also okay to confirm that the expected things are
happening with the VM -- it just gives us extra coverage.
if (presult.all_frozen)
{
+   /*
+    * We can pass InvalidTransactionId as our cutoff_xid, since a
+    * snapshotConflictHorizon sufficient to make everything safe for
+    * REDO was logged when the page's tuples were frozen.
+    */
    Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-   flags |= VISIBILITYMAP_ALL_FROZEN;
+   new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
}

The comment here is a little confusing. In the old code, the Assert() was immediately above the call to visibilitymap_set(), and cutoff_xid is a parameter to that call. But the new code moves the Assert() as well as the comment far away from the call to visibilitymap_set(), so I think the comment should stay together with the call to visibilitymap_set().
Good point. I've moved it closer to visibilitymap_set() and modified
and moved the assert so that it is together with the comment. I think
the comment makes little sense without the assertion.
* If it finds that the page-level visibility hint or VM is corrupted, it will
* fix them by clearing the VM bits and visibility page hint. This does not

In the second line, “visibility page hint” is understandable but doesn't read quite right. I know it’s actually the “page-level visibility hint”, so how about just “visibility hint”?
I've changed this.
/*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
+ * For the purposes of logging, count whether or not the page was newly
+ * set all-visible and, potentially, all-frozen.
 */
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-          visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+ if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+     (new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{

Unless do_set_vm is true, old_vmbits will always be 0, so the “if/else if” that uses old_vmbits should be moved inside “if (do_set_vm)”. From this perspective, when do_set_vm is false, the function could return early, like:
Good point. I've actually gone ahead in 0002 and refactored this whole
section a bit (I got rid of all_visible_according_to_vm). 0002 is a
new patch in this attached v26, and it needs review. I think this
refactoring makes the code quite a bit clearer -- especially once we
start setting the VM on-access. It does, amongst other things, return
early if all_visible is false, like you suggested.
+ * Returns true if one or both VM bits should be set, along with returning the
+ * desired what bits should be set in the VM in *new_vmbits.

Looks like a typo: “returning the desired what bits should be set”, maybe change to “returning the desired bits to be set”.
Fixed.
- Melanie
Attachments:
v26-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patchtext/x-patch; charset=UTF-8; name=v26-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patchDownload
From 0749b6a9978f6e74af89d91b8beddf0fa1c7ed03 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 8 Dec 2025 15:49:54 -0500
Subject: [PATCH v26 01/15] Combine visibilitymap_set() cases in
lazy_scan_prune()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
lazy_scan_prune() previously had two separate cases that called
visibilitymap_set() after pruning and freezing. These branches were
nearly identical except that one attempted to avoid dirtying the heap
buffer. However, that situation can never occur — the heap buffer cannot
be clean at that point (and we would hit an assertion if it were).
In lazy_scan_prune(), when we change a previously all-visible page to
all-frozen and the page was recorded as all-visible in the visibility
map by find_next_unskippable_block(), the heap buffer will always be
dirty. Either we have just frozen a tuple and already dirtied the
buffer, or the buffer was modified between find_next_unskippable_block()
and heap_page_prune_and_freeze() and then pruned in
heap_page_prune_and_freeze().
Additionally, XLogRegisterBuffer() asserts that the buffer is dirty, so
attempting to add a clean heap buffer to the WAL chain would fail
anyway.
Since the “clean heap buffer with already set VM” case is impossible,
the two visibilitymap_set() branches in lazy_scan_prune() can be merged.
Doing so makes the intent clearer and emphasizes that the heap buffer
must always be marked dirty before being added to the WAL chain.
This commit also adds a test case for vacuuming when no heap
modifications are required. Currently this ensures that the heap buffer
is marked dirty before it is added to the WAL chain, but if we later
remove the heap buffer from the VM-set WAL chain or pass it with the
REGBUF_NO_CHANGES flag, this test would guard that behavior.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
---
.../pg_visibility/expected/pg_visibility.out | 35 +++++++
contrib/pg_visibility/sql/pg_visibility.sql | 16 ++++
src/backend/access/heap/vacuumlazy.c | 95 +++++--------------
3 files changed, 73 insertions(+), 73 deletions(-)
diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index 09fa5933a35..cbc04aad016 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -204,6 +204,41 @@ select pg_truncate_visibility_map('test_partition');
(1 row)
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary
+---------------------------
+ (1,1)
+(1 row)
+
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+ pg_truncate_visibility_map
+----------------------------
+
+(1 row)
+
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary
+---------------------------
+ (0,0)
+(1 row)
+
+-- vacuum sets the VM but does not need to set PD_ALL_VISIBLE so no heap page
+-- modification
+vacuum test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary
+---------------------------
+ (1,1)
+(1 row)
+
-- test copy freeze
create table copyfreeze (a int, b char(1500));
-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index 5af06ec5b76..0d13116248b 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -94,6 +94,22 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
select * from pg_check_frozen('test_partition'); -- hopefully none
select pg_truncate_visibility_map('test_partition');
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+-- vacuum sets the VM but does not need to set PD_ALL_VISIBLE so no heap page
+-- modification
+vacuum test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+
-- test copy freeze
create table copyfreeze (a int, b char(1500));
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 62035b7f9c3..811e7e33678 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2094,16 +2094,14 @@ lazy_scan_prune(LVRelState *vacrel,
* of last heap_vac_scan_next_block() call), and from all_visible and
* all_frozen variables
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if ((presult.all_visible && !all_visible_according_to_vm) ||
+ (presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
{
uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
+ uint8 new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
+ new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
/*
* It should never be the case that the visibility map page is set
@@ -2111,19 +2109,29 @@ lazy_scan_prune(LVRelState *vacrel,
* checksums are not enabled). Regardless, set both bits so that we
* get back in sync.
*
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
+ * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
+ * unnecessarily dirtying the heap buffer, as it must be marked dirty
+ * before adding it to the WAL chain. The only scenario where it is
+ * not already dirty is if the VM was removed, and that isn't worth
+ * optimizing for.
*/
PageSetAllVisible(page);
MarkBufferDirty(buf);
+
+ /*
+ * If the page is being set all-frozen, we pass InvalidTransactionId
+ * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ Assert(!presult.all_frozen ||
+ !TransactionIdIsValid(presult.vm_conflict_horizon));
+
old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
+ vmbuffer,
+ presult.vm_conflict_horizon,
+ new_vmbits);
/*
* If the page wasn't already set all-visible and/or all-frozen in the
@@ -2191,65 +2199,6 @@ lazy_scan_prune(LVRelState *vacrel,
VISIBILITYMAP_VALID_BITS);
}
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen.
- */
- else if (all_visible_according_to_vm && presult.all_frozen &&
- !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
-
return presult.ndeleted;
}
--
2.43.0
v26-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patchtext/x-patch; charset=US-ASCII; name=v26-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patchDownload
From ee1ff0f2e7322f0e034083d37ceab3d8a8ff374d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 11 Dec 2025 10:48:13 -0500
Subject: [PATCH v26 02/15] Eliminate use of cached VM value in
lazy_scan_prune()
lazy_scan_prune() takes a parameter from lazy_scan_heap() indicating
whether the page was marked all-visible in the VM at the time it was
last checked in find_next_unskippable_block(). This behavior is
historical, dating back to commit 608195a3a365, when we did not pin the
VM page until confirming it was not all-visible. Now that the VM page is
already pinned, there is no meaningful benefit to relying on a cached VM
status.
Removing this cached value simplifies the logic in both lazy_scan_heap()
and lazy_scan_prune(). It also clarifies future work that will set the
visibility map on-access: such paths will not have a cached value
available which would make the logic harder to reason about. Eliminating
it also enables us to detect and repair VM corruption on-access.
Along with removing the cached value and unconditionally checking the
visibility status of the heap page, this commit also moves the VM
corruption handling to occur first. This reordering should have no
performance impact, since the checks are inexpensive and performed only
once per page. It does, however, make the control flow easier to
understand. The new restructuring also makes it possible that after
fixing corruption, the VM could be newly set, if pruning found the page
all-visible.
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
---
src/backend/access/heap/vacuumlazy.c | 172 ++++++++++++---------------
1 file changed, 79 insertions(+), 93 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 811e7e33678..436143cd12c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -253,7 +253,6 @@ typedef enum
* about the block it read to the caller.
*/
#define VAC_BLK_WAS_EAGER_SCANNED (1 << 0)
-#define VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM (1 << 1)
typedef struct LVRelState
{
@@ -358,7 +357,6 @@ typedef struct LVRelState
/* State maintained by heap_vac_scan_next_block() */
BlockNumber current_block; /* last block returned */
BlockNumber next_unskippable_block; /* next unskippable block */
- bool next_unskippable_allvis; /* its visibility status */
bool next_unskippable_eager_scanned; /* if it was eagerly scanned */
Buffer next_unskippable_vmbuffer; /* buffer containing its VM bit */
@@ -432,7 +430,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
bool sharelock, Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
- Buffer vmbuffer, bool all_visible_according_to_vm,
+ Buffer vmbuffer,
bool *has_lpdead_items, bool *vm_page_frozen);
static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
@@ -1249,7 +1247,6 @@ lazy_scan_heap(LVRelState *vacrel)
/* Initialize for the first heap_vac_scan_next_block() call */
vacrel->current_block = InvalidBlockNumber;
vacrel->next_unskippable_block = InvalidBlockNumber;
- vacrel->next_unskippable_allvis = false;
vacrel->next_unskippable_eager_scanned = false;
vacrel->next_unskippable_vmbuffer = InvalidBuffer;
@@ -1265,13 +1262,13 @@ lazy_scan_heap(LVRelState *vacrel)
MAIN_FORKNUM,
heap_vac_scan_next_block,
vacrel,
- sizeof(uint8));
+ sizeof(bool));
while (true)
{
Buffer buf;
Page page;
- uint8 blk_info = 0;
+ bool was_eager_scanned = false;
int ndeleted = 0;
bool has_lpdead_items;
void *per_buffer_data = NULL;
@@ -1340,13 +1337,13 @@ lazy_scan_heap(LVRelState *vacrel)
if (!BufferIsValid(buf))
break;
- blk_info = *((uint8 *) per_buffer_data);
+ was_eager_scanned = *((bool *) per_buffer_data);
CheckBufferIsPinnedOnce(buf);
page = BufferGetPage(buf);
blkno = BufferGetBlockNumber(buf);
vacrel->scanned_pages++;
- if (blk_info & VAC_BLK_WAS_EAGER_SCANNED)
+ if (was_eager_scanned)
vacrel->eager_scanned_pages++;
/* Report as block scanned, update error traceback information */
@@ -1417,7 +1414,6 @@ lazy_scan_heap(LVRelState *vacrel)
if (got_cleanup_lock)
ndeleted = lazy_scan_prune(vacrel, buf, blkno, page,
vmbuffer,
- blk_info & VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM,
&has_lpdead_items, &vm_page_frozen);
/*
@@ -1434,8 +1430,7 @@ lazy_scan_heap(LVRelState *vacrel)
* exclude pages skipped due to cleanup lock contention from eager
* freeze algorithm caps.
*/
- if (got_cleanup_lock &&
- (blk_info & VAC_BLK_WAS_EAGER_SCANNED))
+ if (got_cleanup_lock && was_eager_scanned)
{
/* Aggressive vacuums do not eager scan. */
Assert(!vacrel->aggressive);
@@ -1602,7 +1597,6 @@ heap_vac_scan_next_block(ReadStream *stream,
{
BlockNumber next_block;
LVRelState *vacrel = callback_private_data;
- uint8 blk_info = 0;
/* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
next_block = vacrel->current_block + 1;
@@ -1665,8 +1659,8 @@ heap_vac_scan_next_block(ReadStream *stream,
* otherwise they would've been unskippable.
*/
vacrel->current_block = next_block;
- blk_info |= VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM;
- *((uint8 *) per_buffer_data) = blk_info;
+ /* Block was not eager scanned */
+ *((bool *) per_buffer_data) = false;
return vacrel->current_block;
}
else
@@ -1678,11 +1672,7 @@ heap_vac_scan_next_block(ReadStream *stream,
Assert(next_block == vacrel->next_unskippable_block);
vacrel->current_block = next_block;
- if (vacrel->next_unskippable_allvis)
- blk_info |= VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM;
- if (vacrel->next_unskippable_eager_scanned)
- blk_info |= VAC_BLK_WAS_EAGER_SCANNED;
- *((uint8 *) per_buffer_data) = blk_info;
+ *((bool *) per_buffer_data) = vacrel->next_unskippable_eager_scanned;
return vacrel->current_block;
}
}
@@ -1707,7 +1697,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
BlockNumber next_unskippable_block = vacrel->next_unskippable_block + 1;
Buffer next_unskippable_vmbuffer = vacrel->next_unskippable_vmbuffer;
bool next_unskippable_eager_scanned = false;
- bool next_unskippable_allvis;
*skipsallvis = false;
@@ -1717,7 +1706,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
next_unskippable_block,
&next_unskippable_vmbuffer);
- next_unskippable_allvis = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
/*
* At the start of each eager scan region, normal vacuums with eager
@@ -1736,7 +1724,7 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
* A block is unskippable if it is not all visible according to the
* visibility map.
*/
- if (!next_unskippable_allvis)
+ if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
{
Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
break;
@@ -1793,7 +1781,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
/* write the local variables back to vacrel */
vacrel->next_unskippable_block = next_unskippable_block;
- vacrel->next_unskippable_allvis = next_unskippable_allvis;
vacrel->next_unskippable_eager_scanned = next_unskippable_eager_scanned;
vacrel->next_unskippable_vmbuffer = next_unskippable_vmbuffer;
}
@@ -1954,9 +1941,7 @@ cmpOffsetNumbers(const void *a, const void *b)
* Caller must hold pin and buffer cleanup lock on the buffer.
*
* vmbuffer is the buffer containing the VM block with visibility information
- * for the heap block, blkno. all_visible_according_to_vm is the saved
- * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * for the heap block, blkno.
*
* *has_lpdead_items is set to true or false depending on whether, upon return
* from this function, any LP_DEAD items are still present on the page.
@@ -1973,7 +1958,6 @@ lazy_scan_prune(LVRelState *vacrel,
BlockNumber blkno,
Page page,
Buffer vmbuffer,
- bool all_visible_according_to_vm,
bool *has_lpdead_items,
bool *vm_page_frozen)
{
@@ -1987,6 +1971,8 @@ lazy_scan_prune(LVRelState *vacrel,
.vistest = vacrel->vistest,
.cutoffs = &vacrel->cutoffs,
};
+ uint8 old_vmbits = 0;
+ uint8 new_vmbits = 0;
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -2089,70 +2075,7 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(!presult.all_visible || !(*has_lpdead_items));
Assert(!presult.all_frozen || presult.all_visible);
- /*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
- */
- if ((presult.all_visible && !all_visible_according_to_vm) ||
- (presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
- {
- uint8 old_vmbits;
- uint8 new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
- if (presult.all_frozen)
- new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
- * unnecessarily dirtying the heap buffer, as it must be marked dirty
- * before adding it to the WAL chain. The only scenario where it is
- * not already dirty is if the VM was removed, and that isn't worth
- * optimizing for.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
-
- /*
- * If the page is being set all-frozen, we pass InvalidTransactionId
- * as the cutoff_xid, since a snapshot conflict horizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- Assert(!presult.all_frozen ||
- !TransactionIdIsValid(presult.vm_conflict_horizon));
-
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer,
- presult.vm_conflict_horizon,
- new_vmbits);
-
- /*
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
+ old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
/*
* As of PostgreSQL 9.2, the visibility map bit should never be set if the
@@ -2160,8 +2083,8 @@ lazy_scan_prune(LVRelState *vacrel,
* cleared after heap_vac_scan_next_block() was called, so we must recheck
* with buffer lock before concluding that the VM is corrupt.
*/
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+ if (!PageIsAllVisible(page) &&
+ (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
ereport(WARNING,
(errcode(ERRCODE_DATA_CORRUPTED),
@@ -2199,6 +2122,69 @@ lazy_scan_prune(LVRelState *vacrel,
VISIBILITYMAP_VALID_BITS);
}
+ if (!presult.all_visible)
+ return presult.ndeleted;
+
+ /* Set the visibility map and page visibility hint */
+ new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+ if (presult.all_frozen)
+ new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+ /* Nothing to do */
+ if (old_vmbits == new_vmbits)
+ return presult.ndeleted;
+
+ Assert(presult.all_visible);
+
+ /*
+ * It should never be the case that the visibility map page is set while
+ * the page-level bit is clear, but the reverse is allowed (if checksums
+ * are not enabled). Regardless, set both bits so that we get back in
+ * sync.
+ *
+ * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
+ * unnecessarily dirtying the heap buffer, as it must be marked dirty
+ * before adding it to the WAL chain. The only scenario where it is not
+ * already dirty is if the VM was removed, and that isn't worth optimizing
+ * for.
+ */
+ PageSetAllVisible(page);
+ MarkBufferDirty(buf);
+
+ /*
+ * If the page is being set all-frozen, we pass InvalidTransactionId as
+ * the cutoff_xid, since a snapshot conflict horizon sufficient to make
+ * everything safe for REDO was logged when the page's tuples were frozen.
+ */
+ Assert(!presult.all_frozen ||
+ !TransactionIdIsValid(presult.vm_conflict_horizon));
+
+ visibilitymap_set(vacrel->rel, blkno, buf,
+ InvalidXLogRecPtr,
+ vmbuffer, presult.vm_conflict_horizon,
+ new_vmbits);
+
+ /*
+ * If the page wasn't already set all-visible and/or all-frozen in the VM,
+ * count it as newly set for logging.
+ */
+ if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ {
+ vacrel->vm_new_visible_pages++;
+ if (presult.all_frozen)
+ {
+ vacrel->vm_new_visible_frozen_pages++;
+ *vm_page_frozen = true;
+ }
+ }
+ else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ presult.all_frozen)
+ {
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
+ }
+
return presult.ndeleted;
}
--
2.43.0
v26-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patchtext/x-patch; charset=US-ASCII; name=v26-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patchDownload
From ad1ddde609491f606adc4b87429c380e3b86ad52 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 13:36:39 -0500
Subject: [PATCH v26 03/15] Refactor lazy_scan_prune() VM clear logic into
helper
Encapsulating the VM corruption checks in a helper makes the whole function clearer. Before
we move all of this logic into heap_page_prune_and_freeze(), we want to
make it more compact and clear.
---
src/backend/access/heap/vacuumlazy.c | 122 +++++++++++++++++----------
1 file changed, 78 insertions(+), 44 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 436143cd12c..425dc2f8691 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -428,6 +428,11 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
+static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer,
@@ -1935,6 +1940,77 @@ cmpOffsetNumbers(const void *a, const void *b)
return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
}
+/*
+ * Helper to correct any corruption detected on a heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ */
+static void
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits)
+{
+ Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
+
+ Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (!PageIsAllVisible(heap_page) &&
+ ((vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+}
+
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -2077,50 +2153,8 @@ lazy_scan_prune(LVRelState *vacrel,
old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (!PageIsAllVisible(page) &&
- (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
+ identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
+ presult.lpdead_items, vmbuffer, old_vmbits);
if (!presult.all_visible)
return presult.ndeleted;
--
2.43.0
v26-0004-Set-the-VM-in-heap_page_prune_and_freeze.patchtext/x-patch; charset=US-ASCII; name=v26-0004-Set-the-VM-in-heap_page_prune_and_freeze.patchDownload
From 0c36849792d268443c4de300e86008f7cd1adefa Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v26 04/15] Set the VM in heap_page_prune_and_freeze()
This has no independent benefit. It is meant for ease of review. As of
this commit, there is still a separate WAL record emitted for setting
the VM after pruning and freezing. But it is easier to review if moving
the logic into pruneheap.c is separate from setting the VM in the same
WAL record.
---
src/backend/access/heap/pruneheap.c | 301 +++++++++++++++++++++++----
src/backend/access/heap/vacuumlazy.c | 142 +------------
src/include/access/heapam.h | 21 ++
3 files changed, 285 insertions(+), 179 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ca44225a10e..0d825228b62 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
#include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool attempt_freeze;
+ /* whether or not to attempt updating the VM */
+ bool attempt_update_vm;
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -140,16 +142,17 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon when setting the VM or when
+ * freezing all the tuples on the page. It is only valid when all the live
+ * tuples on the page are all-visible.
*
* NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
* That's convenient for heap_page_prune_and_freeze() to use them to
- * decide whether to freeze the page or not. The all_visible and
- * all_frozen values returned to the caller are adjusted to include
- * LP_DEAD items after we determine whether to opportunistically freeze.
+ * decide whether to opportunistically freeze the page or not. The
+ * all_visible and all_frozen values ultimately used to set the VM are
+ * adjusted to include LP_DEAD items after we determine whether or not to
+ * opportunistically freeze.
*/
bool all_visible;
bool all_frozen;
@@ -191,6 +194,17 @@ static void page_verify_redirects(Page page);
static bool heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
PruneState *prstate);
+static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page, int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits);
+static bool heap_page_will_set_vm(PruneState *prstate,
+ Relation relation,
+ BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+ Buffer vmbuffer,
+ int nlpdead_items,
+ uint8 *old_vmbits,
+ uint8 *new_vmbits);
/*
@@ -280,6 +294,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
PruneFreezeParams params = {
.relation = relation,
.buffer = buffer,
+ .vmbuffer = InvalidBuffer,
.reason = PRUNE_ON_ACCESS,
.options = 0,
.vistest = vistest,
@@ -338,6 +353,8 @@ prune_freeze_setup(PruneFreezeParams *params,
/* cutoffs must be provided if we will attempt freezing */
Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate->attempt_update_vm =
+ (params->options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate->cutoffs = params->cutoffs;
/*
@@ -386,51 +403,54 @@ prune_freeze_setup(PruneFreezeParams *params,
prstate->frz_conflict_horizon = InvalidTransactionId;
/*
- * Vacuum may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Track whether the page could be marked all-visible and/or all-frozen.
+ * This information is used for opportunistic freezing and for updating
+ * the visibility map (VM) if requested by the caller.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * Currently, only VACUUM performs freezing, but other callers may in the
+ * future. Visibility bookkeeping is required not just for setting the VM
+ * bits, but also for opportunistic freezing: we only consider freezing if
+ * the page would become all-frozen, or if it would be all-frozen except
+ * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+ * we will not set the VM bit even if the page is found to be all-visible.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not immediately
- * clear all_visible and all_frozen when we see LP_DEAD items. We fix
- * that after scanning the line pointers. We must correct all_visible and
- * all_frozen before we return them to the caller, so that the caller
- * doesn't set the VM bits incorrectly.
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is passed without HEAP_PAGE_PRUNE_FREEZE,
+ * prstate.all_frozen must be initialized to false, since we will not call
+ * heap_prepare_freeze_tuple() for each tuple.
+ *
+ * Dead tuples that will be removed by the end of vacuum should not
+ * prevent opportunistic freezing. Therefore, we do not clear all_visible
+ * and all_frozen when we encounter LP_DEAD items. Instead, we correct
+ * them after deciding whether to freeze, but before updating the VM, to
+ * avoid setting the VM bits incorrectly.
+ *
+ * If neither freezing nor VM updates are requested, we skip the extra
+ * bookkeeping. In this case, initializing all_visible to false allows
+ * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
*/
if (prstate->attempt_freeze)
{
prstate->all_visible = true;
prstate->all_frozen = true;
}
+ else if (prstate->attempt_update_vm)
+ {
+ prstate->all_visible = true;
+ prstate->all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate->all_visible = false;
prstate->all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is unmaintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon
+ * when updating the VM and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -765,10 +785,134 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Helper to correct any corruption detected on a heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+ *
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ */
+static void
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits)
+{
+ Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
+
+ Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (!PageIsAllVisible(heap_page) &&
+ ((vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+}
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * However, if it finds that the page-level visibility hint or VM is
+ * corrupted, it will fix them by clearing the VM bits and visibility hint.
+ * This does not need to be done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with returning the
+ * current value of the VM bits in *old_vmbits and the desired new value of
+ * the VM bits in *new_vmbits.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate,
+ Relation relation,
+ BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+ Buffer vmbuffer,
+ int nlpdead_items,
+ uint8 *old_vmbits,
+ uint8 *new_vmbits)
+{
+ if (!prstate->attempt_update_vm)
+ return false;
+
+ *old_vmbits = visibilitymap_get_status(relation, heap_blk,
+ &vmbuffer);
+
+ /* We do this even if not all-visible */
+ identify_and_fix_vm_corruption(relation, heap_buffer, heap_blk, heap_page,
+ nlpdead_items, vmbuffer,
+ *old_vmbits);
+
+ if (!prstate->all_visible)
+ return false;
+
+ *new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+ if (prstate->all_frozen)
+ *new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+ if (*new_vmbits == *old_vmbits)
+ {
+ *new_vmbits = 0;
+ return false;
+ }
+
+ return true;
+}
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -783,12 +927,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* tuples if it's required in order to advance relfrozenxid / relminmxid, or
* if it's considered advantageous for overall system performance to do so
* now. The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing. When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opportunistically freeze, to indicate if the VM bits can be set. They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
@@ -813,13 +958,19 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
MultiXactId *new_relmin_mxid)
{
Buffer buffer = params->buffer;
+ Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
+ BlockNumber blockno = BufferGetBlockNumber(buffer);
PruneState prstate;
bool do_freeze;
bool do_prune;
bool do_hint_prune;
+ bool do_set_vm;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ uint8 new_vmbits = 0;
+ uint8 old_vmbits = 0;
+
/* Initialize prstate */
prune_freeze_setup(params,
@@ -1001,6 +1152,64 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
}
}
+
+ /* Now update the visibility map and PD_ALL_VISIBLE hint */
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ do_set_vm = heap_page_will_set_vm(&prstate,
+ params->relation,
+ blockno,
+ buffer,
+ page,
+ vmbuffer,
+ prstate.lpdead_items,
+ &old_vmbits,
+ &new_vmbits);
+
+ /*
+ * new_vmbits should be 0 regardless of whether or not the page is
+ * all-visible if we do not intend to set the VM.
+ */
+ Assert(do_set_vm || new_vmbits == 0);
+
+ /* Set the visibility map and page visibility hint, if relevant */
+ if (do_set_vm)
+ {
+ Assert(prstate.all_visible);
+
+ /*
+ * It should never be the case that the visibility map page is set
+ * while the page-level bit is clear, but the reverse is allowed (if
+ * checksums are not enabled). Regardless, set both bits so that we
+ * get back in sync.
+ *
+ * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
+ * unnecessarily dirtying the heap buffer, as it must be marked dirty
+ * before adding it to the WAL chain. The only scenario where it is
+ * not already dirty is if the VM was removed, and that isn't worth
+ * optimizing for.
+ */
+ PageSetAllVisible(page);
+ MarkBufferDirty(buffer);
+
+ /*
+ * If the page is being set all-frozen, we pass InvalidTransactionId
+ * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ Assert(!prstate.all_frozen ||
+ !TransactionIdIsValid(presult->vm_conflict_horizon));
+
+ visibilitymap_set(params->relation, blockno, buffer,
+ InvalidXLogRecPtr,
+ vmbuffer, presult->vm_conflict_horizon,
+ new_vmbits);
+ }
+
+ /* Save the vmbits for caller */
+ presult->old_vmbits = old_vmbits;
+ presult->new_vmbits = new_vmbits;
}
@@ -1475,6 +1684,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
{
TransactionId xmin;
+ Assert(prstate->attempt_update_vm);
+
if (!HeapTupleHeaderXminCommitted(htup))
{
prstate->all_visible = false;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 425dc2f8691..ccfad5b2dba 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -428,11 +428,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
-static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
- BlockNumber heap_blk, Page heap_page,
- int nlpdead_items,
- Buffer vmbuffer,
- uint8 vmbits);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer,
@@ -1940,77 +1935,6 @@ cmpOffsetNumbers(const void *a, const void *b)
return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
}
-/*
- * Helper to correct any corruption detected on an heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- */
-static void
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
- BlockNumber heap_blk, Page heap_page,
- int nlpdead_items,
- Buffer vmbuffer,
- uint8 vmbits)
-{
- Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
-
- Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (!PageIsAllVisible(heap_page) &&
- ((vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- RelationGetRelationName(rel), heap_blk)));
-
- visibilitymap_clear(rel, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- RelationGetRelationName(rel), heap_blk)));
-
- PageClearAllVisible(heap_page);
- MarkBufferDirty(heap_buffer);
- visibilitymap_clear(rel, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-}
-
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -2042,13 +1966,12 @@ lazy_scan_prune(LVRelState *vacrel,
PruneFreezeParams params = {
.relation = rel,
.buffer = buf,
+ .vmbuffer = vmbuffer,
.reason = PRUNE_VACUUM_SCAN,
- .options = HEAP_PAGE_PRUNE_FREEZE,
+ .options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM,
.vistest = vacrel->vistest,
.cutoffs = &vacrel->cutoffs,
};
- uint8 old_vmbits = 0;
- uint8 new_vmbits = 0;
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -2148,73 +2071,24 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
- Assert(!presult.all_frozen || presult.all_visible);
-
- old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
- identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
- presult.lpdead_items, vmbuffer, old_vmbits);
-
- if (!presult.all_visible)
- return presult.ndeleted;
-
- /* Set the visibility map and page visibility hint */
- new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
- if (presult.all_frozen)
- new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
- /* Nothing to do */
- if (old_vmbits == new_vmbits)
- return presult.ndeleted;
-
- Assert(presult.all_visible);
-
- /*
- * It should never be the case that the visibility map page is set while
- * the page-level bit is clear, but the reverse is allowed (if checksums
- * are not enabled). Regardless, set both bits so that we get back in
- * sync.
- *
- * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
- * unnecessarily dirtying the heap buffer, as it must be marked dirty
- * before adding it to the WAL chain. The only scenario where it is not
- * already dirty is if the VM was removed, and that isn't worth optimizing
- * for.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
-
- /*
- * If the page is being set all-frozen, we pass InvalidTransactionId as
- * the cutoff_xid, since a snapshot conflict horizon sufficient to make
- * everything safe for REDO was logged when the page's tuples were frozen.
- */
- Assert(!presult.all_frozen ||
- !TransactionIdIsValid(presult.vm_conflict_horizon));
-
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- new_vmbits);
-
/*
* If the page wasn't already set all-visible and/or all-frozen in the VM,
* count it as newly set for logging.
*/
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
vacrel->vm_new_frozen_pages++;
*vm_page_frozen = true;
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f7e4ae3843c..f3fa61c9c1b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,12 @@ typedef struct PruneFreezeParams
Relation relation; /* relation containing buffer to be pruned */
Buffer buffer; /* buffer to be pruned */
+ /*
+ * vmbuffer is the buffer that must already contain the required block of
+ * the visibility map if we are to update it.
+ */
+ Buffer vmbuffer;
+
/*
* The reason pruning was performed. It is used to set the WAL record
* opcode which is used for debugging and analysis purposes.
@@ -252,6 +259,9 @@ typedef struct PruneFreezeParams
*
* HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
* will return 'all_visible', 'all_frozen' flags to the caller.
+ *
+ * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
+ * in the VM.
*/
int options;
@@ -299,6 +309,17 @@ typedef struct PruneFreezeResult
bool all_frozen;
TransactionId vm_conflict_horizon;
+ /*
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
+ *
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set and
+ * we have attempted to update the VM.
+ */
+ uint8 new_vmbits;
+ uint8 old_vmbits;
+
/*
* Whether or not the page makes rel truncation unsafe. This is set to
* 'true', even if the page contains LP_DEAD items. VACUUM will remove
--
2.43.0
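To make the new calling convention concrete, here is a minimal sketch of how a phase I caller drives the combined prune/freeze/VM update after the patch above. It is assembled from the lazy_scan_prune() and heapam.h hunks shown in the diff rather than copied verbatim; vacrel, buf, and vmbuffer stand in for the caller's own state, and the remaining output arguments of heap_page_prune_and_freeze() are elided in the comment.

PruneFreezeParams params = {
    .relation = vacrel->rel,
    .buffer = buf,
    .vmbuffer = vmbuffer,   /* correct VM block, already pinned */
    .reason = PRUNE_VACUUM_SCAN,
    .options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM,
    .vistest = vacrel->vistest,
    .cutoffs = &vacrel->cutoffs,
};
PruneFreezeResult presult;

/*
 * heap_page_prune_and_freeze(&params, ..., &presult, ...) prunes, freezes,
 * and, when warranted, sets PD_ALL_VISIBLE and the VM bits itself.
 */

/* The before/after VM bits let the caller keep its statistics. */
if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
    (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
    vacrel->vm_new_visible_pages++;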
v26-0005-Move-VM-assert-into-prune-freeze-code.patch (text/x-patch)
From 06e17918d799a4b654eccd76e1a39b2bd49e505b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 15:57:34 -0500
Subject: [PATCH v26 05/15] Move VM assert into prune/freeze code
This is a step toward setting the VM in the same WAL record as pruning
and freezing. It moves the assertion cross-check of the heap page's
all-visible status into the prune/freeze code, before the VM is set.
This allows us to remove some fields from PruneFreezeResult.
---
src/backend/access/heap/pruneheap.c | 86 ++++++++++++++++++++++------
src/backend/access/heap/vacuumlazy.c | 68 +---------------------
src/include/access/heapam.h | 25 +++-----
3 files changed, 77 insertions(+), 102 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 0d825228b62..ab567e7518b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -908,6 +908,31 @@ heap_page_will_set_vm(PruneState *prstate,
return true;
}
+#ifdef USE_ASSERT_CHECKING
+
+/*
+ * Wrapper for heap_page_would_be_all_visible() which can be used for callers
+ * that expect no LP_DEAD on the page. Currently assert-only, but there is no
+ * reason not to use it outside of asserts.
+ */
+static bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
+{
+
+ return heap_page_would_be_all_visible(rel, buf,
+ OldestXmin,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid,
+ logging_offnum);
+}
+#endif
+
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
@@ -961,6 +986,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
+ TransactionId vm_conflict_horizon = InvalidTransactionId;
PruneState prstate;
bool do_freeze;
bool do_prune;
@@ -1119,23 +1145,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
- /*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
-
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
@@ -1153,6 +1164,46 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
}
}
+ /*
+ * If updating the visibility map, the conflict horizon for that record
+ * must be the newest xmin on the page. However, if the page is
+ * completely frozen, there can be no conflict and the vm_conflict_horizon
+ * should remain InvalidTransactionId. This includes the case that we
+ * just froze all the tuples; the prune-freeze record included the
+ * conflict XID already so we don't need to again.
+ */
+ if (prstate.all_frozen)
+ vm_conflict_horizon = InvalidTransactionId;
+ else
+ vm_conflict_horizon = prstate.visibility_cutoff_xid;
+
+ /*
+ * During its second pass over the heap, VACUUM calls
+ * heap_page_would_be_all_visible() to determine whether a page is
+ * all-visible and all-frozen. The logic here is similar. After completing
+ * pruning and freezing, use an assertion to verify that our results
+ * remain consistent with heap_page_would_be_all_visible().
+ */
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
+ {
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
+
+ Assert(presult->lpdead_items == 0);
+
+ Assert(heap_page_is_all_visible(params->relation, buffer,
+ prstate.cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc));
+
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == vm_conflict_horizon);
+ }
+#endif
+
/* Now update the visibility map and PD_ALL_VISIBLE hint */
Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
@@ -1198,12 +1249,11 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* make everything safe for REDO was logged when the page's tuples
* were frozen.
*/
- Assert(!prstate.all_frozen ||
- !TransactionIdIsValid(presult->vm_conflict_horizon));
+ Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
visibilitymap_set(params->relation, blockno, buffer,
InvalidXLogRecPtr,
- vmbuffer, presult->vm_conflict_horizon,
+ vmbuffer, vm_conflict_horizon,
new_vmbits);
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ccfad5b2dba..3fa03470722 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -462,20 +462,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
-#endif
-static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- OffsetNumber *deadoffsets,
- int ndeadoffsets,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2012,32 +1998,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- Assert(heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum));
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -3494,29 +3454,6 @@ dead_items_cleanup(LVRelState *vacrel)
vacrel->pvs = NULL;
}
-#ifdef USE_ASSERT_CHECKING
-
-/*
- * Wrapper for heap_page_would_be_all_visible() which can be used for callers
- * that expect no LP_DEAD on the page. Currently assert-only, but there is no
- * reason not to use it outside of asserts.
- */
-static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum)
-{
-
- return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
- NULL, 0,
- all_frozen,
- visibility_cutoff_xid,
- logging_offnum);
-}
-#endif
/*
* Check whether the heap page in buf is all-visible except for the dead
@@ -3540,15 +3477,12 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
* - *logging_offnum: OffsetNumber of current tuple being processed;
* used by vacuum's error callback system.
*
- * Callers looking to verify that the page is already all-visible can call
- * heap_page_is_all_visible().
- *
* This logic is closely related to heap_prune_record_unchanged_lp_normal().
* If you modify this function, ensure consistency with that code. An
* assertion cross-checks that both remain in agreement. Do not introduce new
* side-effects.
*/
-static bool
+bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
TransactionId OldestXmin,
OffsetNumber *deadoffsets,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f3fa61c9c1b..9100d42ccbb 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -257,8 +257,7 @@ typedef struct PruneFreezeParams
* HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
* LP_UNUSED during pruning.
*
- * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
- * will return 'all_visible', 'all_frozen' flags to the caller.
+ * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples
*
* HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
* in the VM.
@@ -294,21 +293,6 @@ typedef struct PruneFreezeResult
int live_tuples;
int recently_dead_tuples;
- /*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
- *
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
- *
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
- */
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
-
/*
* old_vmbits are the state of the all-visible and all-frozen bits in the
* visibility map before updating it during phase I of vacuuming.
@@ -454,6 +438,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
/* in heap/vacuumlazy.c */
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
--
2.43.0
v26-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (text/x-patch)
From 548a9a8ae3c633e5ab2cfec438aca03ba9d1e6f9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 16:16:22 -0500
Subject: [PATCH v26 06/15] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
prune/freeze
Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.
Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.
This change applies only to vacuum phase I, not to pruning performed
during normal page access.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/pruneheap.c | 275 ++++++++++++++++------------
1 file changed, 157 insertions(+), 118 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ab567e7518b..5a7ba904f1a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -205,6 +205,11 @@ static bool heap_page_will_set_vm(PruneState *prstate,
int nlpdead_items,
uint8 *old_vmbits,
uint8 *new_vmbits);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ uint8 old_vmbits, uint8 new_vmbits,
+ TransactionId latest_xid_removed,
+ TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid);
/*
@@ -785,6 +790,68 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ uint8 old_vmbits, uint8 new_vmbits,
+ TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid)
+{
+ TransactionId conflict_xid;
+
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning or
+ * freezing any tuples and are setting an already all-visible page
+ * all-frozen in the VM. In this case, all of the tuples on the page must
+ * already be visible to all MVCC snapshots on the standby.
+ */
+ if (!do_prune &&
+ !do_freeze &&
+ do_set_vm &&
+ (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0 &&
+ (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ return InvalidTransactionId;
+
+ /*
+ * The snapshotConflictHorizon for the whole record should be the most
+ * conservative of all the horizons calculated for any of the possible
+ * modifications. If this record will prune tuples, any transactions on
+ * the standby older than the youngest xmax of the most recently removed
+ * tuple this record will prune will conflict. If this record will freeze
+ * tuples, any transactions on the standby with xids older than the
+ * youngest tuple this record will freeze will conflict.
+ */
+ conflict_xid = InvalidTransactionId;
+
+ /*
+ * If we are updating the VM, the conflict horizon is almost always the
+ * visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization, we can
+ * use the visibility_cutoff_xid as the conflict horizon if the page will
+ * be all-frozen. This is true even if there are LP_DEAD line pointers
+ * because we ignored those when maintaining the visibility_cutoff_xid.
+ * This will have been calculated earlier as the frz_conflict_horizon when
+ * we determined we would freeze.
+ */
+ if (do_set_vm)
+ conflict_xid = visibility_cutoff_xid;
+ else if (do_freeze)
+ conflict_xid = frz_conflict_horizon;
+
+ /*
+ * If we are removing tuples with a younger xmax than our so far
+ * calculated conflict_xid, we must use this as our horizon.
+ */
+ if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+ conflict_xid = latest_xid_removed;
+
+ return conflict_xid;
+}
+
/*
* Helper to correct any corruption detected on an heap page and its
* corresponding visibility map page after pruning but before setting the
@@ -986,7 +1053,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
- TransactionId vm_conflict_horizon = InvalidTransactionId;
PruneState prstate;
bool do_freeze;
bool do_prune;
@@ -994,10 +1060,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool do_set_vm;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ TransactionId conflict_xid = InvalidTransactionId;
uint8 new_vmbits = 0;
uint8 old_vmbits = 0;
-
/* Initialize prstate */
prune_freeze_setup(params,
new_relfrozen_xid, new_relmin_mxid,
@@ -1058,6 +1124,37 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.all_visible = prstate.all_frozen = false;
Assert(!prstate.all_frozen || prstate.all_visible);
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ /*
+ * Decide whether to set the VM bits based on information from the VM and
+ * the all_visible/all_frozen flags.
+ */
+ do_set_vm = heap_page_will_set_vm(&prstate,
+ params->relation,
+ blockno,
+ buffer,
+ page,
+ vmbuffer,
+ prstate.lpdead_items,
+ &old_vmbits,
+ &new_vmbits);
+
+ /*
+ * new_vmbits should be 0 regardless of whether or not the page is
+ * all-visible if we do not intend to set the VM.
+ */
+ Assert(do_set_vm || new_vmbits == 0);
+
+ conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+ old_vmbits, new_vmbits,
+ prstate.latest_xid_removed,
+ prstate.frz_conflict_horizon,
+ prstate.visibility_cutoff_xid);
+
+ /* Lock vmbuffer before entering a critical section */
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -1079,14 +1176,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
/*
* If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * hint. If we are going to freeze or prune the page or set
+ * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+ *
+ * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+ * for the VM to be set and PD_ALL_VISIBLE to be clear.
*/
- if (!do_freeze && !do_prune)
+ if (!do_freeze && !do_prune && !do_set_vm)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -1100,6 +1200,26 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
+ /* Set the visibility map and page visibility hint */
+ if (do_set_vm)
+ {
+ /*
+ * While it is valid for PD_ALL_VISIBLE to be set when the
+ * corresponding VM bit is clear, we strongly prefer to keep them
+ * in sync.
+ *
+ * Even if we are only setting the VM and PD_ALL_VISIBLE is
+ * already set, we don't need to worry about unnecessarily
+ * dirtying the heap buffer below, as it must be marked dirty
+ * before adding it to the WAL chain. The only scenario where it
+ * is not already dirty is if the VM was removed, and that isn't
+ * worth optimizing for.
+ */
+ PageSetAllVisible(page);
+ visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
+ params->relation->rd_locator);
+ }
+
MarkBufferDirty(buffer);
/*
@@ -1107,29 +1227,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
if (RelationNeedsWAL(params->relation))
{
- /*
- * The snapshotConflictHorizon for the whole record should be the
- * most conservative of all the horizons calculated for any of the
- * possible modifications. If this record will prune tuples, any
- * transactions on the standby older than the youngest xmax of the
- * most recently removed tuple this record will prune will
- * conflict. If this record will freeze tuples, any transactions
- * on the standby with xids older than the youngest tuple this
- * record will freeze will conflict.
- */
- TransactionId conflict_xid;
-
- if (TransactionIdFollows(prstate.frz_conflict_horizon,
- prstate.latest_xid_removed))
- conflict_xid = prstate.frz_conflict_horizon;
- else
- conflict_xid = prstate.latest_xid_removed;
-
log_heap_prune_and_freeze(params->relation, buffer,
- InvalidBuffer, /* vmbuffer */
- 0, /* vmflags */
+ do_set_vm ? vmbuffer : InvalidBuffer,
+ do_set_vm ? new_vmbits : 0,
conflict_xid,
- true, params->reason,
+ true, /* cleanup lock */
+ params->reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
@@ -1139,43 +1242,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
END_CRIT_SECTION();
- /* Copy information back for caller */
- presult->ndeleted = prstate.ndeleted;
- presult->nnewlpdead = prstate.ndead;
- presult->nfrozen = prstate.nfrozen;
- presult->live_tuples = prstate.live_tuples;
- presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->hastup = prstate.hastup;
-
- presult->lpdead_items = prstate.lpdead_items;
- /* the presult->deadoffsets array was already filled in */
-
- if (prstate.attempt_freeze)
- {
- if (presult->nfrozen > 0)
- {
- *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
- *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
- }
- else
- {
- *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
- *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
- }
- }
-
- /*
- * If updating the visibility map, the conflict horizon for that record
- * must be the newest xmin on the page. However, if the page is
- * completely frozen, there can be no conflict and the vm_conflict_horizon
- * should remain InvalidTransactionId. This includes the case that we
- * just froze all the tuples; the prune-freeze record included the
- * conflict XID already so we don't need to again.
- */
- if (prstate.all_frozen)
- vm_conflict_horizon = InvalidTransactionId;
- else
- vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
/*
* During its second pass over the heap, VACUUM calls
@@ -1190,7 +1258,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(presult->lpdead_items == 0);
+ Assert(prstate.lpdead_items == 0);
+ Assert(prstate.cutoffs);
Assert(heap_page_is_all_visible(params->relation, buffer,
prstate.cutoffs->OldestXmin,
@@ -1200,66 +1269,36 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Assert(prstate.all_frozen == debug_all_frozen);
Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == vm_conflict_horizon);
+ debug_cutoff == prstate.visibility_cutoff_xid);
}
#endif
- /* Now update the visibility map and PD_ALL_VISIBLE hint */
- Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
-
- do_set_vm = heap_page_will_set_vm(&prstate,
- params->relation,
- blockno,
- buffer,
- page,
- vmbuffer,
- prstate.lpdead_items,
- &old_vmbits,
- &new_vmbits);
+ /* Copy information back for caller */
+ presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+ presult->hastup = prstate.hastup;
+ presult->new_vmbits = new_vmbits;
+ presult->old_vmbits = old_vmbits;
- /*
- * new_vmbits should be 0 regardless of whether or not the page is
- * all-visible if we do not intend to set the VM.
- */
- Assert(do_set_vm || new_vmbits == 0);
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
- /* Set the visibility map and page visibility hint, if relevant */
- if (do_set_vm)
+ if (prstate.attempt_freeze)
{
- Assert(prstate.all_visible);
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
- * unnecessarily dirtying the heap buffer, as it must be marked dirty
- * before adding it to the WAL chain. The only scenario where it is
- * not already dirty is if the VM was removed, and that isn't worth
- * optimizing for.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buffer);
-
- /*
- * If the page is being set all-frozen, we pass InvalidTransactionId
- * as the cutoff_xid, since a snapshot conflict horizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
-
- visibilitymap_set(params->relation, blockno, buffer,
- InvalidXLogRecPtr,
- vmbuffer, vm_conflict_horizon,
- new_vmbits);
+ if (presult->nfrozen > 0)
+ {
+ *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
+ }
+ else
+ {
+ *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
+ }
}
-
- /* Save the vmbits for caller */
- presult->old_vmbits = old_vmbits;
- presult->new_vmbits = new_vmbits;
}
--
2.43.0
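Condensed to its control flow, the reordered function body above now does the following. This is a paraphrase of the patched heap_page_prune_and_freeze(), not new behavior; the per-tuple bookkeeping arguments of log_heap_prune_and_freeze() are omitted here and are shown in full in the hunk above.

if (do_set_vm)
    LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);    /* before the critical section */

START_CRIT_SECTION();

if (do_prune || do_freeze || do_set_vm)
{
    /* apply the heap-page changes planned above ... */

    if (do_set_vm)
    {
        PageSetAllVisible(page);
        visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
                                 params->relation->rd_locator);
    }
    MarkBufferDirty(buffer);

    /*
     * One XLOG_HEAP2_PRUNE_VACUUM_SCAN record covers the prune/freeze
     * changes and, when do_set_vm, the VM bits:
     * log_heap_prune_and_freeze(rel, buffer, vmbuffer, new_vmbits,
     *                           conflict_xid, ...);
     */
}

END_CRIT_SECTION();

if (do_set_vm)
    LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);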
v26-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (text/x-patch)
From 703581cd909b354b4d1028eeba57f7edd45836e8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v26 07/15] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum
As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible and all-frozen in an
XLOG_HEAP2_PRUNE_VACUUM_SCAN record.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
1 file changed, 29 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 3fa03470722..210afa11346 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1873,9 +1873,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ /* Lock vmbuffer before entering critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ /* Mark buffer dirty before writing any WAL records */
MarkBufferDirty(buf);
/*
@@ -1892,13 +1895,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set_vmbits(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
+
+ /*
+ * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+ * setting the VM.
+ */
+ if (RelationNeedsWAL(vacrel->rel))
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ InvalidTransactionId, /* conflict xid */
+ false, /* cleanup lock */
+ PRUNE_VACUUM_SCAN, /* reason */
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+
END_CRIT_SECTION();
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
/* Count the newly all-frozen pages for logging */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
--
2.43.0
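One practical note on the final patch, attached below: it also changes the visibilitymap_set() API. The old WAL-logging variant is deleted and visibilitymap_set_vmbits() takes over its name, so any out-of-tree caller has to adjust along these lines (prototypes taken from the diff below; the caller becomes responsible for WAL-logging the change in its own record):

/* before: sets the bits and emits XLOG_HEAP2_VISIBLE itself */
uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
                        XLogRecPtr recptr, Buffer vmBuf,
                        TransactionId cutoff_xid, uint8 flags);

/* after: only sets the bits in the already-pinned VM buffer */
uint8 visibilitymap_set(BlockNumber heapBlk, Buffer vmBuf, uint8 flags,
                        const RelFileLocator rlocator);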
v26-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (text/x-patch)
From 322cb242b4a861f792fec7b6e5614bf0a9fc2dad Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v26 08/15] Remove XLOG_HEAP2_VISIBLE entirely
No remaining users emit XLOG_HEAP2_VISIBLE records, so remove the record
type entirely.
This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.
This changes the visibility map API, so any external users/consumers of
the VM-only WAL record will need to change.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/common/bufmask.c | 4 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 155 ++---------------------
src/backend/access/heap/pruneheap.c | 4 +-
src/backend/access/heap/vacuumlazy.c | 16 +--
src/backend/access/heap/visibilitymap.c | 112 +---------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +-
src/include/access/heapam_xlog.h | 28 +---
src/include/access/visibilitymap.h | 13 +-
src/include/access/visibilitymapdefs.h | 9 --
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 45 insertions(+), 374 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
+ * for more details.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 6daf4a87dec..fb7a7548aa0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2539,11 +2539,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
else if (all_frozen_set)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- relation->rd_locator);
+ visibilitymap_set(BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ relation->rd_locator);
}
/*
@@ -8813,50 +8813,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
-
/*
* Perform XLogInsert for a heap-update operation. Caller must already
* have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1823feff298..47d2479415e 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -236,7 +236,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+ visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -249,142 +249,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -762,8 +626,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
*
* During recovery, however, no concurrent writers exist. Therefore,
* updating the VM without holding the heap page lock is safe enough. This
- * same approach is taken when replaying xl_heap_visible records (see
- * heap_xlog_visible()).
+ * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+ * heap_xlog_prune_and_freeze()).
*/
if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -775,11 +639,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- rlocator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -1360,9 +1224,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5a7ba904f1a..28a50c83af4 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1216,8 +1216,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* worth optimizing for.
*/
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
- params->relation->rd_locator);
+ visibilitymap_set(blockno, vmbuffer, new_vmbits,
+ params->relation->rd_locator);
}
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 210afa11346..87820f3ff49 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1895,11 +1895,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
/*
* Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2776,9 +2776,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* set PD_ALL_VISIBLE.
*/
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer, vmflags,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer, vmflags,
+ vacrel->rel->rd_locator);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index d14588e92ae..7997e926872 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set bit(s) in a previously pinned page and log
- * visibilitymap_set_vmbits - set bit(s) in a pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,109 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
- flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || !XLogRecPtrIsValid(recptr));
- Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- Assert(!BufferIsValid(heapBuf) ||
- BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (!XLogRecPtrIsValid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
-}
-
/*
* Set VM (visibility map) flags in the VM block in vmBuf.
*
@@ -344,9 +240,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* rlocator is used only for debugging messages.
*/
uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ca26d1f0ed1..08461fdf593 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 5e15cb1825e..c0cac7ea1c3 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index fc45d72c79b..3655358ed6b 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..69678187832 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -294,7 +293,13 @@ typedef struct xl_heap_prune
#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
#define XLHP_IS_CATALOG_REL (1 << 1)
/*
@@ -443,20 +448,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -500,11 +491,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index c6fa37be968..05ba6786b47 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator);
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3451538565e..267d7dc5524 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4330,7 +4330,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
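
To make the caveat in the new XLHP_IS_CATALOG_REL comment above concrete: the prune-record flag and the old VM/XLOG flag occupy different bit positions, so passing the VM macro into an xl_heap_prune flags word would silently set the wrong bit. A tiny standalone illustration, not backend code, with the two values copied from the hunks above:

    #include <stdio.h>

    /* values copied from the hunks above */
    #define XLHP_IS_CATALOG_REL            (1 << 1)    /* xl_heap_prune.flags bit */
    #define VISIBILITYMAP_XLOG_CATALOG_REL (1 << 2)    /* old xl_heap_visible flags bit */

    int
    main(void)
    {
        unsigned short flags = 0;

        flags |= XLHP_IS_CATALOG_REL;   /* correct for an xl_heap_prune record */

        printf("XLHP_IS_CATALOG_REL = 0x%02x, VISIBILITYMAP_XLOG_CATALOG_REL = 0x%02x\n",
               XLHP_IS_CATALOG_REL, VISIBILITYMAP_XLOG_CATALOG_REL);
        printf("flags = 0x%02x; using the wrong macro would have set 0x%02x instead\n",
               flags, VISIBILITYMAP_XLOG_CATALOG_REL);
        return 0;
    }

Hence the comment's insistence that xl_heap_prune records use XLHP_IS_CATALOG_REL even when they only carry VM updates.
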
Attachment: v26-0009-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (text/x-patch)
From cf4615d826340a62957be22d0c41f787194f065d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v26 09/15] Rename GlobalVisTestIsRemovableXid() to
GlobalVisXidVisibleToAll()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The function is currently only used to check whether a tuple’s xmax is
visible to all transactions (and thus removable). Upcoming changes will
also use it to test whether a tuple’s xmin is visible to all, in order to
decide if a page can be marked all-visible in the visibility map.
The new name, GlobalVisXidVisibleToAll(), better reflects this broader
purpose.
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/heapam_visibility.c | 6 +++---
src/backend/access/heap/pruneheap.c | 12 ++++++------
src/backend/access/spgist/spgvacuum.c | 2 +-
src/backend/storage/ipc/procarray.c | 17 ++++++++---------
src/include/utils/snapmgr.h | 4 ++--
5 files changed, 20 insertions(+), 21 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
{
Assert(TransactionIdIsValid(dead_after));
- if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
res = HEAPTUPLE_DEAD;
}
else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
return false;
/* Deleter committed, so tuple is dead if the XID is old enough. */
- return GlobalVisTestIsRemovableXid(vistest,
- HeapTupleHeaderGetRawXmax(tuple));
+ return GlobalVisXidVisibleToAll(vistest,
+ HeapTupleHeaderGetRawXmax(tuple));
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 28a50c83af4..0574c78a5eb 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -255,7 +255,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
vistest = GlobalVisTestFor(relation);
- if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+ if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
return;
/*
@@ -488,7 +488,7 @@ prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
* Determining HTSV only once for each tuple is required for correctness,
* to deal with cases where running HTSV twice could result in different
* results. For example, RECENTLY_DEAD can turn to DEAD if another
- * checked item causes GlobalVisTestIsRemovableFullXid() to update the
+ * checked item causes GlobalVisFullXidVisibleToAll() to update the
* horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
* transaction aborts.
*
@@ -1331,11 +1331,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* Determine whether or not the tuple is considered dead when compared
* with the provided GlobalVisState. On-access pruning does not provide
* VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
- * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
- * if the GlobalVisState has been updated since the beginning of vacuuming
+ * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+ * the GlobalVisState has been updated since the beginning of vacuuming
* the relation.
*/
- if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
return HEAPTUPLE_DEAD;
return res;
@@ -1794,7 +1794,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
/*
* For now always use prstate->cutoffs for this test, because
* we only update 'all_visible' and 'all_frozen' when freezing
- * is requested. We could use GlobalVisTestIsRemovableXid
+ * is requested. We could use GlobalVisXidVisibleToAll()
* instead, if a non-freezing caller wanted to set the VM bit.
*/
Assert(prstate->cutoffs);
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index cb5671c1a4e..3a68757c09a 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
*/
if (dt->tupstate == SPGIST_REDIRECT &&
(!TransactionIdIsValid(dt->xid) ||
- GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+ GlobalVisXidVisibleToAll(vistest, dt->xid)))
{
dt->tupstate = SPGIST_PLACEHOLDER;
Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index f3a1603204e..67da6737496 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4179,8 +4179,7 @@ GlobalVisUpdate(void)
* See comment for GlobalVisState for details.
*/
bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
- FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
{
/*
* If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4214,14 +4213,14 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
}
/*
- * Wrapper around GlobalVisTestIsRemovableFullXid() for 32bit xids.
+ * Wrapper around GlobalVisFullXidVisibleToAll() for 32bit xids.
*
* It is crucial that this only gets called for xids from a source that
* protects against xid wraparounds (e.g. from a table and thus protected by
* relfrozenxid).
*/
bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
{
FullTransactionId fxid;
@@ -4235,12 +4234,12 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
*/
fxid = FullXidRelativeTo(state->definitely_needed, xid);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableFullXid(), see their comments.
+ * GlobalVisFullXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
@@ -4249,12 +4248,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4263,7 +4262,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableXid(state, xid);
+ return GlobalVisXidVisibleToAll(state, xid);
}
/*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
*/
typedef struct GlobalVisState GlobalVisState;
extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
--
2.43.0
Attachment: v26-0010-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (text/x-patch)
From cc69c22be974c579e9fdc186f83dd063c40e06bb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v26 10/15] Use GlobalVisState in vacuum to determine page
level visibility
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.
Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.
This also benefits vacuum directly: in some cases, GlobalVisState may
advance during a vacuum, allowing more pages to be considered
all-visible. And, in the future, we could easily add a heuristic to
update GlobalVisState more frequently during vacuums of large tables. In
the rare case that the GlobalVisState moves backward, vacuum falls back
to OldestXmin to ensure we don’t attempt to freeze a dead tuple that
wasn’t yet prunable according to the GlobalVisState.
Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.
This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/heapam_visibility.c | 28 ++++++++++++++
src/backend/access/heap/pruneheap.c | 43 +++++++++------------
src/backend/access/heap/vacuumlazy.c | 10 ++---
src/include/access/heapam.h | 11 +++---
4 files changed, 58 insertions(+), 34 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+ Buffer buffer)
+{
+ TransactionId dead_after = InvalidTransactionId;
+ HTSV_Result res;
+
+ res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+ if (res == HEAPTUPLE_RECENTLY_DEAD)
+ {
+ Assert(TransactionIdIsValid(dead_after));
+
+ if (GlobalVisXidVisibleToAll(vistest, dead_after))
+ res = HEAPTUPLE_DEAD;
+ }
+ else
+ Assert(!TransactionIdIsValid(dead_after));
+
+ return res;
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 0574c78a5eb..e9754d43f72 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -451,11 +451,12 @@ prune_freeze_setup(PruneFreezeParams *params,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon
- * when updating the VM and/or freezing all the tuples on the page.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState. This field is only kept up-to-date if the page is
+ * all-visible. As soon as a tuple is encountered that is not visible to
+ * all, this field is unmaintained. As long as it is maintained, it can be
+ * used to calculate the snapshot conflict horizon when updating the VM
+ * and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -984,14 +985,13 @@ heap_page_will_set_vm(PruneState *prstate,
*/
static bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
- return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
+ return heap_page_would_be_all_visible(rel, buf, vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -1078,6 +1078,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prune_freeze_plan(RelationGetRelid(params->relation),
buffer, &prstate, off_loc);
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them is not visible to everyone, the page cannot be
+ * all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* If checksums are enabled, calling heap_prune_satisfies_vacuum() while
* checking tuple visibility information in prune_freeze_plan() may have
@@ -1259,10 +1269,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool debug_all_frozen;
Assert(prstate.lpdead_items == 0);
- Assert(prstate.cutoffs);
Assert(heap_page_is_all_visible(params->relation, buffer,
- prstate.cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc));
@@ -1791,20 +1800,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' and 'all_frozen' when freezing
- * is requested. We could use GlobalVisXidVisibleToAll()
- * instead, if a non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = false;
- prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 87820f3ff49..479fb096974 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2730,7 +2730,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* done outside the critical section.
*/
if (heap_page_would_be_all_visible(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3491,7 +3491,7 @@ dead_items_cleanup(LVRelState *vacrel)
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
*
* Output parameters:
*
@@ -3507,7 +3507,7 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -3581,7 +3581,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
/* Visibility checks may do IO or allocate memory */
Assert(CritSectionCount == 0);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+ switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest, buf))
{
case HEAPTUPLE_LIVE:
{
@@ -3600,7 +3600,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin, OldestXmin))
+ if (!GlobalVisXidVisibleToAll(vistest, xmin))
{
all_visible = false;
*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9100d42ccbb..a33b5ef55a8 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -272,10 +272,9 @@ typedef struct PruneFreezeParams
/*
* Contains the cutoffs used for freezing. They are required if the
- * HEAP_PAGE_PRUNE_FREEZE option is set. cutoffs->OldestXmin is also used
- * to determine if dead tuples are HEAPTUPLE_RECENTLY_DEAD or
- * HEAPTUPLE_DEAD. Currently only vacuum passes in cutoffs. Vacuum
- * calculates them once, at the beginning of vacuuming the relation.
+ * HEAP_PAGE_PRUNE_FREEZE option is set. Currently only vacuum passes in
+ * cutoffs. Vacuum calculates them once, at the beginning of vacuuming the
+ * relation.
*/
struct VacuumCutoffs *cutoffs;
} PruneFreezeParams;
@@ -439,7 +438,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -453,6 +452,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+ GlobalVisState *vistest, Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
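
To make the "check once per page" optimization in 0010 concrete, here is a minimal standalone sketch (hand-rolled types, not backend code; visible_to_all() merely stands in for GlobalVisXidVisibleToAll(), and the GlobalVisState is reduced to a single toy horizon XID). The scan pass only tracks the newest xmin; a single visibility check on that cutoff afterwards decides whether the whole page can be treated as all-visible:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t TransactionId;

    /*
     * Toy stand-in for GlobalVisXidVisibleToAll(): in this model the whole
     * GlobalVisState is a single horizon XID, and anything older is visible
     * to all transactions.
     */
    static bool
    visible_to_all(TransactionId horizon, TransactionId xid)
    {
        return xid < horizon;
    }

    int
    main(void)
    {
        /* xmins of the live, committed tuples on one toy heap page */
        TransactionId xmins[] = {610, 742, 699, 815, 660};
        int         nxmins = 5;
        TransactionId visibility_cutoff_xid = 0;    /* newest xmin seen so far */
        TransactionId horizon = 900;
        bool        all_visible = true;

        /* Scan pass: per-tuple work is pure bookkeeping, no horizon checks. */
        for (int i = 0; i < nxmins; i++)
        {
            if (xmins[i] > visibility_cutoff_xid)
                visibility_cutoff_xid = xmins[i];
        }

        /*
         * One check per page: if the newest xmin is visible to all, so is
         * every older xmin, and the page can be considered all-visible.
         */
        if (!visible_to_all(horizon, visibility_cutoff_xid))
            all_visible = false;

        printf("cutoff = %u, all_visible = %s\n",
               (unsigned) visibility_cutoff_xid, all_visible ? "true" : "false");
        return 0;
    }

With these toy values the cutoff is 815 and the single check succeeds; lowering the horizon below 815 makes the page stop qualifying without rechecking any individual tuple.
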
Attachment: v26-0011-Unset-all_visible-sooner-if-not-freezing.patch (text/x-patch)
From 6e10d58579ae071085bdf0ffd610c79129b549d2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v26 11/15] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.
However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/pruneheap.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e9754d43f72..a4c3bd00253 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1657,8 +1657,13 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible and all_frozen until later
* during pruning. Removable dead tuples shouldn't preclude freezing the
- * page.
+ * page. If we won't attempt freezing, just unset all-visible now, though.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1917,8 +1922,14 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible and all_frozen until later, at the
* end of heap_page_prune_and_freeze(). This will allow us to attempt to
* freeze the page after pruning. As long as we unset it before updating
- * the visibility map, this will be correct.
+ * the visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all-visible now.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
--
2.43.0
Attachment: v26-0012-Track-which-relations-are-modified-by-a-query.patch (text/x-patch)
From 8493882305b7c632c98bfda7dafcdb4785e3892c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v26 12/15] Track which relations are modified by a query
Save the relids in a bitmap in the estate. A later commit will pass this
information down to scan nodes to control whether or not the scan allows
setting the visibility map during on-access pruning. We don't want to set
the visibility map if the query is just going to modify the page
immediately after.
---
src/backend/executor/execMain.c | 4 ++++
src/backend/executor/execUtils.c | 2 ++
src/include/nodes/execnodes.h | 6 ++++++
3 files changed, 12 insertions(+)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 797d8b1ca1c..5b299ef81aa 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation is modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 3968429f991..d8c385216e0 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query either through
+ * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
--
2.43.0
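
A toy sketch of the bookkeeping 0012 introduces (standalone, not executor code; the set is a plain bitmask rather than a Bitmapset, and the RT indexes are invented for illustration). Relations written by the query are recorded once at executor startup; a scan node later checks whether its own RT index is in the set before treating the relation as read-only:

    #include <stdbool.h>
    #include <stdio.h>

    /* Toy stand-in for a Bitmapset keyed by range-table index. */
    typedef unsigned int RelidSet;

    static RelidSet
    add_member(RelidSet set, int rti)
    {
        return set | (1u << rti);
    }

    static bool
    is_member(RelidSet set, int rti)
    {
        return (set & (1u << rti)) != 0;
    }

    int
    main(void)
    {
        RelidSet    modified_relids = 0;

        /* "Executor startup": RT index 1 is a result relation, RT index 3 has
         * a rowmark (e.g. SELECT ... FOR UPDATE). */
        modified_relids = add_member(modified_relids, 1);
        modified_relids = add_member(modified_relids, 3);

        /* "Scan nodes": only relations outside the set get the read-only hint. */
        for (int scanrelid = 1; scanrelid <= 3; scanrelid++)
            printf("rti %d: %s\n", scanrelid,
                   is_member(modified_relids, scanrelid)
                   ? "modified by this query"
                   : "read-only for this query");
        return 0;
    }
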
Attachment: v26-0013-Pass-down-information-on-table-modification-to-s.patch (text/x-patch)
From 1d7279c53118e38246395b3ba575db48dc8172fd Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:12:18 -0500
Subject: [PATCH v26 13/15] Pass down information on table modification to scan
node
Pass down information to sequential scan, index scan, and bitmap table
scan nodes on whether or not the query modifies the relation being
scanned. A later commit will use this information to update the VM
during on-access pruning only if the relation is not modified by the
query.
---
contrib/pgrowlocks/pgrowlocks.c | 2 +-
src/backend/access/brin/brin.c | 3 ++-
src/backend/access/gin/gininsert.c | 3 ++-
src/backend/access/heap/heapam_handler.c | 7 +++---
src/backend/access/index/genam.c | 4 ++--
src/backend/access/index/indexam.c | 6 +++---
src/backend/access/nbtree/nbtsort.c | 2 +-
src/backend/access/table/tableam.c | 7 +++---
src/backend/commands/constraint.c | 2 +-
src/backend/commands/copyto.c | 2 +-
src/backend/commands/tablecmds.c | 8 +++----
src/backend/commands/typecmds.c | 4 ++--
src/backend/executor/execIndexing.c | 2 +-
src/backend/executor/execReplication.c | 8 +++----
src/backend/executor/nodeBitmapHeapscan.c | 9 +++++++-
src/backend/executor/nodeIndexonlyscan.c | 2 +-
src/backend/executor/nodeIndexscan.c | 11 ++++++++--
src/backend/executor/nodeSeqscan.c | 26 ++++++++++++++++++++---
src/backend/partitioning/partbounds.c | 2 +-
src/backend/utils/adt/selfuncs.c | 2 +-
src/include/access/genam.h | 2 +-
src/include/access/heapam.h | 6 ++++++
src/include/access/tableam.h | 19 ++++++++++-------
23 files changed, 93 insertions(+), 46 deletions(-)
diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
RelationGetRelationName(rel));
/* Scan the relation */
- scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
hscan = (HeapScanDesc) scan;
attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 26cb75058d1..4ad8941c60a 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2842,7 +2842,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
indexInfo->ii_Concurrent = brinshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromBrinShared(brinshared));
+ ParallelTableScanFromBrinShared(brinshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index df30dcc0228..aaa5401b731 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2058,7 +2058,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
indexInfo->ii_Concurrent = ginshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromGinBuildShared(ginshared));
+ ParallelTableScanFromGinBuildShared(ginshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index dd4fe6bf62f..6c2e4e08b16 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,12 +79,13 @@ heapam_slot_callbacks(Relation relation)
*/
static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
{
IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
return &hscan->xs_base;
}
@@ -753,7 +754,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
tableScan = NULL;
heapScan = NULL;
- indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+ indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
index_rescan(indexScan, NULL, 0, NULL, 0);
}
else
@@ -762,7 +763,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
- tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+ tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
heapScan = (HeapScanDesc) tableScan;
indexScan = NULL;
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index b7f10a1aed0..15f9cc11582 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -447,7 +447,7 @@ systable_beginscan(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, irel,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
@@ -708,7 +708,7 @@ systable_beginscan_ordered(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, indexRelation,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..b5523cf2ab1 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys)
+ int nkeys, int norderbys, uint32 flags)
{
IndexScanDesc scan;
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+ scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
return scan;
}
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+ scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
return scan;
}
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index d7695dc1108..7bdbc7e5fa7 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1925,7 +1925,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
indexInfo = BuildIndexInfo(btspool->index);
indexInfo->ii_Concurrent = btshared->isconcurrent;
scan = table_beginscan_parallel(btspool->heap,
- ParallelTableScanFromBTShared(btshared));
+ ParallelTableScanFromBTShared(btshared), 0);
reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
true, progress, _bt_build_callback,
&buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 73ebc01a08f..a00bdfdf822 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
+
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
bool found;
slot = table_slot_create(rel, NULL);
- scan = table_index_fetch_begin(rel);
+ scan = table_index_fetch_begin(rel, 0);
found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
all_dead);
table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index 3497a8221f2..97c8278e36d 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
*/
tmptid = checktid;
{
- IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+ IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
bool call_again = false;
if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index dae91630ac3..1957bb0f1a2 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1156,7 +1156,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
AttrMap *map = NULL;
TupleTableSlot *root_slot = NULL;
- scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
slot = table_slot_create(rel, NULL);
/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6b1a00ed477..130a670d266 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6377,7 +6377,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
* checking all the constraints.
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(oldrel, snapshot, 0, NULL);
+ scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -13766,7 +13766,7 @@ validateForeignKeyConstraint(char *conname,
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
slot = table_slot_create(rel, NULL);
- scan = table_beginscan(rel, snapshot, 0, NULL);
+ scan = table_beginscan(rel, snapshot, 0, NULL, 0);
perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
"validateForeignKeyConstraint",
@@ -22623,7 +22623,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
/* Scan through the rows. */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+ scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -23087,7 +23087,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
/* Scan through the rows. */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(splitRel, snapshot, 0, NULL);
+ scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index be6ffd6ddb0..2921f68c1c3 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3157,7 +3157,7 @@ validateDomainNotNullConstraint(Oid domainoid)
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
@@ -3238,7 +3238,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 0b3a31f1703..74262a34819 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -816,7 +816,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
retry:
conflict = false;
found_self = false;
- index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+ index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 860f79f9cc1..6e49ea5c5d8 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -204,7 +204,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
/* Start an index scan. */
- scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
retry:
found = false;
@@ -382,7 +382,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
/* Start a heap scan. */
InitDirtySnapshot(snap);
- scan = table_beginscan(rel, &snap, 0, NULL);
+ scan = table_beginscan(rel, &snap, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
retry:
@@ -601,7 +601,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+ scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
table_rescan(scan, NULL);
@@ -665,7 +665,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
index_rescan(scan, skey, skey_attoff, NULL, 0);
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..0d854db51a1 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ flags);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 6bea42f128f..2c87ba5f767 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
estate->es_snapshot,
&node->ioss_Instrument,
node->ioss_NumScanKeys,
- node->ioss_NumOrderByKeys);
+ node->ioss_NumOrderByKeys, 0);
node->ioss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 72b135e5dcf..92674441c6d 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys,
+ flags);
node->iss_ScanDesc = scandesc;
@@ -207,7 +214,7 @@ IndexNextWithReorder(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys, 0);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..4d0cbb9dee4 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
scandesc = table_beginscan(node->ss.ss_currentRelation,
estate->es_snapshot,
- 0, NULL);
+ 0, NULL, flags);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
{
EState *estate = node->ss.ps.state;
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
table_parallelscan_initialize(node->ss.ss_currentRelation,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+ flags);
}
/* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation,
+ pscan,
+ flags);
}
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 16b0adc172c..91acf1ee2d7 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
econtext = GetPerTupleExprContext(estate);
snapshot = RegisterSnapshot(GetLatestSnapshot());
tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
- scan = table_beginscan(part_rel, snapshot, 0, NULL);
+ scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index c760b19db55..ec0def0d1e2 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7100,7 +7100,7 @@ get_actual_variable_endpoint(Relation heapRel,
index_scan = index_beginscan(heapRel, indexRel,
&SnapshotNonVacuumable, NULL,
- 1, 0);
+ 1, 0, 0);
/* Set it up for index-only scan */
index_scan->xs_want_itup = true;
index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..d29d9e905fc 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -177,7 +177,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys);
+ int nkeys, int norderbys, uint32 flags);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a33b5ef55a8..ba3ff8c0845 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -119,6 +119,12 @@ typedef struct IndexFetchHeapData
Buffer xs_cbuf; /* current heap buffer in scan, if any */
/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 2fa790b6bf5..d10b1b03cdb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* set if the query doesn't modify the rel */
+ SO_HINT_REL_READ_ONLY = 1 << 10,
} ScanOptions;
/*
@@ -420,7 +422,7 @@ typedef struct TableAmRoutine
*
* Tuples for an index scan can then be fetched via index_fetch_tuple.
*/
- struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+ struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
/*
* Reset index fetch. Typically this will release cross index fetch
@@ -874,9 +876,9 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
*/
static inline TableScanDesc
table_beginscan(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_SEQSCAN |
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
@@ -919,9 +921,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
@@ -1128,7 +1130,8 @@ extern void table_parallelscan_initialize(Relation rel,
* Caller must hold a suitable lock on the relation.
*/
extern TableScanDesc table_beginscan_parallel(Relation relation,
- ParallelTableScanDesc pscan);
+ ParallelTableScanDesc pscan,
+ uint32 flags);
/*
* Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1164,9 +1167,9 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
* Tuples for an index scan can then be fetched via table_index_fetch_tuple().
*/
static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
{
- return rel->rd_tableam->index_fetch_begin(rel);
+ return rel->rd_tableam->index_fetch_begin(rel, flags);
}
/*
--
2.43.0
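
The visible API change in 0013 for existing callers is just the extra trailing argument (they pass 0), while a scan that knows the query will not modify the relation can pass SO_HINT_REL_READ_ONLY. Because table_beginscan() and friends now OR their fixed bits into whatever the caller supplied instead of rebuilding the flag word, the hint survives into the scan descriptor. A tiny standalone model of that behavior (only SO_HINT_REL_READ_ONLY's value matches the patch; the other bit values here are toy placeholders):

    #include <stdio.h>

    /* toy placeholders, except SO_HINT_REL_READ_ONLY which matches the patch */
    #define SO_TYPE_SEQSCAN        (1 << 0)
    #define SO_ALLOW_STRAT         (1 << 1)
    #define SO_ALLOW_SYNC          (1 << 2)
    #define SO_ALLOW_PAGEMODE      (1 << 3)
    #define SO_HINT_REL_READ_ONLY  (1 << 10)

    /*
     * Models the reworked table_beginscan(): caller-supplied hint bits are
     * preserved and the helper's fixed bits are ORed in on top.
     */
    static unsigned int
    seqscan_flags(unsigned int flags)
    {
        flags |= SO_TYPE_SEQSCAN |
            SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
        return flags;
    }

    int
    main(void)
    {
        unsigned int with_hint = seqscan_flags(SO_HINT_REL_READ_ONLY);
        unsigned int without_hint = seqscan_flags(0);

        printf("read-only hint survives: %s / default caller: %s\n",
               (with_hint & SO_HINT_REL_READ_ONLY) ? "yes" : "no",
               (without_hint & SO_HINT_REL_READ_ONLY) ? "yes" : "no");
        return 0;
    }
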
Attachment: v26-0014-Allow-on-access-pruning-to-set-pages-all-visible.patch (text/x-patch)
From 1153915bc4db207f63e2718234478d2237fdb73d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:24:08 -0500
Subject: [PATCH v26 14/15] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam.c | 15 ++++++-
src/backend/access/heap/heapam_handler.c | 15 ++++++-
src/backend/access/heap/pruneheap.c | 44 +++++++++++++++++--
src/include/access/heapam.h | 24 ++++++++--
.../t/035_standby_logical_decoding.pl | 3 +-
5 files changed, 90 insertions(+), 11 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fb7a7548aa0..d9dc79f4a96 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -570,6 +570,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -584,7 +585,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1261,6 +1264,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1299,6 +1303,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1331,6 +1341,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 6c2e4e08b16..2cb98e58956 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
return &hscan->xs_base;
@@ -100,6 +101,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -139,7 +146,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ hscan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2472,6 +2480,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2518,7 +2527,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a4c3bd00253..d1ec6d1b601 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -202,6 +202,8 @@ static bool heap_page_will_set_vm(PruneState *prstate,
Relation relation,
BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
Buffer vmbuffer,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
int nlpdead_items,
uint8 *old_vmbits,
uint8 *new_vmbits);
@@ -223,9 +225,13 @@ static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -306,6 +312,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
.cutoffs = NULL,
};
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ params.options |= HEAP_PAGE_PRUNE_UPDATE_VM;
+ params.vmbuffer = *vmbuffer;
+ }
+
heap_page_prune_and_freeze(¶ms, &presult, &dummy_off_loc,
NULL, NULL);
@@ -935,6 +948,9 @@ identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
* corrupted, it will fix them by clearing the VM bits and visibility hint.
* This does not need to be done in a critical section.
*
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
* Returns true if one or both VM bits should be set, along with returning the
* current value of the VM bits in *old_vmbits and the desired new value of
* the VM bits in *new_vmbits.
@@ -944,6 +960,8 @@ heap_page_will_set_vm(PruneState *prstate,
Relation relation,
BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
Buffer vmbuffer,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
int nlpdead_items,
uint8 *old_vmbits,
uint8 *new_vmbits)
@@ -951,6 +969,24 @@ heap_page_will_set_vm(PruneState *prstate,
if (!prstate->attempt_update_vm)
return false;
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(heap_buffer) || XLogCheckBufferNeedsBackup(heap_buffer)))
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ return false;
+ }
+
*old_vmbits = visibilitymap_get_status(relation, heap_blk,
&vmbuffer);
@@ -1146,6 +1182,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
buffer,
page,
vmbuffer,
+ params->reason,
+ do_prune, do_freeze,
prstate.lpdead_items,
&old_vmbits,
&new_vmbits);
@@ -1232,9 +1270,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
MarkBufferDirty(buffer);
- /*
- * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
- */
+ /* Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did */
if (RelationNeedsWAL(params->relation))
{
log_heap_prune_and_freeze(params->relation, buffer,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ba3ff8c0845..c835c792c80 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read in the current heap page's
+ * corresponding VM block to this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read in the current heap page's corresponding VM block to
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
/*
* Some optimizations can only be performed if the query does not modify
@@ -419,7 +436,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
TM_IndexDeleteOp *delstate);
/* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index ebe2fae1789..bdd9f0a62cd 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
--
2.43.0
v26-0015-Set-pd_prune_xid-on-insert.patchtext/x-patch; charset=UTF-8; name=v26-0015-Set-pd_prune_xid-on-insert.patchDownload
From 86187669b4c2590e50ac5fe6111cb1531a9bceee Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v26 15/15] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.
This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.
Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.
---
src/backend/access/heap/heapam.c | 25 +++++++++++++------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++-
.../modules/index/expected/killtuples.out | 6 ++---
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d9dc79f4a96..ccebc1f244b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2119,6 +2119,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2178,15 +2179,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts, making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2196,7 +2201,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2560,8 +2564,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 47d2479415e..ab2db931aac 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -447,6 +447,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -596,9 +602,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
--
2.43.0
On 15.12.25 22:05, Melanie Plageman wrote:
On Sat, Dec 13, 2025 at 8:59 AM Peter Eisentraut <peter@eisentraut.org> wrote:
On 20.11.25 18:19, Melanie Plageman wrote:
+ prstate->deadoffsets = (OffsetNumber *) presult->deadoffsets;
In your patch
v22-0001-Split-heap_page_prune_and_freeze-into-helpers.patch, the
assignment above casts away the const qualification of the function
argument presult:

Yea, this code (prune_freeze_setup() with a const-qualified
PruneFreezeResult parameter) is actually already in master -- not just
in this patchset.

+static void
+prune_freeze_setup(PruneFreezeParams *params,
+                   TransactionId new_relfrozen_xid,
+                   MultiXactId new_relmin_mxid,
+                   const PruneFreezeResult *presult,
+                   PruneState *prstate)

(The cast is otherwise unnecessary, since the underlying type is the
same on both sides.)

Since prstate->deadoffsets is in fact later modified, this makes the
original const qualification invalid.

I didn't realize I was misusing const here. What I meant to indicate
by defining the prune_freeze_setup() parameter as const is that the
PruneFreezeResult wouldn't be modified by prune_freeze_setup(). I did
not mean to indicate that no members of PruneFreezeResult would ever
be modified.

I'm not sure there is a difference between these two statements. "The
struct won't be modified" is the same as "none of its fields will be
modified."

deadoffsets is not modified in prune_freeze_setup(). So, are you saying
that I can't define a parameter as const even if the caller modifies it?
You are not modifying deadoffsets in prune_freeze_setup(), but you are
assigning its address to a pointer variable that is not const-qualified,
and so it could be used to modify it later on.
A caller to prune_freeze_setup() that sees the signature const
PruneFreezeResult *presult could pass a pointer to a PruneFreezeResult
object that is notionally in read-only memory. But through the
non-const-qualified pointer you could later modify the pointed-to
memory, which would be invalid. The point of propagating the qualifiers
is to prevent that at compile time.
If what you want is something like, "prune_freeze_setup() does not
change any of the fields of what presult points to, but it does record a
pointer to one of its fields with the intention of modifying it later
after prune_freeze_setup() is finished", then I think C cannot represent
that with this API.
Here is a simplified example:
#include <stdlib.h>

// corresponds to PruneFreezeResult
struct foo
{
    int offsets[5];
};

// corresponds to PruneState
struct bar
{
    int *offsets;
};

static void setup(const struct foo *f)
{
    struct bar *b = malloc(sizeof(struct bar));
    b->offsets = f->offsets; // warning
}
This produces a warning:
test.c:20:20: warning: assignment discards 'const' qualifier from
pointer target type
The reason is that what "f" points to is const, which means that all its
fields are const. The fix is to remove the const from the function
argument declaration.
One of the possible sources of confusion here is that one struct uses an
array and the other a pointer, and these sometimes behave similarly and
sometimes not.
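For illustration (a minimal sketch added here, not code from the patch), the same example with the const removed from the parameter compiles without the warning; in the real code the member is an OffsetNumber array rather than int, but the mechanics are identical:

#include <stdlib.h>

// corresponds to PruneFreezeResult
struct foo
{
    int offsets[5];
};

// corresponds to PruneState
struct bar
{
    int *offsets;
};

// with const removed, recording a pointer into f's array is allowed
static void setup(struct foo *f)
{
    struct bar *b = malloc(sizeof(struct bar));
    b->offsets = f->offsets; // no warning: both sides have type int *
}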
On Tue, Dec 16, 2025 at 7:18 AM Peter Eisentraut <peter@eisentraut.org> wrote:
You are not modifying deadoffsets in prune_freeze_setup(), but you are
assigning its address to a pointer variable that is not const-qualified,
and so it could be used to modify it later on.

A caller to prune_freeze_setup() that sees the signature const
PruneFreezeResult *presult could pass a pointer to a PruneFreezeResult
object that is notionally in read-only memory. But through the
non-const-qualified pointer you could later modify the pointed-to
memory, which would be invalid. The point of propagating the qualifiers
is to prevent that at compile time.
Thanks for the explanation. I've committed your proposed fix.
- Melanie
On Wed, Dec 3, 2025 at 6:07 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
If we're just talking about the renaming, looking at procarray.c, it
is full of the word "removable" because its functions were largely
used to examine and determine if everyone can see an xmax as committed
and thus if that tuple is removable from their perspective. But
nothing about the code that I can see means it has to be an xmax. We
could just as well use the functions to determine if everyone can see
an xmin as committed.
In the attached v27, I've removed the commit that renamed functions in
procarray.c. I've added a single wrapper GlobalVisTestXidNotRunning()
that is used in my code where I am testing live tuples. I think you'll
find that I've addressed all of your review comments now -- as I've
also gotten rid of the confusing blk_known_av logic through a series
of refactors.
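To give a rough idea of its shape (this is only a sketch of the intent, not the code from the attached patch), the wrapper is meant to delegate to the existing removability test so that callers checking the xmin of live tuples read naturally:

/*
 * Sketch only: a readability wrapper over the existing
 * GlobalVisTestIsRemovableXid(). "Removable" is phrased in terms of
 * xmax; when testing the xmin of a live tuple, what we really ask is
 * whether everyone already sees the xid as no longer running.
 */
static inline bool
GlobalVisTestXidNotRunning(GlobalVisState *state, TransactionId xid)
{
    return GlobalVisTestIsRemovableXid(state, xid);
}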
The one outstanding point is which commits should bump
XLOG_PAGE_MAGIC (that, and review of the reworked patches).
- Melanie
Attachments:
v27-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patchtext/x-patch; charset=UTF-8; name=v27-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patchDownload
From eb1a372848e3274d98b129d7f77ca1c11f4dceb8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 8 Dec 2025 15:49:54 -0500
Subject: [PATCH v27 01/14] Combine visibilitymap_set() cases in
lazy_scan_prune()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
lazy_scan_prune() previously had two separate cases that called
visibilitymap_set() after pruning and freezing. These branches were
nearly identical except that one attempted to avoid dirtying the heap
buffer. However, that situation can never occur — the heap buffer cannot
be clean at that point (and we would hit an assertion if it were).
In lazy_scan_prune(), when we change a previously all-visible page to
all-frozen and the page was recorded as all-visible in the visibility
map by find_next_unskippable_block(), the heap buffer will always be
dirty. Either we have just frozen a tuple and already dirtied the
buffer, or the buffer was modified between find_next_unskippable_block()
and heap_page_prune_and_freeze() and then pruned in
heap_page_prune_and_freeze().
Additionally, XLogRegisterBuffer() asserts that the buffer is dirty, so
attempting to add a clean heap buffer to the WAL chain would fail
anyway.
Since the “clean heap buffer with already set VM” case is impossible,
the two visibilitymap_set() branches in lazy_scan_prune() can be merged.
Doing so makes the intent clearer and emphasizes that the heap buffer
must always be marked dirty before being added to the WAL chain.
This commit also adds a test case for vacuuming when no heap
modifications are required. Currently this ensures that the heap buffer
is marked dirty before it is added to the WAL chain, but if we later
remove the heap buffer from the VM-set WAL chain or pass it with the
REGBUF_NO_CHANGES flag, this test would guard that behavior.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
---
.../pg_visibility/expected/pg_visibility.out | 35 +++++++
contrib/pg_visibility/sql/pg_visibility.sql | 16 ++++
src/backend/access/heap/vacuumlazy.c | 95 +++++--------------
3 files changed, 73 insertions(+), 73 deletions(-)
diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index 09fa5933a35..cbc04aad016 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -204,6 +204,41 @@ select pg_truncate_visibility_map('test_partition');
(1 row)
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary
+---------------------------
+ (1,1)
+(1 row)
+
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+ pg_truncate_visibility_map
+----------------------------
+
+(1 row)
+
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary
+---------------------------
+ (0,0)
+(1 row)
+
+-- vacuum sets the VM but does not need to set PD_ALL_VISIBLE so no heap page
+-- modification
+vacuum test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary
+---------------------------
+ (1,1)
+(1 row)
+
-- test copy freeze
create table copyfreeze (a int, b char(1500));
-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index 5af06ec5b76..0d13116248b 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -94,6 +94,22 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
select * from pg_check_frozen('test_partition'); -- hopefully none
select pg_truncate_visibility_map('test_partition');
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+-- vacuum sets the VM but does not need to set PD_ALL_VISIBLE so no heap page
+-- modification
+vacuum test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+
-- test copy freeze
create table copyfreeze (a int, b char(1500));
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 62035b7f9c3..811e7e33678 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2094,16 +2094,14 @@ lazy_scan_prune(LVRelState *vacrel,
* of last heap_vac_scan_next_block() call), and from all_visible and
* all_frozen variables
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if ((presult.all_visible && !all_visible_according_to_vm) ||
+ (presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
{
uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
+ uint8 new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
+ new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
/*
* It should never be the case that the visibility map page is set
@@ -2111,19 +2109,29 @@ lazy_scan_prune(LVRelState *vacrel,
* checksums are not enabled). Regardless, set both bits so that we
* get back in sync.
*
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
+ * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
+ * unnecessarily dirtying the heap buffer, as it must be marked dirty
+ * before adding it to the WAL chain. The only scenario where it is
+ * not already dirty is if the VM was removed, and that isn't worth
+ * optimizing for.
*/
PageSetAllVisible(page);
MarkBufferDirty(buf);
+
+ /*
+ * If the page is being set all-frozen, we pass InvalidTransactionId
+ * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ Assert(!presult.all_frozen ||
+ !TransactionIdIsValid(presult.vm_conflict_horizon));
+
old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
+ vmbuffer,
+ presult.vm_conflict_horizon,
+ new_vmbits);
/*
* If the page wasn't already set all-visible and/or all-frozen in the
@@ -2191,65 +2199,6 @@ lazy_scan_prune(LVRelState *vacrel,
VISIBILITYMAP_VALID_BITS);
}
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen.
- */
- else if (all_visible_according_to_vm && presult.all_frozen &&
- !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
-
return presult.ndeleted;
}
--
2.43.0
v27-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patchtext/x-patch; charset=US-ASCII; name=v27-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patchDownload
From 808d8a5816f0764471ab92d43c57518279cd53c2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 11 Dec 2025 10:48:13 -0500
Subject: [PATCH v27 02/14] Eliminate use of cached VM value in
lazy_scan_prune()
lazy_scan_prune() takes a parameter from lazy_scan_heap() indicating
whether the page was marked all-visible in the VM at the time it was
last checked in find_next_unskippable_block(). This behavior is
historical, dating back to commit 608195a3a365, when we did not pin the
VM page until confirming it was not all-visible. Now that the VM page is
already pinned, there is no meaningful benefit to relying on a cached VM
status.
Removing this cached value simplifies the logic in both lazy_scan_heap()
and lazy_scan_prune(). It also helps future work that will set the
visibility map on-access: such paths will not have a cached value
available, so keeping one would make the logic harder to reason about.
Eliminating it also enables us to detect and repair VM corruption
on-access.
Along with removing the cached value and unconditionally checking the
visibility status of the heap page, this commit also moves the VM
corruption handling to occur first. This reordering should have no
performance impact, since the checks are inexpensive and performed only
once per page. It does, however, make the control flow easier to
understand. The restructuring also makes it possible for the VM to be
newly set after fixing corruption, if pruning found the page
all-visible.
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
---
src/backend/access/heap/vacuumlazy.c | 172 ++++++++++++---------------
1 file changed, 79 insertions(+), 93 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 811e7e33678..436143cd12c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -253,7 +253,6 @@ typedef enum
* about the block it read to the caller.
*/
#define VAC_BLK_WAS_EAGER_SCANNED (1 << 0)
-#define VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM (1 << 1)
typedef struct LVRelState
{
@@ -358,7 +357,6 @@ typedef struct LVRelState
/* State maintained by heap_vac_scan_next_block() */
BlockNumber current_block; /* last block returned */
BlockNumber next_unskippable_block; /* next unskippable block */
- bool next_unskippable_allvis; /* its visibility status */
bool next_unskippable_eager_scanned; /* if it was eagerly scanned */
Buffer next_unskippable_vmbuffer; /* buffer containing its VM bit */
@@ -432,7 +430,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
bool sharelock, Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
- Buffer vmbuffer, bool all_visible_according_to_vm,
+ Buffer vmbuffer,
bool *has_lpdead_items, bool *vm_page_frozen);
static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
@@ -1249,7 +1247,6 @@ lazy_scan_heap(LVRelState *vacrel)
/* Initialize for the first heap_vac_scan_next_block() call */
vacrel->current_block = InvalidBlockNumber;
vacrel->next_unskippable_block = InvalidBlockNumber;
- vacrel->next_unskippable_allvis = false;
vacrel->next_unskippable_eager_scanned = false;
vacrel->next_unskippable_vmbuffer = InvalidBuffer;
@@ -1265,13 +1262,13 @@ lazy_scan_heap(LVRelState *vacrel)
MAIN_FORKNUM,
heap_vac_scan_next_block,
vacrel,
- sizeof(uint8));
+ sizeof(bool));
while (true)
{
Buffer buf;
Page page;
- uint8 blk_info = 0;
+ bool was_eager_scanned = false;
int ndeleted = 0;
bool has_lpdead_items;
void *per_buffer_data = NULL;
@@ -1340,13 +1337,13 @@ lazy_scan_heap(LVRelState *vacrel)
if (!BufferIsValid(buf))
break;
- blk_info = *((uint8 *) per_buffer_data);
+ was_eager_scanned = *((bool *) per_buffer_data);
CheckBufferIsPinnedOnce(buf);
page = BufferGetPage(buf);
blkno = BufferGetBlockNumber(buf);
vacrel->scanned_pages++;
- if (blk_info & VAC_BLK_WAS_EAGER_SCANNED)
+ if (was_eager_scanned)
vacrel->eager_scanned_pages++;
/* Report as block scanned, update error traceback information */
@@ -1417,7 +1414,6 @@ lazy_scan_heap(LVRelState *vacrel)
if (got_cleanup_lock)
ndeleted = lazy_scan_prune(vacrel, buf, blkno, page,
vmbuffer,
- blk_info & VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM,
&has_lpdead_items, &vm_page_frozen);
/*
@@ -1434,8 +1430,7 @@ lazy_scan_heap(LVRelState *vacrel)
* exclude pages skipped due to cleanup lock contention from eager
* freeze algorithm caps.
*/
- if (got_cleanup_lock &&
- (blk_info & VAC_BLK_WAS_EAGER_SCANNED))
+ if (got_cleanup_lock && was_eager_scanned)
{
/* Aggressive vacuums do not eager scan. */
Assert(!vacrel->aggressive);
@@ -1602,7 +1597,6 @@ heap_vac_scan_next_block(ReadStream *stream,
{
BlockNumber next_block;
LVRelState *vacrel = callback_private_data;
- uint8 blk_info = 0;
/* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
next_block = vacrel->current_block + 1;
@@ -1665,8 +1659,8 @@ heap_vac_scan_next_block(ReadStream *stream,
* otherwise they would've been unskippable.
*/
vacrel->current_block = next_block;
- blk_info |= VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM;
- *((uint8 *) per_buffer_data) = blk_info;
+ /* Block was not eager scanned */
+ *((bool *) per_buffer_data) = false;
return vacrel->current_block;
}
else
@@ -1678,11 +1672,7 @@ heap_vac_scan_next_block(ReadStream *stream,
Assert(next_block == vacrel->next_unskippable_block);
vacrel->current_block = next_block;
- if (vacrel->next_unskippable_allvis)
- blk_info |= VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM;
- if (vacrel->next_unskippable_eager_scanned)
- blk_info |= VAC_BLK_WAS_EAGER_SCANNED;
- *((uint8 *) per_buffer_data) = blk_info;
+ *((bool *) per_buffer_data) = vacrel->next_unskippable_eager_scanned;
return vacrel->current_block;
}
}
@@ -1707,7 +1697,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
BlockNumber next_unskippable_block = vacrel->next_unskippable_block + 1;
Buffer next_unskippable_vmbuffer = vacrel->next_unskippable_vmbuffer;
bool next_unskippable_eager_scanned = false;
- bool next_unskippable_allvis;
*skipsallvis = false;
@@ -1717,7 +1706,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
next_unskippable_block,
&next_unskippable_vmbuffer);
- next_unskippable_allvis = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
/*
* At the start of each eager scan region, normal vacuums with eager
@@ -1736,7 +1724,7 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
* A block is unskippable if it is not all visible according to the
* visibility map.
*/
- if (!next_unskippable_allvis)
+ if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
{
Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
break;
@@ -1793,7 +1781,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
/* write the local variables back to vacrel */
vacrel->next_unskippable_block = next_unskippable_block;
- vacrel->next_unskippable_allvis = next_unskippable_allvis;
vacrel->next_unskippable_eager_scanned = next_unskippable_eager_scanned;
vacrel->next_unskippable_vmbuffer = next_unskippable_vmbuffer;
}
@@ -1954,9 +1941,7 @@ cmpOffsetNumbers(const void *a, const void *b)
* Caller must hold pin and buffer cleanup lock on the buffer.
*
* vmbuffer is the buffer containing the VM block with visibility information
- * for the heap block, blkno. all_visible_according_to_vm is the saved
- * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * for the heap block, blkno.
*
* *has_lpdead_items is set to true or false depending on whether, upon return
* from this function, any LP_DEAD items are still present on the page.
@@ -1973,7 +1958,6 @@ lazy_scan_prune(LVRelState *vacrel,
BlockNumber blkno,
Page page,
Buffer vmbuffer,
- bool all_visible_according_to_vm,
bool *has_lpdead_items,
bool *vm_page_frozen)
{
@@ -1987,6 +1971,8 @@ lazy_scan_prune(LVRelState *vacrel,
.vistest = vacrel->vistest,
.cutoffs = &vacrel->cutoffs,
};
+ uint8 old_vmbits = 0;
+ uint8 new_vmbits = 0;
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -2089,70 +2075,7 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(!presult.all_visible || !(*has_lpdead_items));
Assert(!presult.all_frozen || presult.all_visible);
- /*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
- */
- if ((presult.all_visible && !all_visible_according_to_vm) ||
- (presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
- {
- uint8 old_vmbits;
- uint8 new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
- if (presult.all_frozen)
- new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
- * unnecessarily dirtying the heap buffer, as it must be marked dirty
- * before adding it to the WAL chain. The only scenario where it is
- * not already dirty is if the VM was removed, and that isn't worth
- * optimizing for.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
-
- /*
- * If the page is being set all-frozen, we pass InvalidTransactionId
- * as the cutoff_xid, since a snapshot conflict horizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- Assert(!presult.all_frozen ||
- !TransactionIdIsValid(presult.vm_conflict_horizon));
-
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer,
- presult.vm_conflict_horizon,
- new_vmbits);
-
- /*
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
+ old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
/*
* As of PostgreSQL 9.2, the visibility map bit should never be set if the
@@ -2160,8 +2083,8 @@ lazy_scan_prune(LVRelState *vacrel,
* cleared after heap_vac_scan_next_block() was called, so we must recheck
* with buffer lock before concluding that the VM is corrupt.
*/
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+ if (!PageIsAllVisible(page) &&
+ (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
ereport(WARNING,
(errcode(ERRCODE_DATA_CORRUPTED),
@@ -2199,6 +2122,69 @@ lazy_scan_prune(LVRelState *vacrel,
VISIBILITYMAP_VALID_BITS);
}
+ if (!presult.all_visible)
+ return presult.ndeleted;
+
+ /* Set the visibility map and page visibility hint */
+ new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+ if (presult.all_frozen)
+ new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+ /* Nothing to do */
+ if (old_vmbits == new_vmbits)
+ return presult.ndeleted;
+
+ Assert(presult.all_visible);
+
+ /*
+ * It should never be the case that the visibility map page is set while
+ * the page-level bit is clear, but the reverse is allowed (if checksums
+ * are not enabled). Regardless, set both bits so that we get back in
+ * sync.
+ *
+ * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
+ * unnecessarily dirtying the heap buffer, as it must be marked dirty
+ * before adding it to the WAL chain. The only scenario where it is not
+ * already dirty is if the VM was removed, and that isn't worth optimizing
+ * for.
+ */
+ PageSetAllVisible(page);
+ MarkBufferDirty(buf);
+
+ /*
+ * If the page is being set all-frozen, we pass InvalidTransactionId as
+ * the cutoff_xid, since a snapshot conflict horizon sufficient to make
+ * everything safe for REDO was logged when the page's tuples were frozen.
+ */
+ Assert(!presult.all_frozen ||
+ !TransactionIdIsValid(presult.vm_conflict_horizon));
+
+ visibilitymap_set(vacrel->rel, blkno, buf,
+ InvalidXLogRecPtr,
+ vmbuffer, presult.vm_conflict_horizon,
+ new_vmbits);
+
+ /*
+ * If the page wasn't already set all-visible and/or all-frozen in the VM,
+ * count it as newly set for logging.
+ */
+ if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ {
+ vacrel->vm_new_visible_pages++;
+ if (presult.all_frozen)
+ {
+ vacrel->vm_new_visible_frozen_pages++;
+ *vm_page_frozen = true;
+ }
+ }
+ else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ presult.all_frozen)
+ {
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
+ }
+
return presult.ndeleted;
}
--
2.43.0
v27-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patchtext/x-patch; charset=US-ASCII; name=v27-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patchDownload
From 026cfe10b79328d6b9f68703dfa9db1b4e7e619d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 13:36:39 -0500
Subject: [PATCH v27 03/14] Refactor lazy_scan_prune() VM clear logic into
helper
Encapsulating the VM corruption checks in a helper makes the whole
function clearer. Before we move all of this logic into
heap_page_prune_and_freeze(), we want to make it more compact and
clear.
---
src/backend/access/heap/vacuumlazy.c | 122 +++++++++++++++++----------
1 file changed, 78 insertions(+), 44 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 436143cd12c..425dc2f8691 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -428,6 +428,11 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
+static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer,
@@ -1935,6 +1940,77 @@ cmpOffsetNumbers(const void *a, const void *b)
return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
}
+/*
+ * Helper to correct any corruption detected on a heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+ *
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ */
+static void
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits)
+{
+ Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
+
+ Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (!PageIsAllVisible(heap_page) &&
+ ((vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+}
+
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -2077,50 +2153,8 @@ lazy_scan_prune(LVRelState *vacrel,
old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (!PageIsAllVisible(page) &&
- (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
+ identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
+ presult.lpdead_items, vmbuffer, old_vmbits);
if (!presult.all_visible)
return presult.ndeleted;
--
2.43.0
v27-0004-Set-the-VM-in-heap_page_prune_and_freeze.patchtext/x-patch; charset=US-ASCII; name=v27-0004-Set-the-VM-in-heap_page_prune_and_freeze.patchDownload
From d87e33478520e52fc010071e8dcd6eac5460ec27 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v27 04/14] Set the VM in heap_page_prune_and_freeze()
This has no independent benefit. It is meant for ease of review. As of
this commit, there is still a separate WAL record emitted for setting
the VM after pruning and freezing. But it is easier to review if moving
the logic into pruneheap.c is separate from setting the VM in the same
WAL record.
---
src/backend/access/heap/pruneheap.c | 301 +++++++++++++++++++++++----
src/backend/access/heap/vacuumlazy.c | 142 +------------
src/include/access/heapam.h | 21 ++
3 files changed, 285 insertions(+), 179 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 07aa08cfe14..c979625551c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
#include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool attempt_freeze;
+ /* whether or not to attempt updating the VM */
+ bool attempt_update_vm;
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -140,16 +142,17 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon when setting the VM or when
+ * freezing all the tuples on the page. It is only valid when all the live
+ * tuples on the page are all-visible.
*
* NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
* That's convenient for heap_page_prune_and_freeze() to use them to
- * decide whether to freeze the page or not. The all_visible and
- * all_frozen values returned to the caller are adjusted to include
- * LP_DEAD items after we determine whether to opportunistically freeze.
+ * decide whether to opportunistically freeze the page or not. The
+ * all_visible and all_frozen values ultimately used to set the VM are
+ * adjusted to include LP_DEAD items after we determine whether or not to
+ * opportunistically freeze.
*/
bool all_visible;
bool all_frozen;
@@ -191,6 +194,17 @@ static void page_verify_redirects(Page page);
static bool heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
PruneState *prstate);
+static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page, int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits);
+static bool heap_page_will_set_vm(PruneState *prstate,
+ Relation relation,
+ BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+ Buffer vmbuffer,
+ int nlpdead_items,
+ uint8 *old_vmbits,
+ uint8 *new_vmbits);
/*
@@ -280,6 +294,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
PruneFreezeParams params = {
.relation = relation,
.buffer = buffer,
+ .vmbuffer = InvalidBuffer,
.reason = PRUNE_ON_ACCESS,
.options = 0,
.vistest = vistest,
@@ -341,6 +356,8 @@ prune_freeze_setup(PruneFreezeParams *params,
/* cutoffs must be provided if we will attempt freezing */
Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate->attempt_update_vm =
+ (params->options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate->cutoffs = params->cutoffs;
/*
@@ -396,51 +413,54 @@ prune_freeze_setup(PruneFreezeParams *params,
prstate->frz_conflict_horizon = InvalidTransactionId;
/*
- * Vacuum may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Track whether the page could be marked all-visible and/or all-frozen.
+ * This information is used for opportunistic freezing and for updating
+ * the visibility map (VM) if requested by the caller.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * Currently, only VACUUM performs freezing, but other callers may in the
+ * future. Visibility bookkeeping is required not just for setting the VM
+ * bits, but also for opportunistic freezing: we only consider freezing if
+ * the page would become all-frozen, or if it would be all-frozen except
+ * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+ * we will not set the VM bit even if the page is found to be all-visible.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not immediately
- * clear all_visible and all_frozen when we see LP_DEAD items. We fix
- * that after scanning the line pointers. We must correct all_visible and
- * all_frozen before we return them to the caller, so that the caller
- * doesn't set the VM bits incorrectly.
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is passed without HEAP_PAGE_PRUNE_FREEZE,
+ * prstate.all_frozen must be initialized to false, since we will not call
+ * heap_prepare_freeze_tuple() for each tuple.
+ *
+ * Dead tuples that will be removed by the end of vacuum should not
+ * prevent opportunistic freezing. Therefore, we do not clear all_visible
+ * and all_frozen when we encounter LP_DEAD items. Instead, we correct
+ * them after deciding whether to freeze, but before updating the VM, to
+ * avoid setting the VM bits incorrectly.
+ *
+ * If neither freezing nor VM updates are requested, we skip the extra
+ * bookkeeping. In this case, initializing all_visible to false allows
+ * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
*/
if (prstate->attempt_freeze)
{
prstate->all_visible = true;
prstate->all_frozen = true;
}
+ else if (prstate->attempt_update_vm)
+ {
+ prstate->all_visible = true;
+ prstate->all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate->all_visible = false;
prstate->all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is unmaintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon
+ * when updating the VM and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -775,10 +795,134 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Helper to correct any corruption detected on a heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+ *
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ */
+static void
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits)
+{
+ Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
+
+ Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (!PageIsAllVisible(heap_page) &&
+ ((vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+}
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * However, if it finds that the page-level visibility hint or VM is
+ * corrupted, it will fix them by clearing the VM bits and visibility hint.
+ * This does not need to be done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with returning the
+ * current value of the VM bits in *old_vmbits and the desired new value of
+ * the VM bits in *new_vmbits.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate,
+ Relation relation,
+ BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+ Buffer vmbuffer,
+ int nlpdead_items,
+ uint8 *old_vmbits,
+ uint8 *new_vmbits)
+{
+ if (!prstate->attempt_update_vm)
+ return false;
+
+ *old_vmbits = visibilitymap_get_status(relation, heap_blk,
+ &vmbuffer);
+
+ /* We do this even if not all-visible */
+ identify_and_fix_vm_corruption(relation, heap_buffer, heap_blk, heap_page,
+ nlpdead_items, vmbuffer,
+ *old_vmbits);
+
+ if (!prstate->all_visible)
+ return false;
+
+ *new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+ if (prstate->all_frozen)
+ *new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+ if (*new_vmbits == *old_vmbits)
+ {
+ *new_vmbits = 0;
+ return false;
+ }
+
+ return true;
+}
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -793,12 +937,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* tuples if it's required in order to advance relfrozenxid / relminmxid, or
* if it's considered advantageous for overall system performance to do so
* now. The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing. When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opportunistically freeze, to indicate if the VM bits can be set. They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
@@ -823,13 +968,19 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
MultiXactId *new_relmin_mxid)
{
Buffer buffer = params->buffer;
+ Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
+ BlockNumber blockno = BufferGetBlockNumber(buffer);
PruneState prstate;
bool do_freeze;
bool do_prune;
bool do_hint_prune;
+ bool do_set_vm;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ uint8 new_vmbits = 0;
+ uint8 old_vmbits = 0;
+
/* Initialize prstate */
prune_freeze_setup(params,
@@ -1011,6 +1162,64 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
}
}
+
+ /* Now update the visibility map and PD_ALL_VISIBLE hint */
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ do_set_vm = heap_page_will_set_vm(&prstate,
+ params->relation,
+ blockno,
+ buffer,
+ page,
+ vmbuffer,
+ prstate.lpdead_items,
+ &old_vmbits,
+ &new_vmbits);
+
+ /*
+ * new_vmbits should be 0 regardless of whether or not the page is
+ * all-visible if we do not intend to set the VM.
+ */
+ Assert(do_set_vm || new_vmbits == 0);
+
+ /* Set the visibility map and page visibility hint, if relevant */
+ if (do_set_vm)
+ {
+ Assert(prstate.all_visible);
+
+ /*
+ * It should never be the case that the visibility map page is set
+ * while the page-level bit is clear, but the reverse is allowed (if
+ * checksums are not enabled). Regardless, set both bits so that we
+ * get back in sync.
+ *
+ * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
+ * unnecessarily dirtying the heap buffer, as it must be marked dirty
+ * before adding it to the WAL chain. The only scenario where it is
+ * not already dirty is if the VM was removed, and that isn't worth
+ * optimizing for.
+ */
+ PageSetAllVisible(page);
+ MarkBufferDirty(buffer);
+
+ /*
+ * If the page is being set all-frozen, we pass InvalidTransactionId
+ * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ Assert(!prstate.all_frozen ||
+ !TransactionIdIsValid(presult->vm_conflict_horizon));
+
+ visibilitymap_set(params->relation, blockno, buffer,
+ InvalidXLogRecPtr,
+ vmbuffer, presult->vm_conflict_horizon,
+ new_vmbits);
+ }
+
+ /* Save the vmbits for caller */
+ presult->old_vmbits = old_vmbits;
+ presult->new_vmbits = new_vmbits;
}
@@ -1485,6 +1694,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
{
TransactionId xmin;
+ Assert(prstate->attempt_update_vm);
+
if (!HeapTupleHeaderXminCommitted(htup))
{
prstate->all_visible = false;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 425dc2f8691..ccfad5b2dba 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -428,11 +428,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
-static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
- BlockNumber heap_blk, Page heap_page,
- int nlpdead_items,
- Buffer vmbuffer,
- uint8 vmbits);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer,
@@ -1940,77 +1935,6 @@ cmpOffsetNumbers(const void *a, const void *b)
return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
}
-/*
- * Helper to correct any corruption detected on an heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- */
-static void
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
- BlockNumber heap_blk, Page heap_page,
- int nlpdead_items,
- Buffer vmbuffer,
- uint8 vmbits)
-{
- Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
-
- Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (!PageIsAllVisible(heap_page) &&
- ((vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- RelationGetRelationName(rel), heap_blk)));
-
- visibilitymap_clear(rel, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- RelationGetRelationName(rel), heap_blk)));
-
- PageClearAllVisible(heap_page);
- MarkBufferDirty(heap_buffer);
- visibilitymap_clear(rel, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-}
-
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -2042,13 +1966,12 @@ lazy_scan_prune(LVRelState *vacrel,
PruneFreezeParams params = {
.relation = rel,
.buffer = buf,
+ .vmbuffer = vmbuffer,
.reason = PRUNE_VACUUM_SCAN,
- .options = HEAP_PAGE_PRUNE_FREEZE,
+ .options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM,
.vistest = vacrel->vistest,
.cutoffs = &vacrel->cutoffs,
};
- uint8 old_vmbits = 0;
- uint8 new_vmbits = 0;
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -2148,73 +2071,24 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
- Assert(!presult.all_frozen || presult.all_visible);
-
- old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
- identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
- presult.lpdead_items, vmbuffer, old_vmbits);
-
- if (!presult.all_visible)
- return presult.ndeleted;
-
- /* Set the visibility map and page visibility hint */
- new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
- if (presult.all_frozen)
- new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
- /* Nothing to do */
- if (old_vmbits == new_vmbits)
- return presult.ndeleted;
-
- Assert(presult.all_visible);
-
- /*
- * It should never be the case that the visibility map page is set while
- * the page-level bit is clear, but the reverse is allowed (if checksums
- * are not enabled). Regardless, set both bits so that we get back in
- * sync.
- *
- * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
- * unnecessarily dirtying the heap buffer, as it must be marked dirty
- * before adding it to the WAL chain. The only scenario where it is not
- * already dirty is if the VM was removed, and that isn't worth optimizing
- * for.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
-
- /*
- * If the page is being set all-frozen, we pass InvalidTransactionId as
- * the cutoff_xid, since a snapshot conflict horizon sufficient to make
- * everything safe for REDO was logged when the page's tuples were frozen.
- */
- Assert(!presult.all_frozen ||
- !TransactionIdIsValid(presult.vm_conflict_horizon));
-
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- new_vmbits);
-
/*
* If the page wasn't already set all-visible and/or all-frozen in the VM,
* count it as newly set for logging.
*/
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
vacrel->vm_new_frozen_pages++;
*vm_page_frozen = true;
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f7e4ae3843c..f3fa61c9c1b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,12 @@ typedef struct PruneFreezeParams
Relation relation; /* relation containing buffer to be pruned */
Buffer buffer; /* buffer to be pruned */
+ /*
+ * vmbuffer is the buffer that must already contain the required block of
+ * the visibility map if we are to update it.
+ */
+ Buffer vmbuffer;
+
/*
* The reason pruning was performed. It is used to set the WAL record
* opcode which is used for debugging and analysis purposes.
@@ -252,6 +259,9 @@ typedef struct PruneFreezeParams
*
* HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
* will return 'all_visible', 'all_frozen' flags to the caller.
+ *
+ * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
+ * in the VM.
*/
int options;
@@ -299,6 +309,17 @@ typedef struct PruneFreezeResult
bool all_frozen;
TransactionId vm_conflict_horizon;
+ /*
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
+ *
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set and
+ * we have attempted to update the VM.
+ */
+ uint8 new_vmbits;
+ uint8 old_vmbits;
+
/*
* Whether or not the page makes rel truncation unsafe. This is set to
* 'true', even if the page contains LP_DEAD items. VACUUM will remove
--
2.43.0
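
Aside for reviewers: the two corruption cases that identify_and_fix_vm_corruption()
distinguishes in the patch above boil down to a small decision. Below is a
minimal standalone sketch (not PostgreSQL code): the ALL_VISIBLE/ALL_FROZEN
constants mirror the values in visibilitymapdefs.h, and classify_vm_state()
and its arguments are illustrative names of mine, not part of the patch.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define ALL_VISIBLE  0x01
#define ALL_FROZEN   0x02

/* Returns 0 if consistent, 1 or 2 for the two WARNING cases in the patch. */
static int
classify_vm_state(bool pd_all_visible, uint8_t vmbits, int nlpdead_items)
{
	/* VM bit set while the page-level hint is clear: clear the VM bits. */
	if (!pd_all_visible && (vmbits & ALL_VISIBLE) != 0)
		return 1;

	/* PD_ALL_VISIBLE set on a page that still has LP_DEAD items: clear both. */
	if (pd_all_visible && nlpdead_items > 0)
		return 2;

	return 0;
}

int
main(void)
{
	printf("%d\n", classify_vm_state(false, ALL_VISIBLE, 0));              /* 1 */
	printf("%d\n", classify_vm_state(true, ALL_VISIBLE, 3));               /* 2 */
	printf("%d\n", classify_vm_state(true, ALL_VISIBLE | ALL_FROZEN, 0));  /* 0 */
	return 0;
}
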
v27-0005-Move-VM-assert-into-prune-freeze-code.patch
From ef7ba68de0fa62c11c9f71e8ce1c577efa81d0ee Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 15:57:34 -0500
Subject: [PATCH v27 05/14] Move VM assert into prune/freeze code
This is a step toward setting the VM in the same WAL record as pruning
and freezing. It moves the assert-only all-visibility check of the heap
page into the prune/freeze code, where it runs before the VM is set.
This allows some fields of PruneFreezeResult to be removed.
---
src/backend/access/heap/pruneheap.c | 86 ++++++++++++++++++++++------
src/backend/access/heap/vacuumlazy.c | 68 +---------------------
src/include/access/heapam.h | 25 +++-----
3 files changed, 77 insertions(+), 102 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c979625551c..0ca16340e3e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -918,6 +918,31 @@ heap_page_will_set_vm(PruneState *prstate,
return true;
}
+#ifdef USE_ASSERT_CHECKING
+
+/*
+ * Wrapper for heap_page_would_be_all_visible() which can be used for callers
+ * that expect no LP_DEAD on the page. Currently assert-only, but there is no
+ * reason not to use it outside of asserts.
+ */
+static bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
+{
+
+ return heap_page_would_be_all_visible(rel, buf,
+ OldestXmin,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid,
+ logging_offnum);
+}
+#endif
+
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
@@ -971,6 +996,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
+ TransactionId vm_conflict_horizon = InvalidTransactionId;
PruneState prstate;
bool do_freeze;
bool do_prune;
@@ -1129,23 +1155,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
- /*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
-
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
@@ -1163,6 +1174,46 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
}
}
+ /*
+ * If updating the visibility map, the conflict horizon for that record
+ * must be the newest xmin on the page. However, if the page is
+ * completely frozen, there can be no conflict and the vm_conflict_horizon
+ * should remain InvalidTransactionId. This includes the case that we
+ * just froze all the tuples; the prune-freeze record included the
+ * conflict XID already, so we don't need to include it again.
+ */
+ if (prstate.all_frozen)
+ vm_conflict_horizon = InvalidTransactionId;
+ else
+ vm_conflict_horizon = prstate.visibility_cutoff_xid;
+
+ /*
+ * During its second pass over the heap, VACUUM calls
+ * heap_page_would_be_all_visible() to determine whether a page is
+ * all-visible and all-frozen. The logic here is similar. After completing
+ * pruning and freezing, use an assertion to verify that our results
+ * remain consistent with heap_page_would_be_all_visible().
+ */
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
+ {
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
+
+ Assert(presult->lpdead_items == 0);
+
+ Assert(heap_page_is_all_visible(params->relation, buffer,
+ prstate.cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc));
+
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == vm_conflict_horizon);
+ }
+#endif
+
/* Now update the visibility map and PD_ALL_VISIBLE hint */
Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
@@ -1208,12 +1259,11 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* make everything safe for REDO was logged when the page's tuples
* were frozen.
*/
- Assert(!prstate.all_frozen ||
- !TransactionIdIsValid(presult->vm_conflict_horizon));
+ Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
visibilitymap_set(params->relation, blockno, buffer,
InvalidXLogRecPtr,
- vmbuffer, presult->vm_conflict_horizon,
+ vmbuffer, vm_conflict_horizon,
new_vmbits);
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ccfad5b2dba..3fa03470722 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -462,20 +462,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
-#endif
-static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- OffsetNumber *deadoffsets,
- int ndeadoffsets,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2012,32 +1998,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- Assert(heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum));
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -3494,29 +3454,6 @@ dead_items_cleanup(LVRelState *vacrel)
vacrel->pvs = NULL;
}
-#ifdef USE_ASSERT_CHECKING
-
-/*
- * Wrapper for heap_page_would_be_all_visible() which can be used for callers
- * that expect no LP_DEAD on the page. Currently assert-only, but there is no
- * reason not to use it outside of asserts.
- */
-static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum)
-{
-
- return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
- NULL, 0,
- all_frozen,
- visibility_cutoff_xid,
- logging_offnum);
-}
-#endif
/*
* Check whether the heap page in buf is all-visible except for the dead
@@ -3540,15 +3477,12 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
* - *logging_offnum: OffsetNumber of current tuple being processed;
* used by vacuum's error callback system.
*
- * Callers looking to verify that the page is already all-visible can call
- * heap_page_is_all_visible().
- *
* This logic is closely related to heap_prune_record_unchanged_lp_normal().
* If you modify this function, ensure consistency with that code. An
* assertion cross-checks that both remain in agreement. Do not introduce new
* side-effects.
*/
-static bool
+bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
TransactionId OldestXmin,
OffsetNumber *deadoffsets,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f3fa61c9c1b..9100d42ccbb 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -257,8 +257,7 @@ typedef struct PruneFreezeParams
* HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
* LP_UNUSED during pruning.
*
- * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
- * will return 'all_visible', 'all_frozen' flags to the caller.
+ * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples
*
* HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
* in the VM.
@@ -294,21 +293,6 @@ typedef struct PruneFreezeResult
int live_tuples;
int recently_dead_tuples;
- /*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
- *
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
- *
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
- */
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
-
/*
* old_vmbits are the state of the all-visible and all-frozen bits in the
* visibility map before updating it during phase I of vacuuming.
@@ -454,6 +438,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
/* in heap/vacuumlazy.c */
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
--
2.43.0
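
For anyone skimming the series, the decision that heap_page_will_set_vm()
makes (introduced earlier and retained by this patch) reduces to: compute the
desired VM bits from all_visible/all_frozen and skip the update when the VM
already holds exactly those bits. Here is a standalone sketch; the constants
mirror visibilitymapdefs.h, and will_set_vm() and its parameters are
illustrative, not the actual API.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define ALL_VISIBLE  0x01
#define ALL_FROZEN   0x02

static bool
will_set_vm(bool all_visible, bool all_frozen, uint8_t old_vmbits,
			uint8_t *new_vmbits)
{
	*new_vmbits = 0;

	if (!all_visible)
		return false;

	*new_vmbits = ALL_VISIBLE;
	if (all_frozen)
		*new_vmbits |= ALL_FROZEN;

	/* Nothing to do if the VM already says exactly this. */
	if (*new_vmbits == old_vmbits)
	{
		*new_vmbits = 0;
		return false;
	}
	return true;
}

int
main(void)
{
	uint8_t newbits;

	/* all-visible page whose VM bit is not yet set: update */
	printf("%d 0x%02x\n", will_set_vm(true, false, 0, &newbits), newbits);
	/* already all-visible in the VM, now also all-frozen: update */
	printf("%d 0x%02x\n", will_set_vm(true, true, ALL_VISIBLE, &newbits), newbits);
	/* VM already has both bits: nothing to do */
	printf("%d 0x%02x\n",
		   will_set_vm(true, true, ALL_VISIBLE | ALL_FROZEN, &newbits), newbits);
	return 0;
}
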
v27-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch
From bfd883e25cc43eee9e03e98912fa72364ecebc81 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 16:16:22 -0500
Subject: [PATCH v27 06/14] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
prune/freeze
Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.
Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.
This change applies only to vacuum phase I, not to pruning performed
during normal page access.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/pruneheap.c | 275 ++++++++++++++++------------
1 file changed, 157 insertions(+), 118 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 0ca16340e3e..e74c2e06226 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -205,6 +205,11 @@ static bool heap_page_will_set_vm(PruneState *prstate,
int nlpdead_items,
uint8 *old_vmbits,
uint8 *new_vmbits);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ uint8 old_vmbits, uint8 new_vmbits,
+ TransactionId latest_xid_removed,
+ TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid);
/*
@@ -795,6 +800,68 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ uint8 old_vmbits, uint8 new_vmbits,
+ TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid)
+{
+ TransactionId conflict_xid;
+
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning or
+ * freezing any tuples and are setting an already all-visible page
+ * all-frozen in the VM. In this case, all of the tuples on the page must
+ * already be visible to all MVCC snapshots on the standby.
+ */
+ if (!do_prune &&
+ !do_freeze &&
+ do_set_vm &&
+ (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0 &&
+ (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ return InvalidTransactionId;
+
+ /*
+ * The snapshotConflictHorizon for the whole record should be the most
+ * conservative of all the horizons calculated for any of the possible
+ * modifications. If this record will prune tuples, any transactions on
+ * the standby older than the youngest xmax of the most recently removed
+ * tuple this record will prune will conflict. If this record will freeze
+ * tuples, any transactions on the standby with xids older than the
+ * youngest tuple this record will freeze will conflict.
+ */
+ conflict_xid = InvalidTransactionId;
+
+ /*
+ * If we are updating the VM, the conflict horizon is almost always the
+ * visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization, we can
+ * use the visibility_cutoff_xid as the conflict horizon if the page will
+ * be all-frozen. This is true even if there are LP_DEAD line pointers
+ * because we ignored those when maintaining the visibility_cutoff_xid.
+ * This will have been calculated earlier as the frz_conflict_horizon when
+ * we determined we would freeze.
+ */
+ if (do_set_vm)
+ conflict_xid = visibility_cutoff_xid;
+ else if (do_freeze)
+ conflict_xid = frz_conflict_horizon;
+
+ /*
+ * If we are removing tuples with a younger xmax than our so far
+ * calculated conflict_xid, we must use this as our horizon.
+ */
+ if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+ conflict_xid = latest_xid_removed;
+
+ return conflict_xid;
+}
+
/*
 * Helper to correct any corruption detected on a heap page and its
* corresponding visibility map page after pruning but before setting the
@@ -996,7 +1063,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
- TransactionId vm_conflict_horizon = InvalidTransactionId;
PruneState prstate;
bool do_freeze;
bool do_prune;
@@ -1004,10 +1070,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool do_set_vm;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ TransactionId conflict_xid = InvalidTransactionId;
uint8 new_vmbits = 0;
uint8 old_vmbits = 0;
-
/* Initialize prstate */
prune_freeze_setup(params,
new_relfrozen_xid, new_relmin_mxid,
@@ -1068,6 +1134,37 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.all_visible = prstate.all_frozen = false;
Assert(!prstate.all_frozen || prstate.all_visible);
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ /*
+ * Decide whether to set the VM bits based on information from the VM and
+ * the all_visible/all_frozen flags.
+ */
+ do_set_vm = heap_page_will_set_vm(&prstate,
+ params->relation,
+ blockno,
+ buffer,
+ page,
+ vmbuffer,
+ prstate.lpdead_items,
+ &old_vmbits,
+ &new_vmbits);
+
+ /*
+ * new_vmbits should be 0 regardless of whether or not the page is
+ * all-visible if we do not intend to set the VM.
+ */
+ Assert(do_set_vm || new_vmbits == 0);
+
+ conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+ old_vmbits, new_vmbits,
+ prstate.latest_xid_removed,
+ prstate.frz_conflict_horizon,
+ prstate.visibility_cutoff_xid);
+
+ /* Lock vmbuffer before entering a critical section */
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -1089,14 +1186,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
/*
* If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * hint. If we are going to freeze or prune the page or set
+ * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+ *
+ * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+ * for the VM to be set and PD_ALL_VISIBLE to be clear.
*/
- if (!do_freeze && !do_prune)
+ if (!do_freeze && !do_prune && !do_set_vm)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -1110,6 +1210,26 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
+ /* Set the visibility map and page visibility hint */
+ if (do_set_vm)
+ {
+ /*
+ * While it is valid for PD_ALL_VISIBLE to be set when the
+ * corresponding VM bit is clear, we strongly prefer to keep them
+ * in sync.
+ *
+ * Even if we are only setting the VM and PD_ALL_VISIBLE is
+ * already set, we don't need to worry about unnecessarily
+ * dirtying the heap buffer below, as it must be marked dirty
+ * before adding it to the WAL chain. The only scenario where it
+ * is not already dirty is if the VM was removed, and that isn't
+ * worth optimizing for.
+ */
+ PageSetAllVisible(page);
+ visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
+ params->relation->rd_locator);
+ }
+
MarkBufferDirty(buffer);
/*
@@ -1117,29 +1237,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
if (RelationNeedsWAL(params->relation))
{
- /*
- * The snapshotConflictHorizon for the whole record should be the
- * most conservative of all the horizons calculated for any of the
- * possible modifications. If this record will prune tuples, any
- * transactions on the standby older than the youngest xmax of the
- * most recently removed tuple this record will prune will
- * conflict. If this record will freeze tuples, any transactions
- * on the standby with xids older than the youngest tuple this
- * record will freeze will conflict.
- */
- TransactionId conflict_xid;
-
- if (TransactionIdFollows(prstate.frz_conflict_horizon,
- prstate.latest_xid_removed))
- conflict_xid = prstate.frz_conflict_horizon;
- else
- conflict_xid = prstate.latest_xid_removed;
-
log_heap_prune_and_freeze(params->relation, buffer,
- InvalidBuffer, /* vmbuffer */
- 0, /* vmflags */
+ do_set_vm ? vmbuffer : InvalidBuffer,
+ do_set_vm ? new_vmbits : 0,
conflict_xid,
- true, params->reason,
+ true, /* cleanup lock */
+ params->reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
@@ -1149,43 +1252,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
END_CRIT_SECTION();
- /* Copy information back for caller */
- presult->ndeleted = prstate.ndeleted;
- presult->nnewlpdead = prstate.ndead;
- presult->nfrozen = prstate.nfrozen;
- presult->live_tuples = prstate.live_tuples;
- presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->hastup = prstate.hastup;
-
- presult->lpdead_items = prstate.lpdead_items;
- /* the presult->deadoffsets array was already filled in */
-
- if (prstate.attempt_freeze)
- {
- if (presult->nfrozen > 0)
- {
- *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
- *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
- }
- else
- {
- *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
- *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
- }
- }
-
- /*
- * If updating the visibility map, the conflict horizon for that record
- * must be the newest xmin on the page. However, if the page is
- * completely frozen, there can be no conflict and the vm_conflict_horizon
- * should remain InvalidTransactionId. This includes the case that we
- * just froze all the tuples; the prune-freeze record included the
- * conflict XID already, so we don't need to include it again.
- */
- if (prstate.all_frozen)
- vm_conflict_horizon = InvalidTransactionId;
- else
- vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
/*
* During its second pass over the heap, VACUUM calls
@@ -1200,7 +1268,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(presult->lpdead_items == 0);
+ Assert(prstate.lpdead_items == 0);
+ Assert(prstate.cutoffs);
Assert(heap_page_is_all_visible(params->relation, buffer,
prstate.cutoffs->OldestXmin,
@@ -1210,66 +1279,36 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Assert(prstate.all_frozen == debug_all_frozen);
Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == vm_conflict_horizon);
+ debug_cutoff == prstate.visibility_cutoff_xid);
}
#endif
- /* Now update the visibility map and PD_ALL_VISIBLE hint */
- Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
-
- do_set_vm = heap_page_will_set_vm(&prstate,
- params->relation,
- blockno,
- buffer,
- page,
- vmbuffer,
- prstate.lpdead_items,
- &old_vmbits,
- &new_vmbits);
+ /* Copy information back for caller */
+ presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+ presult->hastup = prstate.hastup;
+ presult->new_vmbits = new_vmbits;
+ presult->old_vmbits = old_vmbits;
- /*
- * new_vmbits should be 0 regardless of whether or not the page is
- * all-visible if we do not intend to set the VM.
- */
- Assert(do_set_vm || new_vmbits == 0);
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
- /* Set the visibility map and page visibility hint, if relevant */
- if (do_set_vm)
+ if (prstate.attempt_freeze)
{
- Assert(prstate.all_visible);
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
- * unnecessarily dirtying the heap buffer, as it must be marked dirty
- * before adding it to the WAL chain. The only scenario where it is
- * not already dirty is if the VM was removed, and that isn't worth
- * optimizing for.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buffer);
-
- /*
- * If the page is being set all-frozen, we pass InvalidTransactionId
- * as the cutoff_xid, since a snapshot conflict horizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
-
- visibilitymap_set(params->relation, blockno, buffer,
- InvalidXLogRecPtr,
- vmbuffer, vm_conflict_horizon,
- new_vmbits);
+ if (presult->nfrozen > 0)
+ {
+ *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
+ }
+ else
+ {
+ *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
+ }
}
-
- /* Save the vmbits for caller */
- presult->old_vmbits = old_vmbits;
- presult->new_vmbits = new_vmbits;
}
--
2.43.0
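
The conflict horizon selection that get_conflict_xid() centralizes above can
be illustrated with the simplified standalone sketch below. The real code
compares XIDs with TransactionIdFollows(), which handles wraparound, and
checks the old/new VM bits explicitly; the plain '>' and the condensed
already_all_visible flag are simplifications of mine for illustration only.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define INVALID_XID 0

static uint32_t
conflict_xid_for_record(bool do_prune, bool do_freeze, bool do_set_vm,
						bool already_all_visible,
						uint32_t latest_xid_removed,
						uint32_t frz_conflict_horizon,
						uint32_t visibility_cutoff_xid)
{
	uint32_t	conflict_xid = INVALID_XID;

	/* Only setting all-frozen on an already all-visible page: no conflict. */
	if (!do_prune && !do_freeze && do_set_vm && already_all_visible)
		return INVALID_XID;

	if (do_set_vm)
		conflict_xid = visibility_cutoff_xid;
	else if (do_freeze)
		conflict_xid = frz_conflict_horizon;

	/* Removed tuples with a newer xmax push the horizon forward. */
	if (latest_xid_removed > conflict_xid)
		conflict_xid = latest_xid_removed;

	return conflict_xid;
}

int
main(void)
{
	/* prune + set VM: removed xmax 900 is newer than cutoff 850 -> 900 */
	printf("%u\n", conflict_xid_for_record(true, false, true, false,
										   900, INVALID_XID, 850));
	/* freeze only -> 700 */
	printf("%u\n", conflict_xid_for_record(false, true, false, false,
										   INVALID_XID, 700, INVALID_XID));
	/* set all-frozen on an already all-visible page -> 0 (invalid) */
	printf("%u\n", conflict_xid_for_record(false, false, true, true,
										   INVALID_XID, INVALID_XID,
										   INVALID_XID));
	return 0;
}
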
v27-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch
From 505f5ed860559b4c45c53aee6fd8355e1bfd4ea8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v27 07/14] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum
As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible and all-frozen in an
XLOG_HEAP2_PRUNE_VACUUM_SCAN record.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
1 file changed, 29 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 3fa03470722..210afa11346 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1873,9 +1873,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ /* Lock vmbuffer before entering critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ /* Mark buffer dirty before writing any WAL records */
MarkBufferDirty(buf);
/*
@@ -1892,13 +1895,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set_vmbits(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
+
+ /*
+ * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+ * setting the VM.
+ */
+ if (RelationNeedsWAL(vacrel->rel))
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ InvalidTransactionId, /* conflict xid */
+ false, /* cleanup lock */
+ PRUNE_VACUUM_SCAN, /* reason */
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+
END_CRIT_SECTION();
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
/* Count the newly all-frozen pages for logging */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
--
2.43.0
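
Since the empty-page path above calls visibilitymap_set_vmbits() (renamed to
visibilitymap_set() in the next patch) directly with the flag bits, here is a
toy sketch of mine, not PostgreSQL code, of how the map packs two flag bits
per heap block, four heap blocks per byte. The real map is spread over 8kB
pages with page headers and is accessed through the buffer manager; the flat
array below is only for illustration.

#include <stdint.h>
#include <stdio.h>

#define ALL_VISIBLE         0x01
#define ALL_FROZEN          0x02
#define BITS_PER_HEAPBLOCK  2
#define HEAPBLOCKS_PER_BYTE 4

static void
vm_set(uint8_t *map, uint32_t heap_blk, uint8_t flags)
{
	uint32_t	byte = heap_blk / HEAPBLOCKS_PER_BYTE;
	uint8_t		shift = (heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;

	map[byte] |= (uint8_t) (flags << shift);
}

static uint8_t
vm_get(const uint8_t *map, uint32_t heap_blk)
{
	uint32_t	byte = heap_blk / HEAPBLOCKS_PER_BYTE;
	uint8_t		shift = (heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;

	return (map[byte] >> shift) & (ALL_VISIBLE | ALL_FROZEN);
}

int
main(void)
{
	uint8_t		map[16] = {0};

	vm_set(map, 5, ALL_VISIBLE | ALL_FROZEN);	/* e.g. an empty page */
	vm_set(map, 6, ALL_VISIBLE);

	printf("blk 5: 0x%02x\n", vm_get(map, 5));	/* 0x03 */
	printf("blk 6: 0x%02x\n", vm_get(map, 6));	/* 0x01 */
	printf("blk 7: 0x%02x\n", vm_get(map, 7));	/* 0x00 */
	return 0;
}
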
v27-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patch
From 02a2153aaa4be45d868047e6100ea1ca47f5d7e2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v27 08/14] Remove XLOG_HEAP2_VISIBLE entirely
No remaining users emit XLOG_HEAP2_VISIBLE records, so the record type
can be removed entirely.
This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.
This changes the visibility map API, so any external users/consumers of
the VM-only WAL record will need to change.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/common/bufmask.c | 4 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 155 ++---------------------
src/backend/access/heap/pruneheap.c | 4 +-
src/backend/access/heap/vacuumlazy.c | 16 +--
src/backend/access/heap/visibilitymap.c | 112 +---------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +-
src/include/access/heapam_xlog.h | 28 +---
src/include/access/visibilitymap.h | 13 +-
src/include/access/visibilitymapdefs.h | 9 --
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 45 insertions(+), 374 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible. See heap_xlog_prune_freeze()
+ * for more details.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 6daf4a87dec..fb7a7548aa0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2539,11 +2539,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
else if (all_frozen_set)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- relation->rd_locator);
+ visibilitymap_set(BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ relation->rd_locator);
}
/*
@@ -8813,50 +8813,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
-
/*
* Perform XLogInsert for a heap-update operation. Caller must already
* have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1823feff298..47d2479415e 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -236,7 +236,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+ visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -249,142 +249,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -762,8 +626,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
*
* During recovery, however, no concurrent writers exist. Therefore,
* updating the VM without holding the heap page lock is safe enough. This
- * same approach is taken when replaying xl_heap_visible records (see
- * heap_xlog_visible()).
+ * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+ * heap_xlog_prune_freeze()).
*/
if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -775,11 +639,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- rlocator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -1360,9 +1224,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e74c2e06226..8568587af4a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1226,8 +1226,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* worth optimizing for.
*/
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
- params->relation->rd_locator);
+ visibilitymap_set(blockno, vmbuffer, new_vmbits,
+ params->relation->rd_locator);
}
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 210afa11346..87820f3ff49 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1895,11 +1895,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
/*
* Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2776,9 +2776,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* set PD_ALL_VISIBLE.
*/
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer, vmflags,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer, vmflags,
+ vacrel->rel->rd_locator);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index d14588e92ae..7997e926872 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set bit(s) in a previously pinned page and log
- * visibilitymap_set_vmbits - set bit(s) in a pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,109 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
- flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || !XLogRecPtrIsValid(recptr));
- Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- Assert(!BufferIsValid(heapBuf) ||
- BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (!XLogRecPtrIsValid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
-}
-
/*
* Set VM (visibility map) flags in the VM block in vmBuf.
*
@@ -344,9 +240,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* rlocator is used only for debugging messages.
*/
uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ca26d1f0ed1..08461fdf593 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 5e15cb1825e..c0cac7ea1c3 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index fc45d72c79b..3655358ed6b 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..69678187832 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -294,7 +293,13 @@ typedef struct xl_heap_prune
#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that in visibilitymapdefs.h
+ * VISIBILITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
+ * records should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL --
+ * even if they only contain updates to the VM.
+ */
#define XLHP_IS_CATALOG_REL (1 << 1)
/*
@@ -443,20 +448,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -500,11 +491,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index c6fa37be968..05ba6786b47 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator);
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 04845d5e680..6505628120c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4331,7 +4331,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
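To make the new calling convention concrete, here is a sketch of a caller
(not taken from the patch; the function name is made up, everything else
follows the vacuumlazy.c hunks above). The operation that renders the page
all-visible now sets the VM bits itself, and its own WAL record is expected
to carry the VM change instead of a separate xl_heap_visible record:

    #include "postgres.h"
    #include "access/visibilitymap.h"
    #include "storage/bufmgr.h"
    #include "storage/bufpage.h"
    #include "utils/rel.h"

    /*
     * Hypothetical caller; assumes heapbuf is exclusively locked and the
     * correct VM page is already pinned in vmbuffer.
     */
    static void
    set_all_visible_and_frozen(Relation rel, BlockNumber blkno,
                               Buffer heapbuf, Buffer vmbuffer)
    {
        Page        page = BufferGetPage(heapbuf);

        PageSetAllVisible(page);
        visibilitymap_set(blkno, vmbuffer,
                          VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN,
                          rel->rd_locator);
    }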
v27-0009-Use-GlobalVisState-in-vacuum-to-determine-page-l.patchtext/x-patch; charset=UTF-8; name=v27-0009-Use-GlobalVisState-in-vacuum-to-determine-page-l.patchDownload
From d301117f52d6a6e78fbdafbbb2c0c4dd62b5b861 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v27 09/14] Use GlobalVisState in vacuum to determine page
level visibility
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.
Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.
This also benefits vacuum directly: in some cases, GlobalVisState may
advance during a vacuum, allowing more pages to become considered
all-visible. And, in the future, we could easily add a heuristic to
update GlobalVisState more frequently during vacuums of large tables. In
the rare case that the GlobalVisState moves backward, vacuum falls back
to OldestXmin to ensure we don’t attempt to freeze a dead tuple that
wasn’t yet prunable according to the GlobalVisState.
Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.
This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/heapam_visibility.c | 50 +++++++++++++++++++++
src/backend/access/heap/pruneheap.c | 43 ++++++++----------
src/backend/access/heap/vacuumlazy.c | 10 ++---
src/include/access/heapam.h | 13 +++---
4 files changed, 82 insertions(+), 34 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..6bcd8b6d017 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,56 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID is no longer considered running by
+ * any snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the tuple's commit status. Its purpose is purely
+ * semantic: when applied to live tuples, GlobalVisTestIsRemovableXid() is
+ * checking whether the inserting transaction is still considered running,
+ * not whether the tuple is removable. Live tuples are, by definition, not
+ * removable, but the snapshot criteria for "transaction still running" are
+ * identical to those used for deletion XIDs.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidNotRunning(GlobalVisState *state, TransactionId xid)
+{
+ return GlobalVisTestIsRemovableXid(state, xid);
+}
+
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+ Buffer buffer)
+{
+ TransactionId dead_after = InvalidTransactionId;
+ HTSV_Result res;
+
+ res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+ if (res == HEAPTUPLE_RECENTLY_DEAD)
+ {
+ Assert(TransactionIdIsValid(dead_after));
+
+ if (GlobalVisTestXidNotRunning(vistest, dead_after))
+ res = HEAPTUPLE_DEAD;
+ }
+ else
+ Assert(!TransactionIdIsValid(dead_after));
+
+ return res;
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8568587af4a..08ffe511d03 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -461,11 +461,12 @@ prune_freeze_setup(PruneFreezeParams *params,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon
- * when updating the VM and/or freezing all the tuples on the page.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState. This field is only kept up-to-date if the page is
+ * all-visible. As soon as a tuple is encountered that is not visible to
+ * all, this field is unmaintained. As long as it is maintained, it can be
+ * used to calculate the snapshot conflict horizon when updating the VM
+ * and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -994,14 +995,13 @@ heap_page_will_set_vm(PruneState *prstate,
*/
static bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
- return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
+ return heap_page_would_be_all_visible(rel, buf, vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -1088,6 +1088,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prune_freeze_plan(RelationGetRelid(params->relation),
buffer, &prstate, off_loc);
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them is not visible to everyone, the page cannot be
+ * all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ !GlobalVisTestXidNotRunning(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* If checksums are enabled, calling heap_prune_satisfies_vacuum() while
* checking tuple visibility information in prune_freeze_plan() may have
@@ -1269,10 +1279,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool debug_all_frozen;
Assert(prstate.lpdead_items == 0);
- Assert(prstate.cutoffs);
Assert(heap_page_is_all_visible(params->relation, buffer,
- prstate.cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc));
@@ -1801,20 +1810,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' and 'all_frozen' when freezing
- * is requested. We could use GlobalVisTestIsRemovableXid
- * instead, if a non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = false;
- prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 87820f3ff49..3b8c9dbdb4b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2730,7 +2730,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* done outside the critical section.
*/
if (heap_page_would_be_all_visible(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3491,7 +3491,7 @@ dead_items_cleanup(LVRelState *vacrel)
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
*
* Output parameters:
*
@@ -3507,7 +3507,7 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -3581,7 +3581,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
/* Visibility checks may do IO or allocate memory */
Assert(CritSectionCount == 0);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+ switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest, buf))
{
case HEAPTUPLE_LIVE:
{
@@ -3600,7 +3600,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin, OldestXmin))
+ if (!GlobalVisTestXidNotRunning(vistest, xmin))
{
all_visible = false;
*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9100d42ccbb..e2ee035ae0b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -272,10 +272,9 @@ typedef struct PruneFreezeParams
/*
* Contains the cutoffs used for freezing. They are required if the
- * HEAP_PAGE_PRUNE_FREEZE option is set. cutoffs->OldestXmin is also used
- * to determine if dead tuples are HEAPTUPLE_RECENTLY_DEAD or
- * HEAPTUPLE_DEAD. Currently only vacuum passes in cutoffs. Vacuum
- * calculates them once, at the beginning of vacuuming the relation.
+ * HEAP_PAGE_PRUNE_FREEZE option is set. Currently only vacuum passes in
+ * cutoffs. Vacuum calculates them once, at the beginning of vacuuming the
+ * relation.
*/
struct VacuumCutoffs *cutoffs;
} PruneFreezeParams;
@@ -439,7 +438,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -453,6 +452,10 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+
+extern bool GlobalVisTestXidNotRunning(GlobalVisState *state, TransactionId xid);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+ GlobalVisState *vistest, Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
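For clarity, the deferred per-page check described in the commit message
boils down to the following (a sketch; the helper name is invented, the
logic mirrors the pruneheap.c hunk above). Because visibility_cutoff_xid is
the newest xmin among the page's live tuples, a single GlobalVisState test
on that XID decides whether the whole page can be considered all-visible:

    #include "postgres.h"
    #include "access/heapam.h"   /* GlobalVisTestXidNotRunning(), added above */
    #include "access/transam.h"
    #include "utils/snapmgr.h"   /* GlobalVisState */

    static bool
    page_still_all_visible(GlobalVisState *vistest,
                           TransactionId visibility_cutoff_xid)
    {
        /* Newest xmin still running somewhere? Then not all-visible. */
        if (TransactionIdIsNormal(visibility_cutoff_xid) &&
            !GlobalVisTestXidNotRunning(vistest, visibility_cutoff_xid))
            return false;

        return true;
    }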
v27-0010-Unset-all_visible-sooner-if-not-freezing.patchtext/x-patch; charset=UTF-8; name=v27-0010-Unset-all_visible-sooner-if-not-freezing.patchDownload
From e3dd1db8931e00d09d1c29d399f56434146beab3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v27 10/14] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.
However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/pruneheap.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 08ffe511d03..3d34532b766 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1667,8 +1667,13 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible and all_frozen until later
* during pruning. Removable dead tuples shouldn't preclude freezing the
- * page.
+ * page. If we won't attempt freezing, just unset all-visible now, though.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1927,8 +1932,14 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible and all_frozen until later, at the
* end of heap_page_prune_and_freeze(). This will allow us to attempt to
* freeze the page after pruning. As long as we unset it before updating
- * the visibility map, this will be correct.
+ * the visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all-visible now.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
--
2.43.0
v27-0011-Track-which-relations-are-modified-by-a-query.patchtext/x-patch; charset=US-ASCII; name=v27-0011-Track-which-relations-are-modified-by-a-query.patchDownload
From 48da46f219ac3f4c09b4cb6df23a31544921087e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v27 11/14] Track which relations are modified by a query
Save the relids in a bitmap in the estate. A later commit will pass this
information down to scan nodes to control whether or not the scan allows
setting the visibility map during on-access pruning. We don't want to set
the visibility map if the query is just going to modify the page
immediately after.
---
src/backend/executor/execMain.c | 4 ++++
src/backend/executor/execUtils.c | 2 ++
src/include/nodes/execnodes.h | 6 ++++++
3 files changed, 12 insertions(+)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 797d8b1ca1c..5b299ef81aa 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation is modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 3968429f991..d8c385216e0 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query either through
+ * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
--
2.43.0
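In short, the mechanism is just a bitmapset of RT indexes (illustrative
fragments matching the hunks above; rel_read_only is a made-up local used
to show the consuming side, which the following patches wire up):

    /* recording side, in InitPlan() / ExecInitResultRelation(): */
    estate->es_modified_relids =
        bms_add_member(estate->es_modified_relids, rti);

    /* consuming side, in a scan node: */
    bool        rel_read_only =
        !bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
                       estate->es_modified_relids);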
v27-0012-Pass-down-information-on-table-modification-to-s.patchtext/x-patch; charset=US-ASCII; name=v27-0012-Pass-down-information-on-table-modification-to-s.patchDownload
From 2b371cbe252e262f2fdf68b9f507f3d5f401628e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:12:18 -0500
Subject: [PATCH v27 12/14] Pass down information on table modification to scan
node
Pass down information to sequential scan, index scan, and bitmap table
scan nodes on whether or not the query modifies the relation being
scanned. A later commit will use this information to update the VM
during on-access pruning only if the relation is not modified by the
query.
---
contrib/pgrowlocks/pgrowlocks.c | 2 +-
src/backend/access/brin/brin.c | 3 ++-
src/backend/access/gin/gininsert.c | 3 ++-
src/backend/access/heap/heapam_handler.c | 7 +++---
src/backend/access/index/genam.c | 4 ++--
src/backend/access/index/indexam.c | 6 +++---
src/backend/access/nbtree/nbtsort.c | 2 +-
src/backend/access/table/tableam.c | 7 +++---
src/backend/commands/constraint.c | 2 +-
src/backend/commands/copyto.c | 2 +-
src/backend/commands/tablecmds.c | 8 +++----
src/backend/commands/typecmds.c | 4 ++--
src/backend/executor/execIndexing.c | 2 +-
src/backend/executor/execReplication.c | 8 +++----
src/backend/executor/nodeBitmapHeapscan.c | 9 +++++++-
src/backend/executor/nodeIndexonlyscan.c | 2 +-
src/backend/executor/nodeIndexscan.c | 11 ++++++++--
src/backend/executor/nodeSeqscan.c | 26 ++++++++++++++++++++---
src/backend/partitioning/partbounds.c | 2 +-
src/backend/utils/adt/selfuncs.c | 2 +-
src/include/access/genam.h | 2 +-
src/include/access/heapam.h | 6 ++++++
src/include/access/tableam.h | 19 ++++++++++-------
23 files changed, 93 insertions(+), 46 deletions(-)
diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
RelationGetRelationName(rel));
/* Scan the relation */
- scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
hscan = (HeapScanDesc) scan;
attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 26cb75058d1..4ad8941c60a 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2842,7 +2842,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
indexInfo->ii_Concurrent = brinshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromBrinShared(brinshared));
+ ParallelTableScanFromBrinShared(brinshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index df30dcc0228..aaa5401b731 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2058,7 +2058,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
indexInfo->ii_Concurrent = ginshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromGinBuildShared(ginshared));
+ ParallelTableScanFromGinBuildShared(ginshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index dd4fe6bf62f..6c2e4e08b16 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,12 +79,13 @@ heapam_slot_callbacks(Relation relation)
*/
static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
{
IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
return &hscan->xs_base;
}
@@ -753,7 +754,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
tableScan = NULL;
heapScan = NULL;
- indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+ indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
index_rescan(indexScan, NULL, 0, NULL, 0);
}
else
@@ -762,7 +763,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
- tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+ tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
heapScan = (HeapScanDesc) tableScan;
indexScan = NULL;
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index b7f10a1aed0..15f9cc11582 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -447,7 +447,7 @@ systable_beginscan(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, irel,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
@@ -708,7 +708,7 @@ systable_beginscan_ordered(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, indexRelation,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..b5523cf2ab1 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys)
+ int nkeys, int norderbys, uint32 flags)
{
IndexScanDesc scan;
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+ scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
return scan;
}
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+ scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
return scan;
}
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index d7695dc1108..7bdbc7e5fa7 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1925,7 +1925,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
indexInfo = BuildIndexInfo(btspool->index);
indexInfo->ii_Concurrent = btshared->isconcurrent;
scan = table_beginscan_parallel(btspool->heap,
- ParallelTableScanFromBTShared(btshared));
+ ParallelTableScanFromBTShared(btshared), 0);
reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
true, progress, _bt_build_callback,
&buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 73ebc01a08f..a00bdfdf822 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
+
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
bool found;
slot = table_slot_create(rel, NULL);
- scan = table_index_fetch_begin(rel);
+ scan = table_index_fetch_begin(rel, 0);
found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
all_dead);
table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index 3497a8221f2..97c8278e36d 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
*/
tmptid = checktid;
{
- IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+ IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
bool call_again = false;
if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index dae91630ac3..1957bb0f1a2 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1156,7 +1156,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
AttrMap *map = NULL;
TupleTableSlot *root_slot = NULL;
- scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
slot = table_slot_create(rel, NULL);
/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6b1a00ed477..130a670d266 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6377,7 +6377,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
* checking all the constraints.
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(oldrel, snapshot, 0, NULL);
+ scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -13766,7 +13766,7 @@ validateForeignKeyConstraint(char *conname,
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
slot = table_slot_create(rel, NULL);
- scan = table_beginscan(rel, snapshot, 0, NULL);
+ scan = table_beginscan(rel, snapshot, 0, NULL, 0);
perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
"validateForeignKeyConstraint",
@@ -22623,7 +22623,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
/* Scan through the rows. */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+ scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -23087,7 +23087,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
/* Scan through the rows. */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(splitRel, snapshot, 0, NULL);
+ scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index be6ffd6ddb0..2921f68c1c3 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3157,7 +3157,7 @@ validateDomainNotNullConstraint(Oid domainoid)
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
@@ -3238,7 +3238,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 0b3a31f1703..74262a34819 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -816,7 +816,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
retry:
conflict = false;
found_self = false;
- index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+ index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 860f79f9cc1..6e49ea5c5d8 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -204,7 +204,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
/* Start an index scan. */
- scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
retry:
found = false;
@@ -382,7 +382,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
/* Start a heap scan. */
InitDirtySnapshot(snap);
- scan = table_beginscan(rel, &snap, 0, NULL);
+ scan = table_beginscan(rel, &snap, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
retry:
@@ -601,7 +601,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+ scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
table_rescan(scan, NULL);
@@ -665,7 +665,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
index_rescan(scan, skey, skey_attoff, NULL, 0);
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..0d854db51a1 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ flags);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 6bea42f128f..2c87ba5f767 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
estate->es_snapshot,
&node->ioss_Instrument,
node->ioss_NumScanKeys,
- node->ioss_NumOrderByKeys);
+ node->ioss_NumOrderByKeys, 0);
node->ioss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 72b135e5dcf..92674441c6d 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys,
+ flags);
node->iss_ScanDesc = scandesc;
@@ -207,7 +214,7 @@ IndexNextWithReorder(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys, 0);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..4d0cbb9dee4 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
scandesc = table_beginscan(node->ss.ss_currentRelation,
estate->es_snapshot,
- 0, NULL);
+ 0, NULL, flags);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
{
EState *estate = node->ss.ps.state;
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
table_parallelscan_initialize(node->ss.ss_currentRelation,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+ flags);
}
/* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation,
+ pscan,
+ flags);
}
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 16b0adc172c..91acf1ee2d7 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
econtext = GetPerTupleExprContext(estate);
snapshot = RegisterSnapshot(GetLatestSnapshot());
tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
- scan = table_beginscan(part_rel, snapshot, 0, NULL);
+ scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index c760b19db55..ec0def0d1e2 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7100,7 +7100,7 @@ get_actual_variable_endpoint(Relation heapRel,
index_scan = index_beginscan(heapRel, indexRel,
&SnapshotNonVacuumable, NULL,
- 1, 0);
+ 1, 0, 0);
/* Set it up for index-only scan */
index_scan->xs_want_itup = true;
index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..d29d9e905fc 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -177,7 +177,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys);
+ int nkeys, int norderbys, uint32 flags);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e2ee035ae0b..38294b33fac 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -119,6 +119,12 @@ typedef struct IndexFetchHeapData
Buffer xs_cbuf; /* current heap buffer in scan, if any */
/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 2fa790b6bf5..d10b1b03cdb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* set if the query doesn't modify the rel */
+ SO_HINT_REL_READ_ONLY = 1 << 10,
} ScanOptions;
/*
@@ -420,7 +422,7 @@ typedef struct TableAmRoutine
*
* Tuples for an index scan can then be fetched via index_fetch_tuple.
*/
- struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+ struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
/*
* Reset index fetch. Typically this will release cross index fetch
@@ -874,9 +876,9 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
*/
static inline TableScanDesc
table_beginscan(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_SEQSCAN |
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
@@ -919,9 +921,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
@@ -1128,7 +1130,8 @@ extern void table_parallelscan_initialize(Relation rel,
* Caller must hold a suitable lock on the relation.
*/
extern TableScanDesc table_beginscan_parallel(Relation relation,
- ParallelTableScanDesc pscan);
+ ParallelTableScanDesc pscan,
+ uint32 flags);
/*
* Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1164,9 +1167,9 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
* Tuples for an index scan can then be fetched via table_index_fetch_tuple().
*/
static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
{
- return rel->rd_tableam->index_fetch_begin(rel);
+ return rel->rd_tableam->index_fetch_begin(rel, flags);
}
/*
--
2.43.0
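Putting this together with the previous patch, a scan node's startup path
now looks roughly like this (a sketch, mirroring the nodeSeqscan.c hunk
above):

    uint32      flags = 0;

    /* the query never modifies the relation scanned by this node */
    if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
                       estate->es_modified_relids))
        flags = SO_HINT_REL_READ_ONLY;

    scandesc = table_beginscan(node->ss.ss_currentRelation,
                               estate->es_snapshot,
                               0, NULL, flags);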
v27-0013-Allow-on-access-pruning-to-set-pages-all-visible.patchtext/x-patch; charset=US-ASCII; name=v27-0013-Allow-on-access-pruning-to-set-pages-all-visible.patchDownload
From 519f2e4ee947d35edd6182850a26988744343ed4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:24:08 -0500
Subject: [PATCH v27 13/14] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam.c | 15 ++++++-
src/backend/access/heap/heapam_handler.c | 15 ++++++-
src/backend/access/heap/pruneheap.c | 44 +++++++++++++++++--
src/include/access/heapam.h | 24 ++++++++--
.../t/035_standby_logical_decoding.pl | 3 +-
5 files changed, 90 insertions(+), 11 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fb7a7548aa0..d9dc79f4a96 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -570,6 +570,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -584,7 +585,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1261,6 +1264,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1299,6 +1303,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1331,6 +1341,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 6c2e4e08b16..2cb98e58956 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
return &hscan->xs_base;
@@ -100,6 +101,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -139,7 +146,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ hscan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2472,6 +2480,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2518,7 +2527,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3d34532b766..393dff5ab3d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -202,6 +202,8 @@ static bool heap_page_will_set_vm(PruneState *prstate,
Relation relation,
BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
Buffer vmbuffer,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
int nlpdead_items,
uint8 *old_vmbits,
uint8 *new_vmbits);
@@ -223,9 +225,13 @@ static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the relevant visibility map page.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -306,6 +312,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
.cutoffs = NULL,
};
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ params.options |= HEAP_PAGE_PRUNE_UPDATE_VM;
+ params.vmbuffer = *vmbuffer;
+ }
+
heap_page_prune_and_freeze(¶ms, &presult, &dummy_off_loc,
NULL, NULL);
@@ -945,6 +958,9 @@ identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
* corrupted, it will fix them by clearing the VM bits and visibility hint.
* This does not need to be done in a critical section.
*
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
* Returns true if one or both VM bits should be set, along with returning the
* current value of the VM bits in *old_vmbits and the desired new value of
* the VM bits in *new_vmbits.
@@ -954,6 +970,8 @@ heap_page_will_set_vm(PruneState *prstate,
Relation relation,
BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
Buffer vmbuffer,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
int nlpdead_items,
uint8 *old_vmbits,
uint8 *new_vmbits)
@@ -961,6 +979,24 @@ heap_page_will_set_vm(PruneState *prstate,
if (!prstate->attempt_update_vm)
return false;
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(heap_buffer) || XLogCheckBufferNeedsBackup(heap_buffer)))
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ return false;
+ }
+
*old_vmbits = visibilitymap_get_status(relation, heap_blk,
&vmbuffer);
@@ -1156,6 +1192,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
buffer,
page,
vmbuffer,
+ params->reason,
+ do_prune, do_freeze,
prstate.lpdead_items,
&old_vmbits,
&new_vmbits);
@@ -1242,9 +1280,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
MarkBufferDirty(buffer);
- /*
- * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
- */
+ /* Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did */
if (RelationNeedsWAL(params->relation))
{
log_heap_prune_and_freeze(params->relation, buffer,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 38294b33fac..6ed681b815c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read in the current heap page's
+ * corresponding VM block to this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read in the current heap page's corresponding VM block to
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
/*
* Some optimizations can only be performed if the query does not modify
@@ -419,7 +436,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
TM_IndexDeleteOp *delstate);
/* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index ebe2fae1789..bdd9f0a62cd 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
--
2.43.0
v27-0014-Set-pd_prune_xid-on-insert.patchtext/x-patch; charset=UTF-8; name=v27-0014-Set-pd_prune_xid-on-insert.patchDownload
From eba0bfcd80c64a2c89e631db77f1afb3090de471 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v27 14/14] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.
This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.
Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.
---
src/backend/access/heap/heapam.c | 25 +++++++++++++------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++-
.../modules/index/expected/killtuples.out | 6 ++---
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d9dc79f4a96..ccebc1f244b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2119,6 +2119,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2178,15 +2179,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2196,7 +2201,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2560,8 +2564,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 47d2479415e..ab2db931aac 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -447,6 +447,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -596,9 +602,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
--
2.43.0
Hi!
in v27-0001:
Melanie Plageman <melanieplageman(at)gmail(dot)com> wrote:
The last vacuum is expected to set vm bits, but the test doesn’t verify that. Should we verify that like:
```
evantest=# SELECT blkno, all_visible, all_frozen FROM pg_visibility_map('test_vac_unmodified_heap');
 blkno | all_visible | all_frozen
-------+-------------+------------
     0 | t           | t
(1 row)
```
I've done this. I've actually added three such verifications -- one
after each step where the VM is expected to change. It shouldn't be
very expensive, so I think it is okay. The way the test would fail if
the buffer wasn't correctly dirtied is that it would assert out -- so
the visibility map test wouldn't even have a chance to fail. But, I
think it is also okay to confirm that the expected things are
happening with the VM -- it just gives us extra coverage.
+1 on extra coverage. Should we also do sql-level check that the VM
indeed does not need to set PD_ALL_VISIBLE (check header bytes using
pageinspect?).
v27-0003 & v27-0004: I did not get the exact reason we introduced
`identify_and_fix_vm_corruption` in 0003 and moved code in 0004 to
another place. I can see we have this starting v25 of patch set. Well,
maybe this is not an issue at all...
in v27-0005. This patch changes code which is not exercised in
tests[0]. I spent some time understanding the conditions when we
entered this. There is a comment about non-finished relation
extension, but I got no success trying to reproduce this. I ended up
modifying code to lose PageSetAllVisible in proper places and running
vacuum. Looks like everything works as expected. I will spend some
more time on this, maybe I will be successful in writing an
injection-point-based TAP test which hits this...
[0]: https://coverage.postgresql.org/src/backend/access/heap/vacuumlazy.c.gcov.html#1902
--
Best regards,
Kirill Reshke
Thanks for the review!
In addition to addressing your feedback, attached v28 includes a
number of small fixes to comments, commit messages, and other things.
Notably, I've added one new refactoring patch 0009, which reduces the
diff of 0010 -- using the GlobalVisState instead of OldestXmin for
page visibility -- even further.
On Wed, Dec 17, 2025 at 1:27 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
I've done this. I've actually added three such verifications -- one
after each step where the VM is expected to change. It shouldn't be
very expensive, so I think it is okay. The way the test would fail if
the buffer wasn't correctly dirtied is that it would assert out -- so
the visibility map test wouldn't even have a chance to fail. But, I
think it is also okay to confirm that the expected things are
happening with the VM -- it just gives us extra coverage.
+1 on extra coverage. Should we also do sql-level check that the VM
indeed does not need to set PD_ALL_VISIBLE (check header bytes using
pageinspect?).
That's an interesting idea. I checked and, AFAICT, there are no tests
currently directly comparing the flags column returned by the
pageinspect page_header() function to one of the flag values. I've
added the following to attached v28.
SELECT (flags & x'0004'::int) <> 0
FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));
But I'm not sure if it is weird/confusing to be comparing the flag
directly to the number 4 like this. I don't really want to bother with
adding another function to pageinspect returning the status of
PD_ALL_VISIBLE (like page_visible() or something).
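If the bare constant bothers anyone, one alternative -- just a sketch, not
something in the attached patches -- is to name the bit in the query itself
(0x0004 is PD_ALL_VISIBLE in src/include/storage/bufpage.h; the CTE and the
column alias below are purely for readability):
```
-- spell out which page header bit the test is checking
WITH consts(pd_all_visible) AS (VALUES (x'0004'::int))
SELECT (flags & pd_all_visible) <> 0 AS pd_all_visible_set
  FROM page_header(get_raw_page('test_vac_unmodified_heap', 0)), consts;
```
It still hardcodes the value, though, so I'm not convinced it reads any
better than the plain expression.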
v27-0003 & v27-0004: I did not get the exact reason we introduced
`identify_and_fix_vm_corruption` in 0003 and moved code in 0004 to
another place. I can see we have this starting v25 of patch set. Well,
maybe this is not an issue at all...
It's mostly for ease of review. This is a pretty sensitive area of
code, so I thought it would be easier for the reviewer to confirm
correctness if I split it up. Andres had mentioned that the commit was
hard to review because so many different things were happening.
In v27, 0003 moves the VM clear code into a helper. 0004 and 0005
move all the VM setting/clearing code to
heap_page_prune_and_freeze(). And 0006 actually sets the VM in the
same critical section as pruning/freezing and emits a single WAL
record.
I'm not really sure which commits should stay independent in the final
version I push to master.
in v27-0005. This patch changes code which is not exercised in
tests[0]. I spent some time understanding the conditions when we
entered this. There is a comment about non-finished relation
extension, but I got no success trying to reproduce this. I ended up
modifying code to lose PageSetAllVisible in proper places and running
vacuum. Looks like everything works as expected. I will spend some
more time on this, maybe I will be successful in writing an
injection-point-based TAP test which hits this...
Based on the coverage report link you provided, that code is changed
by v27 0007, not 0005. 0005 is about moving an assertion out of
lazy_scan_prune(). 0007 changes lazy_scan_new_or_empty() (the code in
question).
Regarding 0007, it looks like what is uncovered (the orange bits in
the coverage report are uncovered, I assume) is empty pages _without_
PD_ALL_VISIBLE set. I don't see anywhere where PageSetAllVisible() is
called except vacuum and COPY FREEZE.
If I was trying to guess how empty pages with PD_ALL_VISIBLE set are
getting vacuumed, I would think it is due to SKIP_PAGES_THRESHOLD
causing us to vacuum an all-frozen empty page.
Then the question is, why wouldn't we have coverage of the empty page
first being set all-visible/all-frozen? It can't be COPY FREEZE
because the page is empty. And it can't be vacuum, because then we
would have coverage. It's very mysterious.
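If you want to go hunting for such pages after a test run, something like
this might help -- just a sketch, assuming pg_visibility and pageinspect are
installed, with 'foo' standing in for whichever table you're inspecting:
```
-- list blocks that are all-frozen in the VM but have no line pointers at
-- all, i.e. pages that PageIsEmpty() would consider empty
SELECT v.blkno
  FROM pg_visibility('foo') AS v,
       LATERAL (SELECT count(*) AS nlp
                  FROM heap_page_items(get_raw_page('foo', v.blkno::int))) AS p
 WHERE v.all_frozen AND p.nlp = 0;
```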
It would be good to have coverage for this case. I don't think you'll
need an injection point for the main case of "empty page not yet set
all-visible is vacuumed for the first time" (unless I'm
misunderstanding something).
I'm not sure how you'll test the "vacuuming an empty, previously
uninitialized page" case described in this comment, though.
* It's possible that another backend has extended the heap,
* initialized the page, and then failed to WAL-log the page due
* to an ERROR. Since heap extension is not WAL-logged, recovery
* might try to replay our record setting the page all-visible and
* find that the page isn't initialized, which will cause a PANIC.
* To prevent that, check whether the page has been previously
* WAL-logged, and if not, do that now.
You'd want to force an error during relation extension and then vacuum
the page. I don't know if you need an injection point to force the
error -- depends on what kind of error, I think.
So that I know for attribution, did you review 0003-0005?
- Melanie
Attachments:
v28-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patchtext/x-patch; charset=UTF-8; name=v28-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patchDownload
From ee7c5f860799f195644e2fedf2b63b6789045cbc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 8 Dec 2025 15:49:54 -0500
Subject: [PATCH v28 01/15] Combine visibilitymap_set() cases in
lazy_scan_prune()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
lazy_scan_prune() previously had two separate cases that called
visibilitymap_set() after pruning and freezing. These branches were
nearly identical except that one attempted to avoid dirtying the heap
buffer. However, that situation can never occur — the heap buffer cannot
be clean at that point (and we would hit an assertion if it were).
In lazy_scan_prune(), when we change a previously all-visible page to
all-frozen and the page was recorded as all-visible in the visibility
map by find_next_unskippable_block(), the heap buffer will always be
dirty. Either we have just frozen a tuple and already dirtied the
buffer, or the buffer was modified between find_next_unskippable_block()
and heap_page_prune_and_freeze() and then pruned in
heap_page_prune_and_freeze().
Additionally, XLogRegisterBuffer() asserts that the buffer is dirty, so
attempting to add a clean heap buffer to the WAL chain would assert out
anyway.
Since the “clean heap buffer with already set VM” case is impossible,
the two visibilitymap_set() branches in lazy_scan_prune() can be merged.
Doing so makes the intent clearer and emphasizes that the heap buffer
must always be marked dirty before being added to the WAL chain.
This commit also adds a test case for vacuuming when no heap
modifications are required. Currently this ensures that the heap buffer
is marked dirty before it is added to the WAL chain, but if we later
remove the heap buffer from the VM-set WAL chain or pass it with the
REGBUF_NO_CHANGES flag, this test would guard that behavior.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
---
.../pg_visibility/expected/pg_visibility.out | 44 ++++++++++
contrib/pg_visibility/sql/pg_visibility.sql | 20 +++++
src/backend/access/heap/vacuumlazy.c | 87 ++++---------------
3 files changed, 82 insertions(+), 69 deletions(-)
diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index 09fa5933a35..e10f1706015 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -1,4 +1,5 @@
CREATE EXTENSION pg_visibility;
+CREATE EXTENSION pageinspect;
--
-- recently-dropped table
--
@@ -204,6 +205,49 @@ select pg_truncate_visibility_map('test_partition');
(1 row)
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary
+---------------------------
+ (1,1)
+(1 row)
+
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+ pg_truncate_visibility_map
+----------------------------
+
+(1 row)
+
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary
+---------------------------
+ (0,0)
+(1 row)
+
+-- though the VM is truncated, the heap page-level visibility hint,
+-- PD_ALL_VISIBLE should still be set
+SELECT (flags & x'0004'::int) <> 0
+ FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));
+ ?column?
+----------
+ t
+(1 row)
+
+-- vacuum sets the VM
+vacuum test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary
+---------------------------
+ (1,1)
+(1 row)
+
-- test copy freeze
create table copyfreeze (a int, b char(1500));
-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index 5af06ec5b76..57af8a0c5b6 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -1,4 +1,5 @@
CREATE EXTENSION pg_visibility;
+CREATE EXTENSION pageinspect;
--
-- recently-dropped table
@@ -94,6 +95,25 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
select * from pg_check_frozen('test_partition'); -- hopefully none
select pg_truncate_visibility_map('test_partition');
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+-- though the VM is truncated, the heap page-level visibility hint,
+-- PD_ALL_VISIBLE should still be set
+SELECT (flags & x'0004'::int) <> 0
+ FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));
+-- vacuum sets the VM
+vacuum test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+
-- test copy freeze
create table copyfreeze (a int, b char(1500));
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 30778a15639..cecba2146ea 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2093,16 +2093,14 @@ lazy_scan_prune(LVRelState *vacrel,
* of last heap_vac_scan_next_block() call), and from all_visible and
* all_frozen variables
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if ((presult.all_visible && !all_visible_according_to_vm) ||
+ (presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
- }
/*
* It should never be the case that the visibility map page is set
@@ -2110,15 +2108,25 @@ lazy_scan_prune(LVRelState *vacrel,
* checksums are not enabled). Regardless, set both bits so that we
* get back in sync.
*
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
+ * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
+ * unnecessarily dirtying the heap buffer. Nearly the only scenario
+ * where PD_ALL_VISIBLE is set but the VM is not is if the VM was
+ * removed -- and that isn't worth optimizing for. And if we add the
+ * heap buffer to the WAL chain (without passing REGBUF_NO_CHANGES),
+ * it must be marked dirty.
*/
PageSetAllVisible(page);
MarkBufferDirty(buf);
+
+ /*
+ * If the page is being set all-frozen, we pass InvalidTransactionId
+ * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ Assert(!presult.all_frozen ||
+ !TransactionIdIsValid(presult.vm_conflict_horizon));
+
old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
InvalidXLogRecPtr,
vmbuffer, presult.vm_conflict_horizon,
@@ -2190,65 +2198,6 @@ lazy_scan_prune(LVRelState *vacrel,
VISIBILITYMAP_VALID_BITS);
}
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen.
- */
- else if (all_visible_according_to_vm && presult.all_frozen &&
- !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
-
return presult.ndeleted;
}
--
2.43.0
v28-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patchtext/x-patch; charset=US-ASCII; name=v28-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patchDownload
From 0dac7060ae0eddc2617a1919150757a7e63924f3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 11 Dec 2025 10:48:13 -0500
Subject: [PATCH v28 02/15] Eliminate use of cached VM value in
lazy_scan_prune()
lazy_scan_prune() takes a parameter from lazy_scan_heap() indicating
whether the page was marked all-visible in the VM at the time it was
last checked in find_next_unskippable_block(). This behavior is
historical, dating back to commit 608195a3a365, when we did not pin the
VM page until confirming it was not all-visible. Now that the VM page is
already pinned, there is no meaningful benefit to relying on a cached VM
status.
Removing this cached value simplifies the logic in both lazy_scan_heap()
and lazy_scan_prune(). It also clarifies future work that will set the
visibility map on-access: such paths will not have a cached value
available which would make the logic harder to reason about. Eliminating
it also enables us to detect and repair VM corruption on-access.
Along with removing the cached value and unconditionally checking the
visibility status of the heap page, this commit also moves the VM
corruption handling to occur first. This reordering should have no
performance impact, since the checks are inexpensive and performed only
once per page. It does, however, make the control flow easier to
understand. The new restructuring also makes it possible that after
fixing corruption, the VM could be newly set, if pruning found the page
all-visible.
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
---
src/backend/access/heap/vacuumlazy.c | 178 ++++++++++++---------------
1 file changed, 79 insertions(+), 99 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index cecba2146ea..9d2523a55b2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -248,13 +248,6 @@ typedef enum
*/
#define EAGER_SCAN_REGION_SIZE 4096
-/*
- * heap_vac_scan_next_block() sets these flags to communicate information
- * about the block it read to the caller.
- */
-#define VAC_BLK_WAS_EAGER_SCANNED (1 << 0)
-#define VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM (1 << 1)
-
typedef struct LVRelState
{
/* Target heap relation and its indexes */
@@ -358,7 +351,6 @@ typedef struct LVRelState
/* State maintained by heap_vac_scan_next_block() */
BlockNumber current_block; /* last block returned */
BlockNumber next_unskippable_block; /* next unskippable block */
- bool next_unskippable_allvis; /* its visibility status */
bool next_unskippable_eager_scanned; /* if it was eagerly scanned */
Buffer next_unskippable_vmbuffer; /* buffer containing its VM bit */
@@ -432,7 +424,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
bool sharelock, Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
- Buffer vmbuffer, bool all_visible_according_to_vm,
+ Buffer vmbuffer,
bool *has_lpdead_items, bool *vm_page_frozen);
static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
@@ -1248,7 +1240,6 @@ lazy_scan_heap(LVRelState *vacrel)
/* Initialize for the first heap_vac_scan_next_block() call */
vacrel->current_block = InvalidBlockNumber;
vacrel->next_unskippable_block = InvalidBlockNumber;
- vacrel->next_unskippable_allvis = false;
vacrel->next_unskippable_eager_scanned = false;
vacrel->next_unskippable_vmbuffer = InvalidBuffer;
@@ -1264,13 +1255,13 @@ lazy_scan_heap(LVRelState *vacrel)
MAIN_FORKNUM,
heap_vac_scan_next_block,
vacrel,
- sizeof(uint8));
+ sizeof(bool));
while (true)
{
Buffer buf;
Page page;
- uint8 blk_info = 0;
+ bool was_eager_scanned = false;
int ndeleted = 0;
bool has_lpdead_items;
void *per_buffer_data = NULL;
@@ -1339,13 +1330,13 @@ lazy_scan_heap(LVRelState *vacrel)
if (!BufferIsValid(buf))
break;
- blk_info = *((uint8 *) per_buffer_data);
+ was_eager_scanned = *((bool *) per_buffer_data);
CheckBufferIsPinnedOnce(buf);
page = BufferGetPage(buf);
blkno = BufferGetBlockNumber(buf);
vacrel->scanned_pages++;
- if (blk_info & VAC_BLK_WAS_EAGER_SCANNED)
+ if (was_eager_scanned)
vacrel->eager_scanned_pages++;
/* Report as block scanned, update error traceback information */
@@ -1416,7 +1407,6 @@ lazy_scan_heap(LVRelState *vacrel)
if (got_cleanup_lock)
ndeleted = lazy_scan_prune(vacrel, buf, blkno, page,
vmbuffer,
- blk_info & VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM,
&has_lpdead_items, &vm_page_frozen);
/*
@@ -1433,8 +1423,7 @@ lazy_scan_heap(LVRelState *vacrel)
* exclude pages skipped due to cleanup lock contention from eager
* freeze algorithm caps.
*/
- if (got_cleanup_lock &&
- (blk_info & VAC_BLK_WAS_EAGER_SCANNED))
+ if (got_cleanup_lock && was_eager_scanned)
{
/* Aggressive vacuums do not eager scan. */
Assert(!vacrel->aggressive);
@@ -1601,7 +1590,6 @@ heap_vac_scan_next_block(ReadStream *stream,
{
BlockNumber next_block;
LVRelState *vacrel = callback_private_data;
- uint8 blk_info = 0;
/* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
next_block = vacrel->current_block + 1;
@@ -1664,8 +1652,8 @@ heap_vac_scan_next_block(ReadStream *stream,
* otherwise they would've been unskippable.
*/
vacrel->current_block = next_block;
- blk_info |= VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM;
- *((uint8 *) per_buffer_data) = blk_info;
+ /* Block was not eager scanned */
+ *((bool *) per_buffer_data) = false;
return vacrel->current_block;
}
else
@@ -1677,11 +1665,7 @@ heap_vac_scan_next_block(ReadStream *stream,
Assert(next_block == vacrel->next_unskippable_block);
vacrel->current_block = next_block;
- if (vacrel->next_unskippable_allvis)
- blk_info |= VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM;
- if (vacrel->next_unskippable_eager_scanned)
- blk_info |= VAC_BLK_WAS_EAGER_SCANNED;
- *((uint8 *) per_buffer_data) = blk_info;
+ *((bool *) per_buffer_data) = vacrel->next_unskippable_eager_scanned;
return vacrel->current_block;
}
}
@@ -1706,7 +1690,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
BlockNumber next_unskippable_block = vacrel->next_unskippable_block + 1;
Buffer next_unskippable_vmbuffer = vacrel->next_unskippable_vmbuffer;
bool next_unskippable_eager_scanned = false;
- bool next_unskippable_allvis;
*skipsallvis = false;
@@ -1716,7 +1699,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
next_unskippable_block,
&next_unskippable_vmbuffer);
- next_unskippable_allvis = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
/*
* At the start of each eager scan region, normal vacuums with eager
@@ -1735,7 +1717,7 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
* A block is unskippable if it is not all visible according to the
* visibility map.
*/
- if (!next_unskippable_allvis)
+ if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
{
Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
break;
@@ -1792,7 +1774,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
/* write the local variables back to vacrel */
vacrel->next_unskippable_block = next_unskippable_block;
- vacrel->next_unskippable_allvis = next_unskippable_allvis;
vacrel->next_unskippable_eager_scanned = next_unskippable_eager_scanned;
vacrel->next_unskippable_vmbuffer = next_unskippable_vmbuffer;
}
@@ -1953,9 +1934,7 @@ cmpOffsetNumbers(const void *a, const void *b)
* Caller must hold pin and buffer cleanup lock on the buffer.
*
* vmbuffer is the buffer containing the VM block with visibility information
- * for the heap block, blkno. all_visible_according_to_vm is the saved
- * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * for the heap block, blkno.
*
* *has_lpdead_items is set to true or false depending on whether, upon return
* from this function, any LP_DEAD items are still present on the page.
@@ -1972,7 +1951,6 @@ lazy_scan_prune(LVRelState *vacrel,
BlockNumber blkno,
Page page,
Buffer vmbuffer,
- bool all_visible_according_to_vm,
bool *has_lpdead_items,
bool *vm_page_frozen)
{
@@ -1986,6 +1964,8 @@ lazy_scan_prune(LVRelState *vacrel,
.vistest = vacrel->vistest,
.cutoffs = &vacrel->cutoffs,
};
+ uint8 old_vmbits = 0;
+ uint8 new_vmbits = 0;
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -2088,70 +2068,7 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(!presult.all_visible || !(*has_lpdead_items));
Assert(!presult.all_frozen || presult.all_visible);
- /*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
- */
- if ((presult.all_visible && !all_visible_according_to_vm) ||
- (presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
- {
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (presult.all_frozen)
- flags |= VISIBILITYMAP_ALL_FROZEN;
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
- * unnecessarily dirtying the heap buffer. Nearly the only scenario
- * where PD_ALL_VISIBLE is set but the VM is not is if the VM was
- * removed -- and that isn't worth optimizing for. And if we add the
- * heap buffer to the WAL chain (without passing REGBUF_NO_CHANGES),
- * it must be marked dirty.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
-
- /*
- * If the page is being set all-frozen, we pass InvalidTransactionId
- * as the cutoff_xid, since a snapshot conflict horizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- Assert(!presult.all_frozen ||
- !TransactionIdIsValid(presult.vm_conflict_horizon));
-
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
+ old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
/*
* As of PostgreSQL 9.2, the visibility map bit should never be set if the
@@ -2159,8 +2076,8 @@ lazy_scan_prune(LVRelState *vacrel,
* cleared after heap_vac_scan_next_block() was called, so we must recheck
* with buffer lock before concluding that the VM is corrupt.
*/
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+ if (!PageIsAllVisible(page) &&
+ (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
ereport(WARNING,
(errcode(ERRCODE_DATA_CORRUPTED),
@@ -2198,6 +2115,69 @@ lazy_scan_prune(LVRelState *vacrel,
VISIBILITYMAP_VALID_BITS);
}
+ if (!presult.all_visible)
+ return presult.ndeleted;
+
+ /* Set the visibility map and page visibility hint */
+ new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+ if (presult.all_frozen)
+ new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+ /* Nothing to do */
+ if (old_vmbits == new_vmbits)
+ return presult.ndeleted;
+
+ Assert(presult.all_visible);
+
+ /*
+ * It should never be the case that the visibility map page is set while
+ * the page-level bit is clear, but the reverse is allowed (if checksums
+ * are not enabled). Regardless, set both bits so that we get back in
+ * sync.
+ *
+ * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
+ * unnecessarily dirtying the heap buffer. Nearly the only scenario where
+ * PD_ALL_VISIBLE is set but the VM is not is if the VM was removed -- and
+ * that isn't worth optimizing for. And if we add the heap buffer to the
+ * WAL chain (without passing REGBUF_NO_CHANGES), it must be marked dirty.
+ */
+ PageSetAllVisible(page);
+ MarkBufferDirty(buf);
+
+ /*
+ * If the page is being set all-frozen, we pass InvalidTransactionId as
+ * the cutoff_xid, since a snapshot conflict horizon sufficient to make
+ * everything safe for REDO was logged when the page's tuples were frozen.
+ */
+ Assert(!presult.all_frozen ||
+ !TransactionIdIsValid(presult.vm_conflict_horizon));
+
+ visibilitymap_set(vacrel->rel, blkno, buf,
+ InvalidXLogRecPtr,
+ vmbuffer, presult.vm_conflict_horizon,
+ new_vmbits);
+
+ /*
+ * If the page wasn't already set all-visible and/or all-frozen in the VM,
+ * count it as newly set for logging.
+ */
+ if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ {
+ vacrel->vm_new_visible_pages++;
+ if (presult.all_frozen)
+ {
+ vacrel->vm_new_visible_frozen_pages++;
+ *vm_page_frozen = true;
+ }
+ }
+ else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ presult.all_frozen)
+ {
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
+ }
+
return presult.ndeleted;
}
--
2.43.0
v28-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patchtext/x-patch; charset=US-ASCII; name=v28-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patchDownload
From c23840a12ff14eeffb5116c2cfd34e34e3987b02 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 13:36:39 -0500
Subject: [PATCH v28 03/15] Refactor lazy_scan_prune() VM clear logic into
helper
Encapsulating this logic in a helper makes the whole function clearer. There
is no functional change other than moving it into a helper.
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
---
src/backend/access/heap/vacuumlazy.c | 122 +++++++++++++++++----------
1 file changed, 78 insertions(+), 44 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9d2523a55b2..ff34a99edbd 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -422,6 +422,11 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
+static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer,
@@ -1928,6 +1933,77 @@ cmpOffsetNumbers(const void *a, const void *b)
return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
}
+/*
+ * Helper to correct any corruption detected on a heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+ *
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ */
+static void
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits)
+{
+ Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
+
+ Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (!PageIsAllVisible(heap_page) &&
+ ((vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+}
+
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -2070,50 +2146,8 @@ lazy_scan_prune(LVRelState *vacrel,
old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (!PageIsAllVisible(page) &&
- (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
+ identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
+ presult.lpdead_items, vmbuffer, old_vmbits);
if (!presult.all_visible)
return presult.ndeleted;
--
2.43.0
v28-0004-Set-the-VM-in-heap_page_prune_and_freeze.patchtext/x-patch; charset=US-ASCII; name=v28-0004-Set-the-VM-in-heap_page_prune_and_freeze.patchDownload
From 014abb83438cf3a3600f34b1060bca430f572275 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v28 04/15] Set the VM in heap_page_prune_and_freeze()
This has no independent benefit. It is meant for ease of review. As of
this commit, there is still a separate WAL record emitted for setting
the VM after pruning and freezing. But it is easier to review if moving
the logic into pruneheap.c is separate from setting the VM in the same
WAL record.
---
src/backend/access/heap/pruneheap.c | 302 +++++++++++++++++++++++----
src/backend/access/heap/vacuumlazy.c | 142 +------------
src/include/access/heapam.h | 21 ++
3 files changed, 286 insertions(+), 179 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 07aa08cfe14..14d40476be9 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
#include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool attempt_freeze;
+ /* whether or not to attempt updating the VM */
+ bool attempt_update_vm;
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -140,16 +142,17 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon when setting the VM or when
+ * freezing all the tuples on the page. It is only valid when all the live
+ * tuples on the page are all-visible.
*
* NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
* That's convenient for heap_page_prune_and_freeze() to use them to
- * decide whether to freeze the page or not. The all_visible and
- * all_frozen values returned to the caller are adjusted to include
- * LP_DEAD items after we determine whether to opportunistically freeze.
+ * decide whether to opportunistically freeze the page or not. The
+ * all_visible and all_frozen values ultimately used to set the VM are
+ * adjusted to include LP_DEAD items after we determine whether or not to
+ * opportunistically freeze.
*/
bool all_visible;
bool all_frozen;
@@ -191,6 +194,17 @@ static void page_verify_redirects(Page page);
static bool heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
PruneState *prstate);
+static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page, int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits);
+static bool heap_page_will_set_vm(PruneState *prstate,
+ Relation relation,
+ BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+ Buffer vmbuffer,
+ int nlpdead_items,
+ uint8 *old_vmbits,
+ uint8 *new_vmbits);
/*
@@ -280,6 +294,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
PruneFreezeParams params = {
.relation = relation,
.buffer = buffer,
+ .vmbuffer = InvalidBuffer,
.reason = PRUNE_ON_ACCESS,
.options = 0,
.vistest = vistest,
@@ -341,6 +356,8 @@ prune_freeze_setup(PruneFreezeParams *params,
/* cutoffs must be provided if we will attempt freezing */
Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate->attempt_update_vm =
+ (params->options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate->cutoffs = params->cutoffs;
/*
@@ -396,51 +413,54 @@ prune_freeze_setup(PruneFreezeParams *params,
prstate->frz_conflict_horizon = InvalidTransactionId;
/*
- * Vacuum may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Track whether the page could be marked all-visible and/or all-frozen.
+ * This information is used for opportunistic freezing and for updating
+ * the visibility map (VM) if requested by the caller.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * Currently, only VACUUM performs freezing, but other callers may in the
+ * future. Visibility bookkeeping is required not just for setting the VM
+ * bits, but also for opportunistic freezing: we only consider freezing if
+ * the page would become all-frozen, or if it would be all-frozen except
+ * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+ * we will not set the VM bit even if the page is found to be all-visible.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not immediately
- * clear all_visible and all_frozen when we see LP_DEAD items. We fix
- * that after scanning the line pointers. We must correct all_visible and
- * all_frozen before we return them to the caller, so that the caller
- * doesn't set the VM bits incorrectly.
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is passed without HEAP_PAGE_PRUNE_FREEZE,
+ * prstate.all_frozen must be initialized to false, since we will not call
+ * heap_prepare_freeze_tuple() for each tuple.
+ *
+ * Dead tuples that will be removed by the end of vacuum should not
+ * prevent opportunistic freezing. Therefore, we do not clear all_visible
+ * and all_frozen when we encounter LP_DEAD items. Instead, we correct
+ * them after deciding whether to freeze, but before updating the VM, to
+ * avoid setting the VM bits incorrectly.
+ *
+ * If neither freezing nor VM updates are requested, we skip the extra
+ * bookkeeping. In this case, initializing all_visible to false allows
+ * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
*/
if (prstate->attempt_freeze)
{
prstate->all_visible = true;
prstate->all_frozen = true;
}
+ else if (prstate->attempt_update_vm)
+ {
+ prstate->all_visible = true;
+ prstate->all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate->all_visible = false;
prstate->all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is unmaintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon
+ * when updating the VM and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -775,10 +795,134 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Helper to correct any corruption detected on a heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+ *
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ */
+static void
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits)
+{
+ Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
+
+ Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (!PageIsAllVisible(heap_page) &&
+ ((vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+}
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * However, if it finds that the page-level visibility hint or VM is
+ * corrupted, it will fix them by clearing the VM bits and visibility hint.
+ * This does not need to be done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with returning the
+ * current value of the VM bits in *old_vmbits and the desired new value of
+ * the VM bits in *new_vmbits.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate,
+ Relation relation,
+ BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+ Buffer vmbuffer,
+ int nlpdead_items,
+ uint8 *old_vmbits,
+ uint8 *new_vmbits)
+{
+ if (!prstate->attempt_update_vm)
+ return false;
+
+ *old_vmbits = visibilitymap_get_status(relation, heap_blk,
+ &vmbuffer);
+
+ /* We do this even if not all-visible */
+ identify_and_fix_vm_corruption(relation, heap_buffer, heap_blk, heap_page,
+ nlpdead_items, vmbuffer,
+ *old_vmbits);
+
+ if (!prstate->all_visible)
+ return false;
+
+ *new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+ if (prstate->all_frozen)
+ *new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+ if (*new_vmbits == *old_vmbits)
+ {
+ *new_vmbits = 0;
+ return false;
+ }
+
+ return true;
+}
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -793,12 +937,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* tuples if it's required in order to advance relfrozenxid / relminmxid, or
* if it's considered advantageous for overall system performance to do so
* now. The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing. When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opportunistically freeze, to indicate if the VM bits can be set. They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
@@ -823,13 +968,19 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
MultiXactId *new_relmin_mxid)
{
Buffer buffer = params->buffer;
+ Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
+ BlockNumber blockno = BufferGetBlockNumber(buffer);
PruneState prstate;
bool do_freeze;
bool do_prune;
bool do_hint_prune;
+ bool do_set_vm;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ uint8 new_vmbits = 0;
+ uint8 old_vmbits = 0;
+
/* Initialize prstate */
prune_freeze_setup(params,
@@ -1011,6 +1162,65 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
}
}
+
+ /* Now update the visibility map and PD_ALL_VISIBLE hint */
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ do_set_vm = heap_page_will_set_vm(&prstate,
+ params->relation,
+ blockno,
+ buffer,
+ page,
+ vmbuffer,
+ prstate.lpdead_items,
+ &old_vmbits,
+ &new_vmbits);
+
+ /*
+ * new_vmbits should be 0 regardless of whether or not the page is
+ * all-visible if we do not intend to set the VM.
+ */
+ Assert(do_set_vm || new_vmbits == 0);
+
+ /* Set the visibility map and page visibility hint, if relevant */
+ if (do_set_vm)
+ {
+ Assert(prstate.all_visible);
+
+ /*
+ * It should never be the case that the visibility map page is set
+ * while the page-level bit is clear, but the reverse is allowed (if
+ * checksums are not enabled). Regardless, set both bits so that we
+ * get back in sync.
+ *
+ * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
+ * unnecessarily dirtying the heap buffer. Nearly the only scenario
+ * where PD_ALL_VISIBLE is set but the VM is not is if the VM was
+ * removed -- and that isn't worth optimizing for. And if we add the
+ * heap buffer to the WAL chain (without passing REGBUF_NO_CHANGES),
+ * it must be marked dirty.
+ */
+ PageSetAllVisible(page);
+ MarkBufferDirty(buffer);
+
+ /*
+ * If the page is being set all-frozen, we pass InvalidTransactionId
+ * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ Assert(!prstate.all_frozen ||
+ !TransactionIdIsValid(presult->vm_conflict_horizon));
+
+ visibilitymap_set(params->relation, blockno, buffer,
+ InvalidXLogRecPtr,
+ vmbuffer, presult->vm_conflict_horizon,
+ new_vmbits);
+ }
+
+ /* Save the vmbits for caller */
+ presult->old_vmbits = old_vmbits;
+ presult->new_vmbits = new_vmbits;
}
@@ -1485,6 +1695,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
{
TransactionId xmin;
+ Assert(prstate->attempt_update_vm);
+
if (!HeapTupleHeaderXminCommitted(htup))
{
prstate->all_visible = false;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ff34a99edbd..d5c57516785 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -422,11 +422,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
-static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
- BlockNumber heap_blk, Page heap_page,
- int nlpdead_items,
- Buffer vmbuffer,
- uint8 vmbits);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer,
@@ -1933,77 +1928,6 @@ cmpOffsetNumbers(const void *a, const void *b)
return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
}
-/*
- * Helper to correct any corruption detected on an heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- */
-static void
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
- BlockNumber heap_blk, Page heap_page,
- int nlpdead_items,
- Buffer vmbuffer,
- uint8 vmbits)
-{
- Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
-
- Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (!PageIsAllVisible(heap_page) &&
- ((vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- RelationGetRelationName(rel), heap_blk)));
-
- visibilitymap_clear(rel, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- RelationGetRelationName(rel), heap_blk)));
-
- PageClearAllVisible(heap_page);
- MarkBufferDirty(heap_buffer);
- visibilitymap_clear(rel, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-}
-
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -2035,13 +1959,12 @@ lazy_scan_prune(LVRelState *vacrel,
PruneFreezeParams params = {
.relation = rel,
.buffer = buf,
+ .vmbuffer = vmbuffer,
.reason = PRUNE_VACUUM_SCAN,
- .options = HEAP_PAGE_PRUNE_FREEZE,
+ .options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM,
.vistest = vacrel->vistest,
.cutoffs = &vacrel->cutoffs,
};
- uint8 old_vmbits = 0;
- uint8 new_vmbits = 0;
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -2141,73 +2064,24 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
- Assert(!presult.all_frozen || presult.all_visible);
-
- old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
- identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
- presult.lpdead_items, vmbuffer, old_vmbits);
-
- if (!presult.all_visible)
- return presult.ndeleted;
-
- /* Set the visibility map and page visibility hint */
- new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
- if (presult.all_frozen)
- new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
- /* Nothing to do */
- if (old_vmbits == new_vmbits)
- return presult.ndeleted;
-
- Assert(presult.all_visible);
-
- /*
- * It should never be the case that the visibility map page is set while
- * the page-level bit is clear, but the reverse is allowed (if checksums
- * are not enabled). Regardless, set both bits so that we get back in
- * sync.
- *
- * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
- * unnecessarily dirtying the heap buffer. Nearly the only scenario where
- * PD_ALL_VISIBLE is set but the VM is not is if the VM was removed -- and
- * that isn't worth optimizing for. And if we add the heap buffer to the
- * WAL chain (without passing REGBUF_NO_CHANGES), it must be marked dirty.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
-
- /*
- * If the page is being set all-frozen, we pass InvalidTransactionId as
- * the cutoff_xid, since a snapshot conflict horizon sufficient to make
- * everything safe for REDO was logged when the page's tuples were frozen.
- */
- Assert(!presult.all_frozen ||
- !TransactionIdIsValid(presult.vm_conflict_horizon));
-
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- new_vmbits);
-
/*
* If the page wasn't already set all-visible and/or all-frozen in the VM,
* count it as newly set for logging.
*/
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
vacrel->vm_new_frozen_pages++;
*vm_page_frozen = true;
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f7e4ae3843c..ad2af13ec39 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,12 @@ typedef struct PruneFreezeParams
Relation relation; /* relation containing buffer to be pruned */
Buffer buffer; /* buffer to be pruned */
+ /*
+ * If we will consider updating the visibility map, vmbuffer should
+ * contain the correct block of the visibility map and be pinned.
+ */
+ Buffer vmbuffer;
+
/*
* The reason pruning was performed. It is used to set the WAL record
* opcode which is used for debugging and analysis purposes.
@@ -252,6 +259,9 @@ typedef struct PruneFreezeParams
*
* HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
* will return 'all_visible', 'all_frozen' flags to the caller.
+ *
+ * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
+ * in the VM.
*/
int options;
@@ -299,6 +309,17 @@ typedef struct PruneFreezeResult
bool all_frozen;
TransactionId vm_conflict_horizon;
+ /*
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
+ *
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set and
+ * we have attempted to update the VM.
+ */
+ uint8 new_vmbits;
+ uint8 old_vmbits;
+
/*
* Whether or not the page makes rel truncation unsafe. This is set to
* 'true', even if the page contains LP_DEAD items. VACUUM will remove
--
2.43.0
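For review purposes, here is the caller-side shape of the new interface from the patch above: a caller that wants pruning/freezing to also handle the VM passes a pinned vmbuffer plus HEAP_PAGE_PRUNE_UPDATE_VM, and reads the before/after VM state back out of the PruneFreezeResult. A minimal sketch -- the local names and the elided arguments are placeholders, not the exact lazy_scan_prune code:

    PruneFreezeParams params = {
        .relation = rel,
        .buffer = buf,
        .vmbuffer = vmbuffer,   /* pinned VM block covering this heap block */
        .reason = PRUNE_VACUUM_SCAN,
        .options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM,
        .vistest = vistest,
        .cutoffs = &cutoffs,
    };
    PruneFreezeResult presult;

    /* ... heap_page_prune_and_freeze(&params, &presult, ...) as in lazy_scan_prune ... */

    /* The VM state before and after comes back for the caller's bookkeeping */
    if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
        (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
        vm_new_visible_pages++;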
v28-0005-Move-VM-assert-into-prune-freeze-code.patchtext/x-patch; charset=US-ASCII; name=v28-0005-Move-VM-assert-into-prune-freeze-code.patchDownload
From 218bcd4dffe014647495a9bba11d8beaeb1465cd Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 15:57:34 -0500
Subject: [PATCH v28 05/15] Move VM assert into prune/freeze code
This is a step toward setting the VM in the same WAL record as pruning
and freezing. It moves the check of the heap page into prune/freeze code
before setting the VM. This allows us to remove some fields of the
PruneFreezeResult.
---
src/backend/access/heap/pruneheap.c | 86 ++++++++++++++++++++++------
src/backend/access/heap/vacuumlazy.c | 68 +---------------------
src/include/access/heapam.h | 25 +++-----
3 files changed, 77 insertions(+), 102 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 14d40476be9..39149fbba7c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -918,6 +918,31 @@ heap_page_will_set_vm(PruneState *prstate,
return true;
}
+#ifdef USE_ASSERT_CHECKING
+
+/*
+ * Wrapper for heap_page_would_be_all_visible() which can be used for callers
+ * that expect no LP_DEAD on the page. Currently assert-only, but there is no
+ * reason not to use it outside of asserts.
+ */
+static bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
+{
+
+ return heap_page_would_be_all_visible(rel, buf,
+ OldestXmin,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid,
+ logging_offnum);
+}
+#endif
+
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
@@ -971,6 +996,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
+ TransactionId vm_conflict_horizon = InvalidTransactionId;
PruneState prstate;
bool do_freeze;
bool do_prune;
@@ -1129,23 +1155,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
- /*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
-
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
@@ -1163,6 +1174,46 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
}
}
+ /*
+ * If updating the visibility map, the conflict horizon for that record
+ * must be the newest xmin on the page. However, if the page is
+ * completely frozen, there can be no conflict and the vm_conflict_horizon
+ * should remain InvalidTransactionId. This includes the case that we
+ * just froze all the tuples; the prune-freeze record included the
+ * conflict XID already, so we don't need to include it again.
+ */
+ if (prstate.all_frozen)
+ vm_conflict_horizon = InvalidTransactionId;
+ else
+ vm_conflict_horizon = prstate.visibility_cutoff_xid;
+
+ /*
+ * During its second pass over the heap, VACUUM calls
+ * heap_page_would_be_all_visible() to determine whether a page is
+ * all-visible and all-frozen. The logic here is similar. After completing
+ * pruning and freezing, use an assertion to verify that our results
+ * remain consistent with heap_page_would_be_all_visible().
+ */
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
+ {
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
+
+ Assert(presult->lpdead_items == 0);
+
+ Assert(heap_page_is_all_visible(params->relation, buffer,
+ prstate.cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc));
+
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == vm_conflict_horizon);
+ }
+#endif
+
/* Now update the visibility map and PD_ALL_VISIBLE hint */
Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
@@ -1209,12 +1260,11 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* make everything safe for REDO was logged when the page's tuples
* were frozen.
*/
- Assert(!prstate.all_frozen ||
- !TransactionIdIsValid(presult->vm_conflict_horizon));
+ Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
visibilitymap_set(params->relation, blockno, buffer,
InvalidXLogRecPtr,
- vmbuffer, presult->vm_conflict_horizon,
+ vmbuffer, vm_conflict_horizon,
new_vmbits);
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d5c57516785..61564aea5fd 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -456,20 +456,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
-#endif
-static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- OffsetNumber *deadoffsets,
- int ndeadoffsets,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2005,32 +1991,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- Assert(heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum));
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -3487,29 +3447,6 @@ dead_items_cleanup(LVRelState *vacrel)
vacrel->pvs = NULL;
}
-#ifdef USE_ASSERT_CHECKING
-
-/*
- * Wrapper for heap_page_would_be_all_visible() which can be used for callers
- * that expect no LP_DEAD on the page. Currently assert-only, but there is no
- * reason not to use it outside of asserts.
- */
-static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum)
-{
-
- return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
- NULL, 0,
- all_frozen,
- visibility_cutoff_xid,
- logging_offnum);
-}
-#endif
/*
* Check whether the heap page in buf is all-visible except for the dead
@@ -3533,15 +3470,12 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
* - *logging_offnum: OffsetNumber of current tuple being processed;
* used by vacuum's error callback system.
*
- * Callers looking to verify that the page is already all-visible can call
- * heap_page_is_all_visible().
- *
* This logic is closely related to heap_prune_record_unchanged_lp_normal().
* If you modify this function, ensure consistency with that code. An
* assertion cross-checks that both remain in agreement. Do not introduce new
* side-effects.
*/
-static bool
+bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
TransactionId OldestXmin,
OffsetNumber *deadoffsets,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ad2af13ec39..bec2f840102 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -257,8 +257,7 @@ typedef struct PruneFreezeParams
* HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
* LP_UNUSED during pruning.
*
- * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
- * will return 'all_visible', 'all_frozen' flags to the caller.
+ * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples
*
* HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
* in the VM.
@@ -294,21 +293,6 @@ typedef struct PruneFreezeResult
int live_tuples;
int recently_dead_tuples;
- /*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
- *
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
- *
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
- */
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
-
/*
* old_vmbits are the state of the all-visible and all-frozen bits in the
* visibility map before updating it during phase I of vacuuming.
@@ -454,6 +438,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
/* in heap/vacuumlazy.c */
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
--
2.43.0
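Patch 0005 exports heap_page_would_be_all_visible() so the prune/freeze code can cross-check its all-visible determination against the logic vacuum's second pass uses. A hedged usage sketch (variable names are placeholders) of a caller that already knows about some LP_DEAD items, as opposed to the assert-only heap_page_is_all_visible() wrapper which passes NULL, 0:

    bool            all_frozen;
    TransactionId   cutoff_xid;
    OffsetNumber    logging_offnum = InvalidOffsetNumber;

    /*
     * deadoffsets/ndeadoffsets let the caller exclude LP_DEAD items it
     * already plans to remove; NULL, 0 requires the page to have no dead
     * items at all.
     */
    if (heap_page_would_be_all_visible(rel, buf, OldestXmin,
                                       deadoffsets, ndeadoffsets,
                                       &all_frozen, &cutoff_xid,
                                       &logging_offnum))
    {
        /* page would be all-visible once those LP_DEAD items are removed */
    }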
v28-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patchtext/x-patch; charset=US-ASCII; name=v28-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patchDownload
From beb1ae3557904962dcec6266b882cdc75a0c7051 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 16:16:22 -0500
Subject: [PATCH v28 06/15] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
prune/freeze
Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.
Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.
This change applies only to vacuum phase I, not to pruning performed
during normal page access.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/pruneheap.c | 277 ++++++++++++++++------------
1 file changed, 158 insertions(+), 119 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 39149fbba7c..3521e70b8d0 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -205,6 +205,11 @@ static bool heap_page_will_set_vm(PruneState *prstate,
int nlpdead_items,
uint8 *old_vmbits,
uint8 *new_vmbits);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ uint8 old_vmbits, uint8 new_vmbits,
+ TransactionId latest_xid_removed,
+ TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid);
/*
@@ -795,6 +800,68 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ uint8 old_vmbits, uint8 new_vmbits,
+ TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid)
+{
+ TransactionId conflict_xid;
+
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning or
+ * freezing any tuples and are setting an already all-visible page
+ * all-frozen in the VM. In this case, all of the tuples on the page must
+ * already be visible to all MVCC snapshots on the standby.
+ */
+ if (!do_prune &&
+ !do_freeze &&
+ do_set_vm &&
+ (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0 &&
+ (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ return InvalidTransactionId;
+
+ /*
+ * The snapshotConflictHorizon for the whole record should be the most
+ * conservative of all the horizons calculated for any of the possible
+ * modifications. If this record will prune tuples, any transactions on
+ * the standby older than the youngest xmax of the most recently removed
+ * tuple this record will prune will conflict. If this record will freeze
+ * tuples, any transactions on the standby with xids older than the
+ * youngest tuple this record will freeze will conflict.
+ */
+ conflict_xid = InvalidTransactionId;
+
+ /*
+ * If we are updating the VM, the conflict horizon is almost always the
+ * visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization, we can
+ * use the visibility_cutoff_xid as the conflict horizon if the page will
+ * be all-frozen. This is true even if there are LP_DEAD line pointers
+ * because we ignored those when maintaining the visibility_cutoff_xid.
+ * This will have been calculated earlier as the frz_conflict_horizon when
+ * we determined we would freeze.
+ */
+ if (do_set_vm)
+ conflict_xid = visibility_cutoff_xid;
+ else if (do_freeze)
+ conflict_xid = frz_conflict_horizon;
+
+ /*
+ * If we are removing tuples with a younger xmax than our so far
+ * calculated conflict_xid, we must use this as our horizon.
+ */
+ if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+ conflict_xid = latest_xid_removed;
+
+ return conflict_xid;
+}
+
/*
* Helper to correct any corruption detected on an heap page and its
* corresponding visibility map page after pruning but before setting the
@@ -996,7 +1063,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
- TransactionId vm_conflict_horizon = InvalidTransactionId;
PruneState prstate;
bool do_freeze;
bool do_prune;
@@ -1004,10 +1070,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool do_set_vm;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ TransactionId conflict_xid = InvalidTransactionId;
uint8 new_vmbits = 0;
uint8 old_vmbits = 0;
-
/* Initialize prstate */
prune_freeze_setup(params,
new_relfrozen_xid, new_relmin_mxid,
@@ -1068,6 +1134,37 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.all_visible = prstate.all_frozen = false;
Assert(!prstate.all_frozen || prstate.all_visible);
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ /*
+ * Decide whether to set the VM bits based on information from the VM and
+ * the all_visible/all_frozen flags.
+ */
+ do_set_vm = heap_page_will_set_vm(&prstate,
+ params->relation,
+ blockno,
+ buffer,
+ page,
+ vmbuffer,
+ prstate.lpdead_items,
+ &old_vmbits,
+ &new_vmbits);
+
+ /*
+ * new_vmbits should be 0 regardless of whether or not the page is
+ * all-visible if we do not intend to set the VM.
+ */
+ Assert(do_set_vm || new_vmbits == 0);
+
+ conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+ old_vmbits, new_vmbits,
+ prstate.latest_xid_removed,
+ prstate.frz_conflict_horizon,
+ prstate.visibility_cutoff_xid);
+
+ /* Lock vmbuffer before entering a critical section */
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -1089,14 +1186,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
/*
* If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * hint. If we are going to freeze or prune the page or set
+ * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+ *
+ * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+ * for the VM to be set and PD_ALL_VISIBLE to be clear.
*/
- if (!do_freeze && !do_prune)
+ if (!do_freeze && !do_prune && !do_set_vm)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -1110,6 +1210,27 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
+ /* Set the visibility map and page visibility hint */
+ if (do_set_vm)
+ {
+ /*
+ * While it is valid for PD_ALL_VISIBLE to be set when the
+ * corresponding VM bit is clear, we strongly prefer to keep them
+ * in sync.
+ *
+ * Even if we are only setting the VM and PD_ALL_VISIBLE is
+ * already set, we don't need to worry about unnecessarily
+ * dirtying the heap buffer below. Nearly the only scenario where
+ * PD_ALL_VISIBLE is set but the VM is not is if the VM was
+ * removed, and that isn't worth optimizing for. And, if we add
+ * the heap buffer to the WAL chain (without passing
+ * REGBUF_NO_CHANGES), it must be marked dirty.
+ */
+ PageSetAllVisible(page);
+ visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
+ params->relation->rd_locator);
+ }
+
MarkBufferDirty(buffer);
/*
@@ -1117,29 +1238,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
if (RelationNeedsWAL(params->relation))
{
- /*
- * The snapshotConflictHorizon for the whole record should be the
- * most conservative of all the horizons calculated for any of the
- * possible modifications. If this record will prune tuples, any
- * transactions on the standby older than the youngest xmax of the
- * most recently removed tuple this record will prune will
- * conflict. If this record will freeze tuples, any transactions
- * on the standby with xids older than the youngest tuple this
- * record will freeze will conflict.
- */
- TransactionId conflict_xid;
-
- if (TransactionIdFollows(prstate.frz_conflict_horizon,
- prstate.latest_xid_removed))
- conflict_xid = prstate.frz_conflict_horizon;
- else
- conflict_xid = prstate.latest_xid_removed;
-
log_heap_prune_and_freeze(params->relation, buffer,
- InvalidBuffer, /* vmbuffer */
- 0, /* vmflags */
+ do_set_vm ? vmbuffer : InvalidBuffer,
+ do_set_vm ? new_vmbits : 0,
conflict_xid,
- true, params->reason,
+ true, /* cleanup lock */
+ params->reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
@@ -1149,43 +1253,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
END_CRIT_SECTION();
- /* Copy information back for caller */
- presult->ndeleted = prstate.ndeleted;
- presult->nnewlpdead = prstate.ndead;
- presult->nfrozen = prstate.nfrozen;
- presult->live_tuples = prstate.live_tuples;
- presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->hastup = prstate.hastup;
-
- presult->lpdead_items = prstate.lpdead_items;
- /* the presult->deadoffsets array was already filled in */
-
- if (prstate.attempt_freeze)
- {
- if (presult->nfrozen > 0)
- {
- *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
- *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
- }
- else
- {
- *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
- *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
- }
- }
-
- /*
- * If updating the visibility map, the conflict horizon for that record
- * must be the newest xmin on the page. However, if the page is
- * completely frozen, there can be no conflict and the vm_conflict_horizon
- * should remain InvalidTransactionId. This includes the case that we
- * just froze all the tuples; the prune-freeze record included the
- * conflict XID already, so we don't need to include it again.
- */
- if (prstate.all_frozen)
- vm_conflict_horizon = InvalidTransactionId;
- else
- vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
/*
* During its second pass over the heap, VACUUM calls
@@ -1200,7 +1269,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(presult->lpdead_items == 0);
+ Assert(prstate.lpdead_items == 0);
+ Assert(prstate.cutoffs);
Assert(heap_page_is_all_visible(params->relation, buffer,
prstate.cutoffs->OldestXmin,
@@ -1210,67 +1280,36 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Assert(prstate.all_frozen == debug_all_frozen);
Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == vm_conflict_horizon);
+ debug_cutoff == prstate.visibility_cutoff_xid);
}
#endif
- /* Now update the visibility map and PD_ALL_VISIBLE hint */
- Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
-
- do_set_vm = heap_page_will_set_vm(&prstate,
- params->relation,
- blockno,
- buffer,
- page,
- vmbuffer,
- prstate.lpdead_items,
- &old_vmbits,
- &new_vmbits);
+ /* Copy information back for caller */
+ presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+ presult->hastup = prstate.hastup;
+ presult->new_vmbits = new_vmbits;
+ presult->old_vmbits = old_vmbits;
- /*
- * new_vmbits should be 0 regardless of whether or not the page is
- * all-visible if we do not intend to set the VM.
- */
- Assert(do_set_vm || new_vmbits == 0);
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
- /* Set the visibility map and page visibility hint, if relevant */
- if (do_set_vm)
+ if (prstate.attempt_freeze)
{
- Assert(prstate.all_visible);
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
- * unnecessarily dirtying the heap buffer. Nearly the only scenario
- * where PD_ALL_VISIBLE is set but the VM is not is if the VM was
- * removed -- and that isn't worth optimizing for. And if we add the
- * heap buffer to the WAL chain (without passing REGBUF_NO_CHANGES),
- * it must be marked dirty.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buffer);
-
- /*
- * If the page is being set all-frozen, we pass InvalidTransactionId
- * as the cutoff_xid, since a snapshot conflict horizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
-
- visibilitymap_set(params->relation, blockno, buffer,
- InvalidXLogRecPtr,
- vmbuffer, vm_conflict_horizon,
- new_vmbits);
+ if (presult->nfrozen > 0)
+ {
+ *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
+ }
+ else
+ {
+ *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
+ }
}
-
- /* Save the vmbits for caller */
- presult->old_vmbits = old_vmbits;
- presult->new_vmbits = new_vmbits;
}
--
2.43.0
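To summarize the ordering 0006 establishes in heap_page_prune_and_freeze(): the VM page is locked before the critical section, the heap and VM changes are applied inside it, and a single XLOG_HEAP2_PRUNE_VACUUM_SCAN record covers both, with get_conflict_xid() picking the most conservative horizon (the visibility cutoff when setting the VM, or the freeze horizon, bumped up to the latest removed xmax if that is newer). A condensed sketch of the flow, with error handling and the no-op cases omitted:

    if (do_set_vm)
        LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);    /* before the critical section */

    START_CRIT_SECTION();

    if (do_prune || do_freeze || do_set_vm)
    {
        /* ... apply prune and freeze changes to the heap page ... */

        if (do_set_vm)
        {
            PageSetAllVisible(page);
            visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
                                     params->relation->rd_locator);
        }

        MarkBufferDirty(buffer);

        if (RelationNeedsWAL(params->relation))
            log_heap_prune_and_freeze(params->relation, buffer,
                                      do_set_vm ? vmbuffer : InvalidBuffer,
                                      do_set_vm ? new_vmbits : 0,
                                      conflict_xid,
                                      true, /* cleanup lock */
                                      params->reason,
                                      prstate.frozen, prstate.nfrozen,
                                      prstate.redirected, prstate.nredirected,
                                      prstate.nowdead, prstate.ndead,
                                      prstate.nowunused, prstate.nunused);
    }

    END_CRIT_SECTION();

    if (do_set_vm)
        LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);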
v28-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patchtext/x-patch; charset=US-ASCII; name=v28-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patchDownload
From 3d8444447657a04513e044b4261b5d6334f1bef7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v28 07/15] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum
As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
1 file changed, 29 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 61564aea5fd..e311e7d6604 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1866,9 +1866,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ /* Lock vmbuffer before entering critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ /* Mark buffer dirty before writing any WAL records */
MarkBufferDirty(buf);
/*
@@ -1885,13 +1888,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set_vmbits(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
+
+ /*
+ * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+ * setting the VM.
+ */
+ if (RelationNeedsWAL(vacrel->rel))
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ InvalidTransactionId, /* conflict xid */
+ false, /* cleanup lock */
+ PRUNE_VACUUM_SCAN, /* reason */
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+
END_CRIT_SECTION();
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
/* Count the newly all-frozen pages for logging */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
--
2.43.0
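The last patch in this excerpt, 0008, then drops the old visibilitymap_set() (which emitted its own XLOG_HEAP2_VISIBLE record) and renames visibilitymap_set_vmbits() to visibilitymap_set(). For any external code calling into the VM, the change looks roughly like this; the "before" call is a sketch of the old shape, not a specific call site:

    /* Before: set the bits and emit XLOG_HEAP2_VISIBLE in one call */
    visibilitymap_set(rel, blkno, heap_buf, InvalidXLogRecPtr,
                      vmbuffer, cutoff_xid,
                      VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);

    /*
     * After 0008: only set the bits in the pinned VM page; the change is
     * covered by the caller's own WAL record, e.g. the prune/freeze or
     * multi-insert record.
     */
    visibilitymap_set(blkno, vmbuffer,
                      VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN,
                      rel->rd_locator);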
v28-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patchtext/x-patch; charset=US-ASCII; name=v28-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patchDownload
From 1acb98f855c1c28993bbd9ccb90a2250e4d64980 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v28 08/15] Remove XLOG_HEAP2_VISIBLE entirely
Now that no remaining users emit XLOG_HEAP2_VISIBLE records, remove it entirely.
This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.
This changes the visibility map API, so any external users/consumers of
the VM-only WAL record will need to change.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/common/bufmask.c | 4 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 155 ++---------------------
src/backend/access/heap/pruneheap.c | 4 +-
src/backend/access/heap/vacuumlazy.c | 16 +--
src/backend/access/heap/visibilitymap.c | 112 +---------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +-
src/include/access/heapam_xlog.h | 20 ---
src/include/access/visibilitymap.h | 13 +-
src/include/access/visibilitymapdefs.h | 9 --
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 38 insertions(+), 373 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible. See heap_xlog_prune_freeze()
+ * for more details.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 6daf4a87dec..fb7a7548aa0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2539,11 +2539,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
else if (all_frozen_set)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- relation->rd_locator);
+ visibilitymap_set(BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ relation->rd_locator);
}
/*
@@ -8813,50 +8813,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
-
/*
* Perform XLogInsert for a heap-update operation. Caller must already
* have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1823feff298..47d2479415e 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -236,7 +236,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+ visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -249,142 +249,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -762,8 +626,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
*
* During recovery, however, no concurrent writers exist. Therefore,
* updating the VM without holding the heap page lock is safe enough. This
- * same approach is taken when replaying xl_heap_visible records (see
- * heap_xlog_visible()).
+ * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+ * heap_xlog_prune_freeze()).
*/
if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -775,11 +639,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- rlocator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -1360,9 +1224,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3521e70b8d0..86de3613f5e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1227,8 +1227,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* REGBUF_NO_CHANGES), it must be marked dirty.
*/
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
- params->relation->rd_locator);
+ visibilitymap_set(blockno, vmbuffer, new_vmbits,
+ params->relation->rd_locator);
}
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e311e7d6604..9dec4875e3a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1888,11 +1888,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
/*
* Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2769,9 +2769,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* set PD_ALL_VISIBLE.
*/
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer, vmflags,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer, vmflags,
+ vacrel->rel->rd_locator);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index d14588e92ae..7997e926872 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set bit(s) in a previously pinned page and log
- * visibilitymap_set_vmbits - set bit(s) in a pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,109 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
- flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || !XLogRecPtrIsValid(recptr));
- Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- Assert(!BufferIsValid(heapBuf) ||
- BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (!XLogRecPtrIsValid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
-}
-
/*
* Set VM (visibility map) flags in the VM block in vmBuf.
*
@@ -344,9 +240,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* rlocator is used only for debugging messages.
*/
uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ca26d1f0ed1..08461fdf593 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 5e15cb1825e..c0cac7ea1c3 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index fc45d72c79b..3655358ed6b 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..b27fcdfb345 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -443,20 +442,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -500,11 +485,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index c6fa37be968..05ba6786b47 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator);
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 04845d5e680..6505628120c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4331,7 +4331,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
Attachment: v28-0009-Simplify-heap_page_would_be_all_visible-visibili.patch (text/x-patch)
From 8ba420f2ac77650d22905ba7b4660dc70dad9383 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 17 Dec 2025 13:57:16 -0500
Subject: [PATCH v28 09/15] Simplify heap_page_would_be_all_visible visibility
check
heap_page_would_be_all_visible() doesn't care about the distinction
between HEAPTUPLE_RECENTLY_DEAD and HEAPTUPLE_DEAD tuples -- any tuple
that is not HEAPTUPLE_LIVE means the page is not all-visible and causes
us to return false.
Therefore, we don't need to call HeapTupleSatisfiesVacuum(), which
includes an extra step to distinguish between dead and recently dead
tuples using OldestXmin. Replace it with the more minimal
HeapTupleSatisfiesVacuumHorizon().
This has the added benefit of making it easier to replace uses of
OldestXmin in heap_page_would_be_all_visible() in the future.
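For illustration, a condensed sketch of the resulting check (simplified
from the hunk below, not a verbatim excerpt):

    TransactionId dead_after = InvalidTransactionId;

    switch (HeapTupleSatisfiesVacuumHorizon(&tuple, buf, &dead_after))
    {
        case HEAPTUPLE_LIVE:
            /* dead_after is only set for dead or recently dead tuples */
            Assert(!TransactionIdIsValid(dead_after));
            break;
        default:
            /* anything other than LIVE means the page is not all-visible */
            all_visible = false;
            break;
    }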
---
src/backend/access/heap/vacuumlazy.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9dec4875e3a..441b4883d89 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3535,6 +3535,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
{
ItemId itemid;
HeapTupleData tuple;
+ TransactionId dead_after = InvalidTransactionId;
/*
* Set the offset number so that we can display it along with any
@@ -3574,12 +3575,14 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
/* Visibility checks may do IO or allocate memory */
Assert(CritSectionCount == 0);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+ switch (HeapTupleSatisfiesVacuumHorizon(&tuple, buf, &dead_after))
{
case HEAPTUPLE_LIVE:
{
TransactionId xmin;
+ Assert(!TransactionIdIsValid(dead_after));
+
/* Check comments in lazy_scan_prune. */
if (!HeapTupleHeaderXminCommitted(tuple.t_data))
{
@@ -3614,6 +3617,9 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
case HEAPTUPLE_DEAD:
case HEAPTUPLE_RECENTLY_DEAD:
+ Assert(TransactionIdIsValid(dead_after));
+ /* FALLTHROUGH */
+
case HEAPTUPLE_INSERT_IN_PROGRESS:
case HEAPTUPLE_DELETE_IN_PROGRESS:
{
--
2.43.0
Attachment: v28-0010-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (text/x-patch)
From 7af3b6f670fd8e4d0bc2141d8d11d54696bc459c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v28 10/15] Use GlobalVisState in vacuum to determine page
level visibility
During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.
Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.
This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to become considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.
OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare occurrences where GlobalVisState moves
backwards.
Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.
This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.
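As a sketch, this is the shape of the deferred check performed once at the
end of the page scan (it mirrors the hunks below):

    /*
     * After scanning all live tuples, test the newest xmin on the page
     * against the GlobalVisState exactly once.
     */
    if (all_visible &&
        TransactionIdIsNormal(visibility_cutoff_xid) &&
        GlobalVisTestXidMaybeRunning(vistest, visibility_cutoff_xid))
    {
        all_visible = false;
        all_frozen = false;
    }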
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/heapam_visibility.c | 22 +++++++++
src/backend/access/heap/pruneheap.c | 53 ++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 38 ++++++++++-----
src/include/access/heapam.h | 4 +-
4 files changed, 76 insertions(+), 41 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..f4ab1c13169 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,28 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for “transaction still running” are identical to
+ * those used for removal XIDs.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid)
+{
+ return !GlobalVisTestIsRemovableXid(state, xid);
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 86de3613f5e..e0b19b3e669 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -461,11 +461,12 @@ prune_freeze_setup(PruneFreezeParams *params,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon
- * when updating the VM and/or freezing all the tuples on the page.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState. This field is only kept up-to-date if the page is
+ * all-visible. As soon as a tuple is encountered that is not visible to
+ * all, this field is unmaintained. As long as it is maintained, it can be
+ * used to calculate the snapshot conflict horizon when updating the VM
+ * and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -994,14 +995,14 @@ heap_page_will_set_vm(PruneState *prstate,
*/
static bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
+ vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -1088,6 +1089,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prune_freeze_plan(RelationGetRelid(params->relation),
buffer, &prstate, off_loc);
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them may be considered running by any snapshot, the page cannot
+ * be all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ GlobalVisTestXidMaybeRunning(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* If checksums are enabled, calling heap_prune_satisfies_vacuum() while
* checking tuple visibility information in prune_freeze_plan() may have
@@ -1270,10 +1281,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool debug_all_frozen;
Assert(prstate.lpdead_items == 0);
- Assert(prstate.cutoffs);
Assert(heap_page_is_all_visible(params->relation, buffer,
- prstate.cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc));
@@ -1794,28 +1804,15 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
}
/*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed? A FrozenTransactionId
- * is seen as committed to everyone. Otherwise, we check if
- * there is a snapshot that considers this xid to still be
- * running, and if so, we don't consider the page all-visible.
+ * The inserter definitely committed. But we don't know if it
+ * is old enough that everyone sees it as committed. Later,
+ * after processing all the tuples on the page, we'll check if
+ * there is any snapshot that still considers the newest xid on
+ * the page to be running. If so, we don't consider the page
+ * all-visible.
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' and 'all_frozen' when freezing
- * is requested. We could use GlobalVisTestIsRemovableXid
- * instead, if a non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = false;
- prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 441b4883d89..082cdbc5de8 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2723,7 +2723,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* done outside the critical section.
*/
if (heap_page_would_be_all_visible(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3484,7 +3484,7 @@ dead_items_cleanup(LVRelState *vacrel)
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
*
* Output parameters:
*
@@ -3500,7 +3500,7 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -3583,7 +3583,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
Assert(!TransactionIdIsValid(dead_after));
- /* Check comments in lazy_scan_prune. */
+ /* Check comments in heap_prune_record_unchanged_lp_normal. */
if (!HeapTupleHeaderXminCommitted(tuple.t_data))
{
all_visible = false;
@@ -3592,16 +3592,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
}
/*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed?
+ * The inserter definitely committed. But we don't know if
+ * it is old enough that everyone sees it as committed.
+ * Don't check that now.
+ *
+ * If we scan all tuples without finding one that prevents
+ * the page from being all-visible, we then check whether
+ * any snapshot still considers the newest XID on the page
+ * to be running. In that case, the page is not considered
+ * all-visible.
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin, OldestXmin))
- {
- all_visible = false;
- *all_frozen = false;
- break;
- }
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3633,6 +3634,19 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
}
} /* scan along page */
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * among them may still be considered running by any snapshot, the page
+ * cannot be all-visible.
+ */
+ if (all_visible &&
+ TransactionIdIsNormal(*visibility_cutoff_xid) &&
+ GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+ {
+ all_visible = false;
+ *all_frozen = false;
+ }
+
/* Clear the offset information once we have processed the given page. */
*logging_offnum = InvalidOffsetNumber;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index bec2f840102..1625b107575 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -439,7 +439,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -453,6 +453,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+
+extern bool GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
Attachment: v28-0011-Unset-all_visible-sooner-if-not-freezing.patch (text/x-patch)
From 11d439fcc4ae35a31be204c2eb8a36d52c162d08 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v28 11/15] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.
However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/pruneheap.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e0b19b3e669..1fa72e19f0d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1669,8 +1669,13 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible and all_frozen until later
* during pruning. Removable dead tuples shouldn't preclude freezing the
- * page.
+ * page. If we won't attempt freezing, though, just unset them now.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1930,8 +1935,14 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible and all_frozen until later, at the
* end of heap_page_prune_and_freeze(). This will allow us to attempt to
* freeze the page after pruning. As long as we unset it before updating
- * the visibility map, this will be correct.
+ * the visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all-visible now.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
--
2.43.0
Attachment: v28-0012-Track-which-relations-are-modified-by-a-query.patch (text/x-patch)
From afc918c60fd21be6339d88aa4024f908daeab8d3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v28 12/15] Track which relations are modified by a query
Save the relids in a bitmap in the estate. A later commit will pass this
information down to scan nodes to control whether or not the scan allows
setting the visibility map during on-access pruning. We don't want to set
the visibility map if the query is just going to modify the page
immediately after.
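A minimal sketch of the bookkeeping (matching the executor hunks below,
where rti is the relation's range table index):

    /* remember that this RT index is a modification target */
    estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
                                                rti);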
---
src/backend/executor/execMain.c | 4 ++++
src/backend/executor/execUtils.c | 2 ++
src/include/nodes/execnodes.h | 6 ++++++
3 files changed, 12 insertions(+)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 797d8b1ca1c..5b299ef81aa 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation is modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 3968429f991..d8c385216e0 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query either through
+ * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
--
2.43.0
Attachment: v28-0013-Pass-down-information-on-table-modification-to-s.patch (text/x-patch)
From a64995852e82d8132f23c1bb361e0f3c8080389a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:12:18 -0500
Subject: [PATCH v28 13/15] Pass down information on table modification to scan
node
Pass down information to sequential scan, index scan, and bitmap table
scan nodes on whether or not the query modifies the relation being
scanned. A later commit will use this information to update the VM
during on-access pruning only if the relation is not modified by the
query.
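The pattern used in the scan nodes looks roughly like this (condensed from
the nodeSeqscan.c hunk below):

    uint32      flags = 0;

    /* hint to the table AM that this scan will not modify the relation */
    if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
                       estate->es_modified_relids))
        flags = SO_HINT_REL_READ_ONLY;

    scandesc = table_beginscan(node->ss.ss_currentRelation,
                               estate->es_snapshot,
                               0, NULL, flags);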
---
contrib/pgrowlocks/pgrowlocks.c | 2 +-
src/backend/access/brin/brin.c | 3 ++-
src/backend/access/gin/gininsert.c | 3 ++-
src/backend/access/heap/heapam_handler.c | 7 +++---
src/backend/access/index/genam.c | 4 ++--
src/backend/access/index/indexam.c | 6 +++---
src/backend/access/nbtree/nbtsort.c | 2 +-
src/backend/access/table/tableam.c | 7 +++---
src/backend/commands/constraint.c | 2 +-
src/backend/commands/copyto.c | 2 +-
src/backend/commands/tablecmds.c | 8 +++----
src/backend/commands/typecmds.c | 4 ++--
src/backend/executor/execIndexing.c | 2 +-
src/backend/executor/execReplication.c | 8 +++----
src/backend/executor/nodeBitmapHeapscan.c | 9 +++++++-
src/backend/executor/nodeIndexonlyscan.c | 2 +-
src/backend/executor/nodeIndexscan.c | 11 ++++++++--
src/backend/executor/nodeSeqscan.c | 26 ++++++++++++++++++++---
src/backend/partitioning/partbounds.c | 2 +-
src/backend/utils/adt/selfuncs.c | 2 +-
src/include/access/genam.h | 2 +-
src/include/access/heapam.h | 6 ++++++
src/include/access/tableam.h | 19 ++++++++++-------
23 files changed, 93 insertions(+), 46 deletions(-)
diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
RelationGetRelationName(rel));
/* Scan the relation */
- scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
hscan = (HeapScanDesc) scan;
attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 26cb75058d1..4ad8941c60a 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2842,7 +2842,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
indexInfo->ii_Concurrent = brinshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromBrinShared(brinshared));
+ ParallelTableScanFromBrinShared(brinshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index df30dcc0228..aaa5401b731 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2058,7 +2058,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
indexInfo->ii_Concurrent = ginshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromGinBuildShared(ginshared));
+ ParallelTableScanFromGinBuildShared(ginshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index dd4fe6bf62f..6c2e4e08b16 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,12 +79,13 @@ heapam_slot_callbacks(Relation relation)
*/
static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
{
IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
return &hscan->xs_base;
}
@@ -753,7 +754,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
tableScan = NULL;
heapScan = NULL;
- indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+ indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
index_rescan(indexScan, NULL, 0, NULL, 0);
}
else
@@ -762,7 +763,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
- tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+ tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
heapScan = (HeapScanDesc) tableScan;
indexScan = NULL;
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index b7f10a1aed0..15f9cc11582 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -447,7 +447,7 @@ systable_beginscan(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, irel,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
@@ -708,7 +708,7 @@ systable_beginscan_ordered(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, indexRelation,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..b5523cf2ab1 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys)
+ int nkeys, int norderbys, uint32 flags)
{
IndexScanDesc scan;
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+ scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
return scan;
}
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+ scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
return scan;
}
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index d7695dc1108..7bdbc7e5fa7 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1925,7 +1925,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
indexInfo = BuildIndexInfo(btspool->index);
indexInfo->ii_Concurrent = btshared->isconcurrent;
scan = table_beginscan_parallel(btspool->heap,
- ParallelTableScanFromBTShared(btshared));
+ ParallelTableScanFromBTShared(btshared), 0);
reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
true, progress, _bt_build_callback,
&buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 73ebc01a08f..a00bdfdf822 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
+
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
bool found;
slot = table_slot_create(rel, NULL);
- scan = table_index_fetch_begin(rel);
+ scan = table_index_fetch_begin(rel, 0);
found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
all_dead);
table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index 3497a8221f2..97c8278e36d 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
*/
tmptid = checktid;
{
- IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+ IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
bool call_again = false;
if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index dae91630ac3..1957bb0f1a2 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1156,7 +1156,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
AttrMap *map = NULL;
TupleTableSlot *root_slot = NULL;
- scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
slot = table_slot_create(rel, NULL);
/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6b1a00ed477..130a670d266 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6377,7 +6377,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
* checking all the constraints.
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(oldrel, snapshot, 0, NULL);
+ scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -13766,7 +13766,7 @@ validateForeignKeyConstraint(char *conname,
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
slot = table_slot_create(rel, NULL);
- scan = table_beginscan(rel, snapshot, 0, NULL);
+ scan = table_beginscan(rel, snapshot, 0, NULL, 0);
perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
"validateForeignKeyConstraint",
@@ -22623,7 +22623,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
/* Scan through the rows. */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+ scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -23087,7 +23087,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
/* Scan through the rows. */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(splitRel, snapshot, 0, NULL);
+ scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index be6ffd6ddb0..2921f68c1c3 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3157,7 +3157,7 @@ validateDomainNotNullConstraint(Oid domainoid)
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
@@ -3238,7 +3238,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 0b3a31f1703..74262a34819 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -816,7 +816,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
retry:
conflict = false;
found_self = false;
- index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+ index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 860f79f9cc1..6e49ea5c5d8 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -204,7 +204,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
/* Start an index scan. */
- scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
retry:
found = false;
@@ -382,7 +382,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
/* Start a heap scan. */
InitDirtySnapshot(snap);
- scan = table_beginscan(rel, &snap, 0, NULL);
+ scan = table_beginscan(rel, &snap, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
retry:
@@ -601,7 +601,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+ scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
table_rescan(scan, NULL);
@@ -665,7 +665,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
index_rescan(scan, skey, skey_attoff, NULL, 0);
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..0d854db51a1 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ flags);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 6bea42f128f..2c87ba5f767 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
estate->es_snapshot,
&node->ioss_Instrument,
node->ioss_NumScanKeys,
- node->ioss_NumOrderByKeys);
+ node->ioss_NumOrderByKeys, 0);
node->ioss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 72b135e5dcf..92674441c6d 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys,
+ flags);
node->iss_ScanDesc = scandesc;
@@ -207,7 +214,7 @@ IndexNextWithReorder(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys, 0);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..4d0cbb9dee4 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
scandesc = table_beginscan(node->ss.ss_currentRelation,
estate->es_snapshot,
- 0, NULL);
+ 0, NULL, flags);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
{
EState *estate = node->ss.ps.state;
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
table_parallelscan_initialize(node->ss.ss_currentRelation,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+ flags);
}
/* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation,
+ pscan,
+ flags);
}
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 16b0adc172c..91acf1ee2d7 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
econtext = GetPerTupleExprContext(estate);
snapshot = RegisterSnapshot(GetLatestSnapshot());
tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
- scan = table_beginscan(part_rel, snapshot, 0, NULL);
+ scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index c760b19db55..ec0def0d1e2 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7100,7 +7100,7 @@ get_actual_variable_endpoint(Relation heapRel,
index_scan = index_beginscan(heapRel, indexRel,
&SnapshotNonVacuumable, NULL,
- 1, 0);
+ 1, 0, 0);
/* Set it up for index-only scan */
index_scan->xs_want_itup = true;
index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..d29d9e905fc 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -177,7 +177,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys);
+ int nkeys, int norderbys, uint32 flags);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 1625b107575..0bfe2366e1a 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -119,6 +119,12 @@ typedef struct IndexFetchHeapData
Buffer xs_cbuf; /* current heap buffer in scan, if any */
/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 2fa790b6bf5..d10b1b03cdb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* set if the query doesn't modify the rel */
+ SO_HINT_REL_READ_ONLY = 1 << 10,
} ScanOptions;
/*
@@ -420,7 +422,7 @@ typedef struct TableAmRoutine
*
* Tuples for an index scan can then be fetched via index_fetch_tuple.
*/
- struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+ struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
/*
* Reset index fetch. Typically this will release cross index fetch
@@ -874,9 +876,9 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
*/
static inline TableScanDesc
table_beginscan(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_SEQSCAN |
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
@@ -919,9 +921,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
@@ -1128,7 +1130,8 @@ extern void table_parallelscan_initialize(Relation rel,
* Caller must hold a suitable lock on the relation.
*/
extern TableScanDesc table_beginscan_parallel(Relation relation,
- ParallelTableScanDesc pscan);
+ ParallelTableScanDesc pscan,
+ uint32 flags);
/*
* Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1164,9 +1167,9 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
* Tuples for an index scan can then be fetched via table_index_fetch_tuple().
*/
static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
{
- return rel->rd_tableam->index_fetch_begin(rel);
+ return rel->rd_tableam->index_fetch_begin(rel, flags);
}
/*
--
2.43.0
Attachment: v28-0014-Allow-on-access-pruning-to-set-pages-all-visible.patch (text/x-patch)
From 28ccc95ff7620ef07aa3f225d3a7c05aa8c05909 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:24:08 -0500
Subject: [PATCH v28 14/15] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
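For example, in heap_prepare_pagescan() (condensed from the hunk below), a
VM buffer is only passed down to pruning when the read-only hint is set:

    Buffer     *vmbuffer = NULL;

    /* allow VM updates only if the query won't modify this relation */
    if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
        vmbuffer = &scan->rs_vmbuffer;
    heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);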
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam.c | 15 ++++++-
src/backend/access/heap/heapam_handler.c | 15 ++++++-
src/backend/access/heap/pruneheap.c | 40 ++++++++++++++++++-
src/include/access/heapam.h | 24 +++++++++--
.../t/035_standby_logical_decoding.pl | 3 +-
5 files changed, 89 insertions(+), 8 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fb7a7548aa0..d9dc79f4a96 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -570,6 +570,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -584,7 +585,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1261,6 +1264,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1299,6 +1303,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1331,6 +1341,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 6c2e4e08b16..2cb98e58956 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
return &hscan->xs_base;
@@ -100,6 +101,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -139,7 +146,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ hscan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2472,6 +2480,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2518,7 +2527,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 1fa72e19f0d..c9821a5830c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -202,6 +202,8 @@ static bool heap_page_will_set_vm(PruneState *prstate,
Relation relation,
BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
Buffer vmbuffer,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
int nlpdead_items,
uint8 *old_vmbits,
uint8 *new_vmbits);
@@ -223,9 +225,13 @@ static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -306,6 +312,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
.cutoffs = NULL,
};
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ params.options |= HEAP_PAGE_PRUNE_UPDATE_VM;
+ params.vmbuffer = *vmbuffer;
+ }
+
heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
NULL, NULL);
@@ -945,6 +958,9 @@ identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
* corrupted, it will fix them by clearing the VM bits and visibility hint.
* This does not need to be done in a critical section.
*
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
* Returns true if one or both VM bits should be set, along with returning the
* current value of the VM bits in *old_vmbits and the desired new value of
* the VM bits in *new_vmbits.
@@ -954,6 +970,8 @@ heap_page_will_set_vm(PruneState *prstate,
Relation relation,
BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
Buffer vmbuffer,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
int nlpdead_items,
uint8 *old_vmbits,
uint8 *new_vmbits)
@@ -961,6 +979,24 @@ heap_page_will_set_vm(PruneState *prstate,
if (!prstate->attempt_update_vm)
return false;
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(heap_buffer) || XLogCheckBufferNeedsBackup(heap_buffer)))
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ return false;
+ }
+
*old_vmbits = visibilitymap_get_status(relation, heap_blk,
&vmbuffer);
@@ -1157,6 +1193,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
buffer,
page,
vmbuffer,
+ params->reason,
+ do_prune, do_freeze,
prstate.lpdead_items,
&old_vmbits,
&new_vmbits);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0bfe2366e1a..3328f56c101 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read in the current heap page's
+ * corresponding VM block to this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read in the current heap page's corresponding VM block to
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
/*
* Some optimizations can only be performed if the query does not modify
@@ -420,7 +437,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
TM_IndexDeleteOp *delstate);
/* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index ebe2fae1789..bdd9f0a62cd 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
--
2.43.0
v28-0015-Set-pd_prune_xid-on-insert.patchtext/x-patch; charset=UTF-8; name=v28-0015-Set-pd_prune_xid-on-insert.patchDownload
From 1df98a32c9b32458621c003f895a71c008c085d6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v28 15/15] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.
This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.
Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.
---
src/backend/access/heap/heapam.c | 25 +++++++++++++------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++-
.../modules/index/expected/killtuples.out | 6 ++---
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d9dc79f4a96..ccebc1f244b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2119,6 +2119,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2178,15 +2179,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2196,7 +2201,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2560,8 +2564,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 47d2479415e..ab2db931aac 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -447,6 +447,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -596,9 +602,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
--
2.43.0
On Thu, 18 Dec 2025 at 05:30, Melanie Plageman
<melanieplageman@gmail.com> wrote:
in v27-0005. This patch changes code which is not exercised in
tests[0]. I spent some time understanding the conditions when we
entered this. There is a comment about non-finished relation
extension, but I got no success trying to reproduce this. I ended up
modifying code to lose PageSetAllVisible in proper places and running
vacuum. Looks like everything works as expected. I will spend some
more time on this, maybe I will be successful in writing an
injection-point-based TAP test which hits this...
Based on the coverage report link you provided, that code is changed
by v27 0007, not 0005. 0005 is about moving an assertion out of
lazy_scan_prune(). 0007 changes lazy_scan_new_or_empty() (the code in
question).
Regarding 0007, it looks like what is uncovered (the orange bits in
the coverage report are uncovered, I assume) is empty pages _without_
PD_ALL_VISIBLE set. I don't see anywhere where PageSetAllVisible() is
called except vacuum and COPY FREEZE.
Sure, I meant 0007.
If I was trying to guess how empty pages with PD_ALL_VISIBLE set are
getting vacuumed, I would think it is due to SKIP_PAGES_THRESHOLD
causing us to vacuum an all-frozen empty page.
Yes, vacuum (disable_page_skipping);
Then the question is, why wouldn't we have coverage of the empty page
first being set all-visible/all-frozen? It can't be COPY FREEZE
because the page is empty. And it can't be vacuum, because then we
would have coverage. It's very mysterious.
It would be good to have coverage for this case. I don't think you'll
need an injection point for the main case of "empty page not yet set
all-visible is vacuumed for the first time" (unless I'm
misunderstanding something).
I'm not sure how you'll test the "vacuuming an empty, previously
uninitialized page" case described in this comment, though.
* It's possible that another backend has extended the heap,
* initialized the page, and then failed to WAL-log the page due
* to an ERROR. Since heap extension is not WAL-logged, recovery
* might try to replay our record setting the page all-visible and
* find that the page isn't initialized, which will cause a PANIC.
* To prevent that, check whether the page has been previously
* WAL-logged, and if not, do that now.
You'd want to force an error during relation extension and then vacuum
the page. I don't know if you need an injection point to force the
error -- depends on what kind of error, I think.
I did small archeology and this "if (PageIsEmpty(page)) { if
(!PageIsAllVisible(page)) { .... }}" code originates back to
608195a3a365. Comment about not WAL-logged relation extension is from
a6370fd9ed3d, and I don't think we need to think about this case.
I am currently inclined to think that we cannot see an empty page that
has PD_ALL_VISIBLE not-set. This is because when we make a page empty,
we are in a critical section, and we WAL-log everything we do, so our
changes should not be half-made. Maybe as of 608195a3a365, there was a
case with empty-page-without-PD_ALL_VISIBLE, but I don't think this
happens on HEAD.
So that I know for attribution, did you review 0003-0005?
yes, but I did not have any valuable review points for them.
Also, after the whole set is committed, we should then never
experience a discrepancy between PD_ALL_VISIBLE and the VM bits,
because they will be set in a single WAL record? The only cases where
the heap and VM then disagree on all-visibility are corruption,
pg_visibilitymap_truncate, and old data (data from before a v19+
upgrade?). If my understanding is correct, should we document this?
--
Best regards,
Kirill Reshke
On Thu, Dec 18, 2025 at 3:55 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
On Thu, 18 Dec 2025 at 05:30, Melanie Plageman
<melanieplageman@gmail.com> wrote:
If I was trying to guess how empty pages with PD_ALL_VISIBLE set are
getting vacuumed, I would think it is due to SKIP_PAGES_THRESHOLD
causing us to vacuum an all-frozen empty page.
Yes, vacuum (disable_page_skipping);
Ah, right, that would be a reliable way for it to happen.
Then the question is, why wouldn't we have coverage of the empty page
first being set all-visible/all-frozen? It can't be COPY FREEZE
because the page is empty. And it can't be vacuum, because then we
would have coverage. It's very mysterious.
<--snip-->
I am currently inclined to think that we cannot see an empty page that
has PD_ALL_VISIBLE not-set. This is because when we make a page empty,
we are in a critical section, and we WAL-log everything we do, so our
changes should not be half-made. Maybe as of 608195a3a365, there was a
case with empry-page-without-PD_ALL_VISIBLE, but I dont think this
happens on HEAD.
Right, so the way that empty pages get set PD_ALL_VISIBLE is when a
page has all its tuples deleted, the next time it is vacuumed it will
be set all-visible and all-frozen and have PD_ALL_VISIBLE set. (if
it's a trailing page it will be truncated, but any non-trailing page
will be like this).
But you are right, I don't see any non-error code path where a heap
page would become empty (all line pointers set unused) and then not be
set all-visible. Only vacuum sets line pointers unused and if all the
line pointers are unused it will always set the page all-visible.
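For concreteness, a minimal SQL sketch of that path (the table name is
illustrative, and the ~226 rows-per-page figure assumes default 8kB pages
with a single int column):
```
-- Sketch only: empty out the first (non-trailing) heap page and let
-- vacuum mark it all-visible and all-frozen.
CREATE TABLE empty_page_demo (a int) WITH (autovacuum_enabled = false);
INSERT INTO empty_page_demo SELECT generate_series(1, 1000);
DELETE FROM empty_page_demo WHERE a <= 226;  -- roughly empties block 0 only
VACUUM (FREEZE) empty_page_demo;             -- prunes, sets PD_ALL_VISIBLE + VM
```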
I think, though, that if we error out in lazy_scan_prune() after
returning from heap_page_prune_and_freeze() such that we don't set the
empty page all-visible, we can end up with an empty page without
PD_ALL_VISIBLE set. You can see how this might work by patching the VM
set code in lazy_scan_prune() to skip empty pages.
I did small archeology and this "if (PageIsEmpty(page)) { if
(!PageIsAllVisible(page)) { .... }}" code originates back to
608195a3a365. Comment about not WAL-logged relation extension is from
a6370fd9ed3d, and I don't think we need to think about this case.
Thanks for looking into this. Even if this code was added to handle
the error codepath I mentioned above, it seems like it would have been
good enough to just let lazy_scan_prune() handle setting the empty
page all-visible the next time the page was vacuumed. Since there is
no non-error code path where this can happen, it doesn't seem like it
would merit its own special case.
It is possible it was more common as of 608195a3a365, as you say.
I don't understand how the bug fixed by a6370fd9ed3d can happen. When
a new page is initialized, flags are set to 0, so regardless of WAL
logging of the extension not happening, how would the new page have
been set PD_ALL_VISIBLE? We'll have to ask Andres or Robert about how
this was hit.
Also, after the whole set is committed, we should then never
experience discrepancy between PD_ALL_VISIBLE and VM bits? Because
they will be set in a single WAL record. The only cases when heap and
VM disagrees on all-visibility then are corruption,
pg_visibilitymap_truncate and old data (data before v19+ upgrade?)
If my understanding is correct, should we add document this?
Even on current master, I don't see a scenario other than VM
corruption or truncation where PD_ALL_VISIBLE can be set but not the
VM (or vice versa). The only way would be if you error out after
setting PD_ALL_VISIBLE before setting the VM. Setting PD_ALL_VISIBLE
is not in a critical section in lazy_scan_prune(), so it won't panic
and dump shared memory, so the buffer with PD_ALL_VISIBLE set may
later get written out. But the only obvious way I see to error out of
MarkBufferDirty() is if the buffer is not valid -- which would have
kept us from doing previous operations on the buffer, I would think.
It's true this will no longer happen after my patches, as
PageSetAllVisible() will happen in a critical section. We could add a
comment about this particular scenario in the code somewhere. But I
don't think we should document it in any user-facing documentation
since you could still truncate the VM and have the two out of sync.
- Melanie
On Thu, 18 Dec 2025 at 20:18, Melanie Plageman
<melanieplageman@gmail.com> wrote:
Also, after the whole set is committed, we should then never
experience discrepancy between PD_ALL_VISIBLE and VM bits? Because
they will be set in a single WAL record. The only cases when heap and
VM disagrees on all-visibility then are corruption,
pg_visibilitymap_truncate and old data (data before v19+ upgrade?)
If my understanding is correct, should we add document this?
Even on current master, I don't see a scenario other than VM
corruption or truncation where PD_ALL_VISIBLE can be set but not the
VM (or vice versa). The only way would be if you error out after
setting PD_ALL_VISIBLE before setting the VM. Setting PD_ALL_VISIBLE
is not in a critical section in lazy_scan_prune(), so it won't panic
and dump shared memory, so the buffer with PD_ALL_VISIBLE set may
later get written out. But the only obvious way I see to error out of
MarkBufferDirty() is if the buffer is not valid -- which would have
kept us from doing previous operations on the buffer, I would think.
Well... I may be missing something, but on current HEAD,
XLOG_HEAP2_PRUNE_VACUUM_SCAN and XLOG_HEAP2_VISIBLE are two different
records, XLOG_HEAP2_PRUNE_VACUUM_SCAN always being emitted first. So
the WAL writer may end up kill-9-ed just after
XLOG_HEAP2_PRUNE_VACUUM_SCAN makes it to disk and XLOG_HEAP2_VISIBLE
never does. Crash recovery follows, and we have a discrepancy. This
does not happen with a single WAL record.
Another simple reproducer: a standby is streaming and receives
XLOG_HEAP2_PRUNE_VACUUM_SCAN from the primary, then the network goes
bad and we never get XLOG_HEAP2_VISIBLE from the primary. Then the
standby is promoted by the admin. And again, a VM bit vs
PD_ALL_VISIBLE discrepancy. Am I missing something?
--
Best regards,
Kirill Reshke
On Thu, 18 Dec 2025 at 20:18, Melanie Plageman
<melanieplageman@gmail.com> wrote:
But you are right, I don't see any non-error code path where a heap
page would become empty (all line pointers set unused) and then not be
set all-visible. Only vacuum sets line pointers unused and if all the
line pointers are unused it will always set the page all-visible.
I think, though, that if we error out in lazy_scan_prune() after
returning from heap_page_prune_and_freeze() such that we don't set the
empty page all-visible, we can end up with an empty page without
PD_ALL_VISIBLE set. You can see how this might work by patching the VM
set code in lazy_scan_prune() to skip empty pages.
Thank you for your explanation! I completely forgot that PD_ALL_VISIBLE
is a non-persistent change (a hint bit), so its update can be trivially
lost.
The simplest real-life example is being killed just after returning
from heap_page_prune_and_freeze, yes.
PFA a TAP test covering the lazy_scan_new_or_empty() code path for an
empty-but-not-all-visible page.
--
Best regards,
Kirill Reshke
Attachments:
v1-0001-Add-TAP-test-for-empty-page-vacuum.patchapplication/octet-stream; name=v1-0001-Add-TAP-test-for-empty-page-vacuum.patchDownload
From ac838953de9c4ab0cb5f13d1e1b8ad0a18e73e39 Mon Sep 17 00:00:00 2001
From: reshke <reshke@double.cloud>
Date: Thu, 18 Dec 2025 18:00:22 +0000
Subject: [PATCH v1] Add TAP test for empty page vacuum.
VACUUM can be run on empty pages with the DISABLE_PAGE_SKIPPING option.
In this case, VACUUM will set the page-level visibility bit
(PD_ALL_VISIBLE) if it was not previously set.
To end up with an empty page which is missing the visibility hint bit,
we need to forcefully cancel (kill -9) a backend executing page
freezing, just after it did page pruning. Use an injection point for
this purpose and add a TAP test to cover "recovery" after this error
code path.
---
src/backend/access/heap/vacuumlazy.c | 6 ++
.../test_misc/t/010_vacuum_empty_page.pl | 75 +++++++++++++++++++
2 files changed, 81 insertions(+)
create mode 100644 src/test/modules/test_misc/t/010_vacuum_empty_page.pl
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 30778a15639..9b8cbb67f11 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -153,6 +153,7 @@
#include "storage/freespace.h"
#include "storage/lmgr.h"
#include "storage/read_stream.h"
+#include "utils/injection_point.h"
#include "utils/lsyscache.h"
#include "utils/pg_rusage.h"
#include "utils/timestamp.h"
@@ -1899,6 +1900,8 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ INJECTION_POINT("vacuum-empty-page-non-all-vis", NULL);
+
START_CRIT_SECTION();
/* mark buffer dirty before writing a WAL record */
@@ -2012,6 +2015,9 @@ lazy_scan_prune(LVRelState *vacrel,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
+
+ INJECTION_POINT("vacuum-heap-prune-and-freeze-after", NULL);
+
Assert(MultiXactIdIsValid(vacrel->NewRelminMxid));
Assert(TransactionIdIsValid(vacrel->NewRelfrozenXid));
diff --git a/src/test/modules/test_misc/t/010_vacuum_empty_page.pl b/src/test/modules/test_misc/t/010_vacuum_empty_page.pl
new file mode 100644
index 00000000000..af5c39d2435
--- /dev/null
+++ b/src/test/modules/test_misc/t/010_vacuum_empty_page.pl
@@ -0,0 +1,75 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+# Check that VACUUM (DISABLE_PAGE_SKIPPING) sets PD_ALL_VISIBLE on an
+# empty page that is missing the hint because an earlier VACUUM errored
+# out right after pruning and freezing (simulated with injection points).
+
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+# Initialize a new PostgreSQL test cluster
+my $node = PostgreSQL::Test::Cluster->new('primary');
+$node->init();
+$node->append_conf(
+ 'postgresql.conf', qq(
+log_min_messages = 'notice'
+));
+$node->start;
+
+# Check if the extension injection_points is available, as it may be
+# possible that this script is run with installcheck, where the module
+# would not be installed by default.
+if (!$node->check_extension('injection_points'))
+{
+ plan skip_all => 'Extension injection_points not installed';
+}
+
+$node->safe_psql('postgres', 'CREATE EXTENSION injection_points;');
+
+
+# Setup table and populate with data
+$node->safe_psql(
+ "postgres", qq{
+CREATE TABLE vac_empty_test(a int);
+BEGIN;
+INSERT INTO vac_empty_test DEFAULT VALUES;
+ROLLBACK;
+});
+
+# Attach injection points: error right after prune/freeze, notice on the empty-page path.
+$node->safe_psql('postgres',
+ "SELECT injection_points_attach('vacuum-heap-prune-and-freeze-after', 'error');");
+$node->safe_psql('postgres',
+ "SELECT injection_points_attach('vacuum-empty-page-non-all-vis', 'notice');");
+
+$node->psql('postgres', "VACUUM (FREEZE) vac_empty_test;", on_error_stop => 1);
+
+my $offset = -s $node->logfile;
+
+# Run vacuum, force it on empty page.
+$node->safe_psql(
+ "postgres", qq{
+VACUUM (DISABLE_PAGE_SKIPPING) vac_empty_test;
+});
+
+ok( $node->log_contains(
+ qr/NOTICE: notice triggered for injection point vacuum-empty-page-non-all-vis/,
+ $offset),
+ "vacuum sets all-visible page bit for empty page");
+
+
+$node->safe_psql('postgres',
+ "SELECT injection_points_detach('vacuum-heap-prune-and-freeze-after');");
+$node->safe_psql('postgres',
+ "SELECT injection_points_detach('vacuum-empty-page-non-all-vis');");
+
+$node->stop('fast');
+done_testing();
--
2.43.0
On Thu, Dec 18, 2025 at 1:07 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
On Thu, 18 Dec 2025 at 20:18, Melanie Plageman
<melanieplageman@gmail.com> wrote:
But you are right, I don't see any non-error code path where a heap
page would become empty (all line pointers set unused) and then not be
set all-visible. Only vacuum sets line pointers unused and if all the
line pointers are unused it will always set the page all-visible.
I think, though, that if we error out in lazy_scan_prune() after
returning from heap_page_prune_and_freeze() such that we don't set the
empty page all-visible, we can end up with an empty page without
PD_ALL_VISIBLE set. You can see how this might work by patching the VM
set code in lazy_scan_prune() to skip empty pages.
Thank you for your explanation! I completely forgot that PD_ALL_VIS
is a non-persistent change (hint bit). so its update can be trivially
lost.
The simplest real-life example is being killed just after returning
from heap_page_prune_and_freeze, yes.
PFA tap test covering lazy_scan_new_or_empty code path for
empty-but-not-all-visible page
Cool test! I'm going to have to think more about whether or not it is
worth adding a whole new TAP test for this codepath. Is there an
existing TAP test we could add it to so we don't need to make a new
cluster, etc? How long does the test take to run? Obviously it will be
quite short, but every bit we add to the test suite counts. I don't
actually know how much overhead there is with injection points.
I was chatting with Andres and he mentioned there is one other case
where you can end up in this code path (empty page without
PD_ALL_VISIBLE set) and this case does actually trigger this code:
if (RelationNeedsWAL(vacrel->rel) &&
!XLogRecPtrIsValid(PageGetLSN(page)))
log_newpage_buffer(buf, true);
If you are inserting to a new page and you successfully call
PageInit() (making the page no longer considered new by PageIsNew()
because pd_upper will be set) but you error out before actually
inserting the tuple, then you will have an empty page without
PD_ALL_VISIBLE set. And assuming you error out before emitting WAL,
the page will not have a valid LSN set. So you will hit that code
which calls log_newpage_buffer().
I would say this case is so narrow (the log_newpage_buffer() codepath
in lazy_scan_new_or_empty()), it's not worth the added test overhead,
but I just wanted to share what I learned about when this code could
be hit.
Previously it was more common in the bulk extension case to have empty
pages not set PD_ALL_VISIBLE because bulk extension would call
PageInit() on all of the pages it extended so all the pages except the
target page were empty (today they are not initialized so they go into
the PageIsNew() branch).
So, in both cases, it seems like the empty page not set PD_ALL_VISIBLE
mostly only hit if we previously errored out.
- Melanie
On Thu, Dec 18, 2025 at 10:46 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
On Thu, 18 Dec 2025 at 20:18, Melanie Plageman
<melanieplageman@gmail.com> wrote:
Also, after the whole set is committed, we should then never
experience discrepancy between PD_ALL_VISIBLE and VM bits? Because
they will be set in a single WAL record. The only cases when heap and
VM disagrees on all-visibility then are corruption,
pg_visibilitymap_truncate and old data (data before v19+ upgrade?)
If my understanding is correct, should we add document this?
Even on current master, I don't see a scenario other than VM
corruption or truncation where PD_ALL_VISIBLE can be set but not the
VM (or vice versa). The only way would be if you error out after
setting PD_ALL_VISIBLE before setting the VM. Setting PD_ALL_VISIBLE
is not in a critical section in lazy_scan_prune(), so it won't panic
and dump shared memory, so the buffer with PD_ALL_VISIBLE set may
later get written out. But the only obvious way I see to error out of
MarkBufferDirty() is if the buffer is not valid -- which would have
kept us from doing previous operations on the buffer, I would think.
Well... I may be missing something, but on current HEAD,
XLOG_HEAP2_PRUNE_VACUUM_SCAN and XLOG_HEAP2_VISIBLE are two different
record, XLOG_HEAP2_PRUNE_VACUUM_SCAN being always emitted first. So,
WAL writer may end up kill-9-ed just after
XLOG_HEAP2_PRUNE_VACUUM_SCAN makes it to the disk, and
XLOG_HEAP2_VISIBLE never. Crash recovery then, and we have
discrepancy. This does not happen with a single WAL record.
Another simple reproducer here: standby streaming, receiving
XLOG_HEAP2_PRUNE_VACUUM_SCAN from primary, Then network becomes bad,
and we never get XLOG_HEAP2_VISIBLE from primary. Then we promoted by
the admin. And again, VM bit vs PD_ALL_VISIBLE discrepancy. Am I
missing something?
Well, currently XLOG_HEAP2_PRUNE_VACUUM_SCAN doesn't set
PD_ALL_VISIBLE. PD_ALL_VISIBLE is WAL-logged in the XLOG_HEAP2_VISIBLE
record because in lazy_scan_prune() we call PageSetAllVisible() and
then visibilitymap_set() -> log_heap_visible() adds the heap buffer to
the WAL chain (with XLogRegisterBuffer()).
And if you notice when XLOG_HEAP2_VISIBLE is replayed in
heap_xlog_visible(), that is where we do PageSetAllVisible() on the
heap page.
So I think you can end up with PD_ALL_VISIBLE set if you error out
precisely between setting it and WAL logging it because we don't set
it in a critical section. But you can't end up with a WAL record that
sets PD_ALL_VISIBLE and another one that sets the VM.
Once we have my code changes, you can never end up with PD_ALL_VISIBLE
set and the VM not set because they are in the same critical section
and if we error out, it will cause a panic which will purge shared
memory.
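To spell out the intended ordering, here is a minimal sketch (not the
actual patch; the WAL-record construction is elided and the variable
names are only illustrative):
```
/*
 * Sketch only: with the patch set, the heap hint and the VM bits are
 * updated under one critical section and covered by the same WAL record,
 * so an error here PANICs rather than leaving PD_ALL_VISIBLE set while
 * the VM bits are not.
 */
START_CRIT_SECTION();

PageSetAllVisible(page);        /* heap page hint bit */
MarkBufferDirty(heap_buf);      /* must be dirty before WAL registration */
/* ... set the bits in the pinned VM page and emit the single pruning
 * record that covers both the heap page and the VM page ... */

END_CRIT_SECTION();
```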
- Melanie
On Fri, 19 Dec 2025 at 00:58, Melanie Plageman
<melanieplageman@gmail.com> wrote:
On Thu, Dec 18, 2025 at 1:07 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
On Thu, 18 Dec 2025 at 20:18, Melanie Plageman
<melanieplageman@gmail.com> wrote:
But you are right, I don't see any non-error code path where a heap
page would become empty (all line pointers set unused) and then not be
set all-visible. Only vacuum sets line pointers unused and if all the
line pointers are unused it will always set the page all-visible.
I think, though, that if we error out in lazy_scan_prune() after
returning from heap_page_prune_and_freeze() such that we don't set the
empty page all-visible, we can end up with an empty page without
PD_ALL_VISIBLE set. You can see how this might work by patching the VM
set code in lazy_scan_prune() to skip empty pages.
Thank you for your explanation! I completely forgot that PD_ALL_VIS
is a non-persistent change (hint bit). so its update can be trivially
lost.
The simplest real-life example is being killed just after returning
from heap_page_prune_and_freeze, yes.
PFA tap test covering lazy_scan_new_or_empty code path for
empty-but-not-all-visible pageCool test! I'm going to have to think more about whether or not it is
worth adding a whole new TAP test for this codepath. Is there an
existing TAP test we could add it to so we don't need to make a new
cluster, etc? How long does the test take to run? Obviously it will be
quite short, but every bit we add to the test suite counts. I don't
actually know how much overhead there is with injection points.
Well, on my PC this test runs in ~1.5 sec. I did not find any other
TAP test to place this in, so I created a new one.
Actually, I only check for specific patterns in the log file of the
cluster in this test, so this test can instead be a regression test.
```
reshke=# VACUUM (DISABLE_PAGE_SKIPPING) vac_empty_test;
NOTICE: notice triggered for injection point vacuum-empty-page-non-all-vis
VACUUM
reshke=#
```
We will just check in the .out file that the code hits
'vacuum-empty-page-non-all-vis' after an error.
Injection point overhead should not be that awful, just from my
experience. Maybe buildfarm members can say something here, I dunno.
Also, we already have a bunch of regression+inj point tests for some
rare cases, exempli gratia
src/test/modules/nbtree/sql/nbtree_half_dead_pages.sql.
--
Best regards,
Kirill Reshke
Hi Melanie,
Thanks for working on this.
On Wed, Dec 17, 2025 at 12:59 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
On Wed, Dec 3, 2025 at 6:07 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
If we're just talking about the renaming, looking at procarray.c, it
is full of the word "removable" because its functions were largely
used to examine and determine if everyone can see an xmax as committed
and thus if that tuple is removable from their perspective. But
nothing about the code that I can see means it has to be an xmax. We
could just as well use the functions to determine if everyone can see
an xmin as committed.
In the attached v27, I've removed the commit that renamed functions in
procarray.c. I've added a single wrapper GlobalVisTestXidNotRunning()
that is used in my code where I am testing live tuples. I think you'll
find that I've addressed all of your review comments now -- as I've
also gotten rid of the confusing blk_known_av logic through a series
of refactors.The one outstanding point is which commits should bump
XLOG_PAGE_MAGIC. (also review of the reworked patches).
- Melanie
I’ve done a basic review of patches 1 and 2. Here are some comments
which may be somewhat immature, as this is a fairly large change set
and I’m new to some parts of the code.
1) Potential stale old_vmbits after VM repair in v2
// Corruption check 1
if (!PageIsAllVisible(page) &&
(old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
visibilitymap_clear(...); // VM now cleared to 0
// but old_vmbits still holds ALL_VISIBLE
}
// ... later ...
if (!presult.all_visible)
return presult.ndeleted; // Not taken if presult.all_visible=true
new_vmbits = VISIBILITYMAP_ALL_VISIBLE; // Want to set this
if (old_vmbits == new_vmbits) // Stale old_vmbits=ALL_VISIBLE,
new_vmbits=ALL_VISIBLE
return presult.ndeleted; // issue: early return
After corruption repair clears the VM, old_vmbits is stale. The early
return can fire unexpectedly, leaving the VM cleared when it should be
re-set. Should we reset old_vmbits = 0 after the visibilitymap_clear?
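Roughly like this (a sketch following the pseudocode above, not the
actual patch; identifier names are only illustrative):
```
/*
 * Sketch of the suggestion: refresh old_vmbits after repairing the
 * corruption so the later old_vmbits == new_vmbits check cannot fire
 * on a stale value.
 */
if (!PageIsAllVisible(page) &&
    (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
    visibilitymap_clear(relation, blkno, vmbuffer,
                        VISIBILITYMAP_VALID_BITS);
    old_vmbits = 0;     /* the VM bits for this block are now clear */
}
```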
2) Add Assert(BufferIsDirty(buf))
Since the patch's core claim is "buffer must be dirty before WAL
registration", an assertion encodes this invariant. Should we add:
Assert(BufferIsValid(buf));
Assert(BufferIsDirty(buf));
right before the visibilitymap_set() call?
3) Comment about "only scenario"
The comment at lines:
"The only scenario where it is not already dirty is if the VM was removed…"
This phrasing could become misleading after future refactors. Can we
make it more direct like:
"We must mark the heap buffer dirty before calling visibilitymap_set(), because it may WAL-log the buffer and XLogRegisterBuffer() requires it."
4) Comment clarity
Current comment:
"Even if PD_ALL_VISIBLE is already set, we don't need to worry about unnecessarily dirtying the heap buffer, as it must be marked dirty before adding it to the WAL chain. The only scenario where it is not already dirty is if the VM was removed..."
In this test we now call MarkBufferDirty() on the heap page even when
only setting the VM, so the comments claiming “does not need to modify
the heap buffer”/“no heap page modification” might be misleading. It
might be better to say the test doesn’t need to modify heap
tuples/page contents or doesn’t need to prune/freeze.
--
Best,
Xuneng
Attached v29 addresses some feedback and also corrects a small error
with the assertion I had added in the previous version's 0009.
On Thu, Dec 18, 2025 at 10:38 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
I’ve done a basic review of patches 1 and 2. Here are some comments
which may be somewhat immature, as this is a fairly large change set
and I’m new to some parts of the code.1) Potential stale old_vmbits after VM repair n v2
Good catch! I've fixed this in attached v29.
2) Add Assert(BufferIsDirty(buf))
Since the patch's core claim is "buffer must be dirty before WAL
registration", an assertion encodes this invariant. Should we add:Assert(BufferIsValid(buf));
Assert(BufferIsDirty(buf));right before the visibilitymap_set() call?
There are already assertions that will trip in various places -- most
importantly in XLogRegisterBuffer(), which is the one that inspired
this refactor.
The comment at lines:
"The only scenario where it is not already dirty is if the VM was removed…"
This phrasing could become misleading after future refactors. Can we
make it more direct like:"We must mark the heap buffer dirty before calling visibilitymap_set(), because it may WAL-log the buffer and XLogRegisterBuffer() requires it."
I see your point about future refactors missing updating comments like
this. But, I don't think we are going to refactor the code such that
we can have PD_ALL_VISIBLE set without the VM bits set more often.
Also, it is common practice in Postgres to describe very specific edge
cases or odd scenarios in order to explain code that may seem
confusing without the comment. It does risk that comment later
becoming stale, but it is better that future developers understand why
the code is there.
That being said, I take your point that the comment is confusing. I
have updated it in a different way.
"Even if PD_ALL_VISIBLE is already set, we don't need to worry about unnecessarily dirtying the heap buffer, as it must be marked dirty before adding it to the WAL chain. The only scenario where it is not already dirty is if the VM was removed..."
In this test we now call MarkBufferDirty() on the heap page even when
only setting the VM, so the comments claiming “does not need to modify
the heap buffer”/“no heap page modification” might be misleading. It
might be better to say the test doesn’t need to modify heap
tuples/page contents or doesn’t need to prune/freeze.
The point I'm trying to make is that we have to dirty the buffer even
if we don't modify the page because of the XLOG sub-system
requirements. And, it may seem like a waste to do that if not
modifying the page, but the page will rarely be clean anyway. I've
tried to make this more clear in attached v29.
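As a shorthand for that rule (a sketch, not the patch; the buffer name is
illustrative):
```
/* The heap buffer must be dirtied before it is registered for the WAL
 * record, even when the only "change" is the PD_ALL_VISIBLE hint;
 * XLogRegisterBuffer() asserts this unless REGBUF_NO_CHANGE is passed. */
MarkBufferDirty(heap_buf);
XLogRegisterBuffer(0, heap_buf, REGBUF_STANDARD);
```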
- Melanie
Attachments:
v29-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patchtext/x-patch; charset=UTF-8; name=v29-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patchDownload
From 8442278884c0d128547910d17d3b640e0a4078e4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 8 Dec 2025 15:49:54 -0500
Subject: [PATCH v29 01/15] Combine visibilitymap_set() cases in
lazy_scan_prune()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
lazy_scan_prune() previously had two separate cases that called
visibilitymap_set() after pruning and freezing. These branches were
nearly identical except that one attempted to avoid dirtying the heap
buffer. However, that situation can never occur — the heap buffer cannot
be clean at that point (and we would hit an assertion if it were).
In lazy_scan_prune(), when we change a previously all-visible page to
all-frozen and the page was recorded as all-visible in the visibility
map by find_next_unskippable_block(), the heap buffer will always be
dirty. Either we have just frozen a tuple and already dirtied the
buffer, or the buffer was modified between find_next_unskippable_block()
and heap_page_prune_and_freeze() and then pruned in
heap_page_prune_and_freeze().
Additionally, XLogRegisterBuffer() asserts that the buffer is dirty, so
attempting to add a clean heap buffer to the WAL chain would assert out
anyway.
Since the “clean heap buffer with already set VM” case is impossible,
the two visibilitymap_set() branches in lazy_scan_prune() can be merged.
Doing so makes the intent clearer and emphasizes that the heap buffer
must always be marked dirty before being added to the WAL chain.
This commit also adds a test case for vacuuming when no heap
modifications are required. Currently this ensures that the heap buffer
is marked dirty before it is added to the WAL chain, but if we later
remove the heap buffer from the VM-set WAL chain or pass it with the
REGBUF_NO_CHANGE flag, this test would guard that behavior.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
Discussion: https://postgr.es/m/flat/CAAKRu_ZWx5gCbeCf7PWCv8p5%3D%3Db7EEws0VD2wksDxpXCvCyHvQ%40mail.gmail.com
---
.../pg_visibility/expected/pg_visibility.out | 44 ++++++++++
contrib/pg_visibility/sql/pg_visibility.sql | 20 +++++
src/backend/access/heap/vacuumlazy.c | 87 ++++---------------
3 files changed, 82 insertions(+), 69 deletions(-)
diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index 09fa5933a35..e10f1706015 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -1,4 +1,5 @@
CREATE EXTENSION pg_visibility;
+CREATE EXTENSION pageinspect;
--
-- recently-dropped table
--
@@ -204,6 +205,49 @@ select pg_truncate_visibility_map('test_partition');
(1 row)
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary
+---------------------------
+ (1,1)
+(1 row)
+
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+ pg_truncate_visibility_map
+----------------------------
+
+(1 row)
+
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary
+---------------------------
+ (0,0)
+(1 row)
+
+-- though the VM is truncated, the heap page-level visibility hint,
+-- PD_ALL_VISIBLE should still be set
+SELECT (flags & x'0004'::int) <> 0
+ FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));
+ ?column?
+----------
+ t
+(1 row)
+
+-- vacuum sets the VM
+vacuum test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary
+---------------------------
+ (1,1)
+(1 row)
+
-- test copy freeze
create table copyfreeze (a int, b char(1500));
-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index 5af06ec5b76..57af8a0c5b6 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -1,4 +1,5 @@
CREATE EXTENSION pg_visibility;
+CREATE EXTENSION pageinspect;
--
-- recently-dropped table
@@ -94,6 +95,25 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
select * from pg_check_frozen('test_partition'); -- hopefully none
select pg_truncate_visibility_map('test_partition');
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+-- though the VM is truncated, the heap page-level visibility hint,
+-- PD_ALL_VISIBLE should still be set
+SELECT (flags & x'0004'::int) <> 0
+ FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));
+-- vacuum sets the VM
+vacuum test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+
-- test copy freeze
create table copyfreeze (a int, b char(1500));
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 30778a15639..cecba2146ea 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2093,16 +2093,14 @@ lazy_scan_prune(LVRelState *vacrel,
* of last heap_vac_scan_next_block() call), and from all_visible and
* all_frozen variables
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if ((presult.all_visible && !all_visible_according_to_vm) ||
+ (presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
- }
/*
* It should never be the case that the visibility map page is set
@@ -2110,15 +2108,25 @@ lazy_scan_prune(LVRelState *vacrel,
* checksums are not enabled). Regardless, set both bits so that we
* get back in sync.
*
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
+ * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
+ * unnecessarily dirtying the heap buffer. Nearly the only scenario
+ * where PD_ALL_VISIBLE is set but the VM is not is if the VM was
+ * removed -- and that isn't worth optimizing for. And if we add the
+ * heap buffer to the WAL chain (without passing REGBUF_NO_CHANGE),
+ * it must be marked dirty.
*/
PageSetAllVisible(page);
MarkBufferDirty(buf);
+
+ /*
+ * If the page is being set all-frozen, we pass InvalidTransactionId
+ * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ Assert(!presult.all_frozen ||
+ !TransactionIdIsValid(presult.vm_conflict_horizon));
+
old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
InvalidXLogRecPtr,
vmbuffer, presult.vm_conflict_horizon,
@@ -2190,65 +2198,6 @@ lazy_scan_prune(LVRelState *vacrel,
VISIBILITYMAP_VALID_BITS);
}
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen.
- */
- else if (all_visible_according_to_vm && presult.all_frozen &&
- !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
-
return presult.ndeleted;
}
--
2.43.0
v29-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patchtext/x-patch; charset=US-ASCII; name=v29-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patchDownload
From 80933bcb9f6a762a91ed773e36ea51e800105fac Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 11 Dec 2025 10:48:13 -0500
Subject: [PATCH v29 02/15] Eliminate use of cached VM value in
lazy_scan_prune()
lazy_scan_prune() takes a parameter from lazy_scan_heap() indicating
whether the page was marked all-visible in the VM at the time it was
last checked in find_next_unskippable_block(). This behavior is
historical, dating back to commit 608195a3a365, when we did not pin the
VM page until confirming it was not all-visible. Now that the VM page is
already pinned, there is no meaningful benefit to relying on a cached VM
status.
Removing this cached value simplifies the logic in both lazy_scan_heap()
and lazy_scan_prune(). It also clarifies future work that will set the
visibility map on-access: such paths will not have a cached value
available which would make the logic harder to reason about. Eliminating
it also enables us to detect and repair VM corruption on-access.
Along with removing the cached value and unconditionally checking the
visibility status of the heap page, this commit also moves the VM
corruption handling to occur first. This reordering should have no
performance impact, since the checks are inexpensive and performed only
once per page. It does, however, make the control flow easier to
understand. The new restructuring also makes it possible that after
fixing corruption, the VM could be newly set, if pruning found the page
all-visible.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
---
src/backend/access/heap/vacuumlazy.c | 182 ++++++++++++---------------
1 file changed, 83 insertions(+), 99 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index cecba2146ea..d47ed7814c8 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -248,13 +248,6 @@ typedef enum
*/
#define EAGER_SCAN_REGION_SIZE 4096
-/*
- * heap_vac_scan_next_block() sets these flags to communicate information
- * about the block it read to the caller.
- */
-#define VAC_BLK_WAS_EAGER_SCANNED (1 << 0)
-#define VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM (1 << 1)
-
typedef struct LVRelState
{
/* Target heap relation and its indexes */
@@ -358,7 +351,6 @@ typedef struct LVRelState
/* State maintained by heap_vac_scan_next_block() */
BlockNumber current_block; /* last block returned */
BlockNumber next_unskippable_block; /* next unskippable block */
- bool next_unskippable_allvis; /* its visibility status */
bool next_unskippable_eager_scanned; /* if it was eagerly scanned */
Buffer next_unskippable_vmbuffer; /* buffer containing its VM bit */
@@ -432,7 +424,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
bool sharelock, Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
- Buffer vmbuffer, bool all_visible_according_to_vm,
+ Buffer vmbuffer,
bool *has_lpdead_items, bool *vm_page_frozen);
static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
@@ -1248,7 +1240,6 @@ lazy_scan_heap(LVRelState *vacrel)
/* Initialize for the first heap_vac_scan_next_block() call */
vacrel->current_block = InvalidBlockNumber;
vacrel->next_unskippable_block = InvalidBlockNumber;
- vacrel->next_unskippable_allvis = false;
vacrel->next_unskippable_eager_scanned = false;
vacrel->next_unskippable_vmbuffer = InvalidBuffer;
@@ -1264,13 +1255,13 @@ lazy_scan_heap(LVRelState *vacrel)
MAIN_FORKNUM,
heap_vac_scan_next_block,
vacrel,
- sizeof(uint8));
+ sizeof(bool));
while (true)
{
Buffer buf;
Page page;
- uint8 blk_info = 0;
+ bool was_eager_scanned = false;
int ndeleted = 0;
bool has_lpdead_items;
void *per_buffer_data = NULL;
@@ -1339,13 +1330,13 @@ lazy_scan_heap(LVRelState *vacrel)
if (!BufferIsValid(buf))
break;
- blk_info = *((uint8 *) per_buffer_data);
+ was_eager_scanned = *((bool *) per_buffer_data);
CheckBufferIsPinnedOnce(buf);
page = BufferGetPage(buf);
blkno = BufferGetBlockNumber(buf);
vacrel->scanned_pages++;
- if (blk_info & VAC_BLK_WAS_EAGER_SCANNED)
+ if (was_eager_scanned)
vacrel->eager_scanned_pages++;
/* Report as block scanned, update error traceback information */
@@ -1416,7 +1407,6 @@ lazy_scan_heap(LVRelState *vacrel)
if (got_cleanup_lock)
ndeleted = lazy_scan_prune(vacrel, buf, blkno, page,
vmbuffer,
- blk_info & VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM,
&has_lpdead_items, &vm_page_frozen);
/*
@@ -1433,8 +1423,7 @@ lazy_scan_heap(LVRelState *vacrel)
* exclude pages skipped due to cleanup lock contention from eager
* freeze algorithm caps.
*/
- if (got_cleanup_lock &&
- (blk_info & VAC_BLK_WAS_EAGER_SCANNED))
+ if (got_cleanup_lock && was_eager_scanned)
{
/* Aggressive vacuums do not eager scan. */
Assert(!vacrel->aggressive);
@@ -1601,7 +1590,6 @@ heap_vac_scan_next_block(ReadStream *stream,
{
BlockNumber next_block;
LVRelState *vacrel = callback_private_data;
- uint8 blk_info = 0;
/* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
next_block = vacrel->current_block + 1;
@@ -1664,8 +1652,8 @@ heap_vac_scan_next_block(ReadStream *stream,
* otherwise they would've been unskippable.
*/
vacrel->current_block = next_block;
- blk_info |= VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM;
- *((uint8 *) per_buffer_data) = blk_info;
+ /* Block was not eager scanned */
+ *((bool *) per_buffer_data) = false;
return vacrel->current_block;
}
else
@@ -1677,11 +1665,7 @@ heap_vac_scan_next_block(ReadStream *stream,
Assert(next_block == vacrel->next_unskippable_block);
vacrel->current_block = next_block;
- if (vacrel->next_unskippable_allvis)
- blk_info |= VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM;
- if (vacrel->next_unskippable_eager_scanned)
- blk_info |= VAC_BLK_WAS_EAGER_SCANNED;
- *((uint8 *) per_buffer_data) = blk_info;
+ *((bool *) per_buffer_data) = vacrel->next_unskippable_eager_scanned;
return vacrel->current_block;
}
}
@@ -1706,7 +1690,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
BlockNumber next_unskippable_block = vacrel->next_unskippable_block + 1;
Buffer next_unskippable_vmbuffer = vacrel->next_unskippable_vmbuffer;
bool next_unskippable_eager_scanned = false;
- bool next_unskippable_allvis;
*skipsallvis = false;
@@ -1716,7 +1699,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
next_unskippable_block,
&next_unskippable_vmbuffer);
- next_unskippable_allvis = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
/*
* At the start of each eager scan region, normal vacuums with eager
@@ -1735,7 +1717,7 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
* A block is unskippable if it is not all visible according to the
* visibility map.
*/
- if (!next_unskippable_allvis)
+ if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
{
Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
break;
@@ -1792,7 +1774,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
/* write the local variables back to vacrel */
vacrel->next_unskippable_block = next_unskippable_block;
- vacrel->next_unskippable_allvis = next_unskippable_allvis;
vacrel->next_unskippable_eager_scanned = next_unskippable_eager_scanned;
vacrel->next_unskippable_vmbuffer = next_unskippable_vmbuffer;
}
@@ -1953,9 +1934,7 @@ cmpOffsetNumbers(const void *a, const void *b)
* Caller must hold pin and buffer cleanup lock on the buffer.
*
* vmbuffer is the buffer containing the VM block with visibility information
- * for the heap block, blkno. all_visible_according_to_vm is the saved
- * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * for the heap block, blkno.
*
* *has_lpdead_items is set to true or false depending on whether, upon return
* from this function, any LP_DEAD items are still present on the page.
@@ -1972,7 +1951,6 @@ lazy_scan_prune(LVRelState *vacrel,
BlockNumber blkno,
Page page,
Buffer vmbuffer,
- bool all_visible_according_to_vm,
bool *has_lpdead_items,
bool *vm_page_frozen)
{
@@ -1986,6 +1964,8 @@ lazy_scan_prune(LVRelState *vacrel,
.vistest = vacrel->vistest,
.cutoffs = &vacrel->cutoffs,
};
+ uint8 old_vmbits = 0;
+ uint8 new_vmbits = 0;
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -2088,70 +2068,7 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(!presult.all_visible || !(*has_lpdead_items));
Assert(!presult.all_frozen || presult.all_visible);
- /*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
- */
- if ((presult.all_visible && !all_visible_according_to_vm) ||
- (presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
- {
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (presult.all_frozen)
- flags |= VISIBILITYMAP_ALL_FROZEN;
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
- * unnecessarily dirtying the heap buffer. Nearly the only scenario
- * where PD_ALL_VISIBLE is set but the VM is not is if the VM was
- * removed -- and that isn't worth optimizing for. And if we add the
- * heap buffer to the WAL chain (without passing REGBUF_NO_CHANGES),
- * it must be marked dirty.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
-
- /*
- * If the page is being set all-frozen, we pass InvalidTransactionId
- * as the cutoff_xid, since a snapshot conflict horizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- Assert(!presult.all_frozen ||
- !TransactionIdIsValid(presult.vm_conflict_horizon));
-
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
+ old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
/*
* As of PostgreSQL 9.2, the visibility map bit should never be set if the
@@ -2159,8 +2076,8 @@ lazy_scan_prune(LVRelState *vacrel,
* cleared after heap_vac_scan_next_block() was called, so we must recheck
* with buffer lock before concluding that the VM is corrupt.
*/
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+ if (!PageIsAllVisible(page) &&
+ (old_vmbits & VISIBILITYMAP_VALID_BITS) != 0)
{
ereport(WARNING,
(errcode(ERRCODE_DATA_CORRUPTED),
@@ -2169,6 +2086,8 @@ lazy_scan_prune(LVRelState *vacrel,
visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
VISIBILITYMAP_VALID_BITS);
+ /* VM bits are now clear */
+ old_vmbits = 0;
}
/*
@@ -2196,6 +2115,71 @@ lazy_scan_prune(LVRelState *vacrel,
MarkBufferDirty(buf);
visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
VISIBILITYMAP_VALID_BITS);
+ /* VM bits are now clear */
+ old_vmbits = 0;
+ }
+
+ if (!presult.all_visible)
+ return presult.ndeleted;
+
+ /* Set the visibility map and page visibility hint */
+ new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+ if (presult.all_frozen)
+ new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+ /* Nothing to do */
+ if (old_vmbits == new_vmbits)
+ return presult.ndeleted;
+
+ Assert(presult.all_visible);
+
+ /*
+ * It should never be the case that the visibility map page is set while
+ * the page-level bit is clear, but the reverse is allowed (if checksums
+ * are not enabled). Regardless, set both bits so that we get back in
+ * sync.
+ *
+ * The heap buffer must be marked dirty before adding it to the WAL chain
+ * when setting the VM. We don't worry about unnecessarily dirtying the
+ * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
+ * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
+ * the VM bits clear, so there is no point in optimizing it.
+ */
+ PageSetAllVisible(page);
+ MarkBufferDirty(buf);
+
+ /*
+ * If the page is being set all-frozen, we pass InvalidTransactionId as
+ * the cutoff_xid, since a snapshot conflict horizon sufficient to make
+ * everything safe for REDO was logged when the page's tuples were frozen.
+ */
+ Assert(!presult.all_frozen ||
+ !TransactionIdIsValid(presult.vm_conflict_horizon));
+
+ visibilitymap_set(vacrel->rel, blkno, buf,
+ InvalidXLogRecPtr,
+ vmbuffer, presult.vm_conflict_horizon,
+ new_vmbits);
+
+ /*
+ * If the page wasn't already set all-visible and/or all-frozen in the VM,
+ * count it as newly set for logging.
+ */
+ if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ {
+ vacrel->vm_new_visible_pages++;
+ if (presult.all_frozen)
+ {
+ vacrel->vm_new_visible_frozen_pages++;
+ *vm_page_frozen = true;
+ }
+ }
+ else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ presult.all_frozen)
+ {
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
}
return presult.ndeleted;
--
2.43.0
v29-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patch (text/x-patch; charset=US-ASCII)
From 0527745e9d51b96520d741da9a9c099fcd82a9f9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 13:36:39 -0500
Subject: [PATCH v29 03/15] Refactor lazy_scan_prune() VM clear logic into
helper
Encapsulating the VM corruption checks in a helper makes the whole
function clearer. There is no functional change other than moving the
code into a helper.
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
---
src/backend/access/heap/vacuumlazy.c | 132 +++++++++++++++++----------
1 file changed, 85 insertions(+), 47 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d47ed7814c8..2a027828891 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -422,6 +422,11 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
+static bool identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer,
@@ -1928,6 +1933,83 @@ cmpOffsetNumbers(const void *a, const void *b)
return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
}
+/*
+ * Helper to correct any corruption detected on an heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+ *
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ *
+ * Returns true if it cleared corruption and false otherwise.
+ */
+static bool
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits)
+{
+ Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
+
+ Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (!PageIsAllVisible(heap_page) &&
+ ((vmbits & VISIBILITYMAP_VALID_BITS) != 0))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
+
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -2070,54 +2152,10 @@ lazy_scan_prune(LVRelState *vacrel,
old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (!PageIsAllVisible(page) &&
- (old_vmbits & VISIBILITYMAP_VALID_BITS) != 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- /* VM bits are now clear */
+ if (identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
+ presult.lpdead_items, vmbuffer,
+ old_vmbits))
old_vmbits = 0;
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- /* VM bits are now clear */
- old_vmbits = 0;
- }
if (!presult.all_visible)
return presult.ndeleted;
--
2.43.0
v29-0004-Set-the-VM-in-heap_page_prune_and_freeze.patch (text/x-patch; charset=US-ASCII)
From c8126c7046b81296d9cf3c81c8b6d6e5d9cf0951 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v29 04/15] Set the VM in heap_page_prune_and_freeze()
This has no independent benefit; it is done for ease of review. As of
this commit, a separate WAL record is still emitted for setting the VM
after pruning and freezing. Keeping the move of this logic into
pruneheap.c separate from emitting the VM update in the same WAL record
makes the change easier to review.
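A condensed sketch of the resulting call site in lazy_scan_prune(),
taken from the hunk below:

    PruneFreezeParams params = {
        .relation = rel,
        .buffer = buf,
        .vmbuffer = vmbuffer,   /* caller-pinned VM block for this heap page */
        .reason = PRUNE_VACUUM_SCAN,
        .options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM,
        .vistest = vacrel->vistest,
        .cutoffs = &vacrel->cutoffs,
    };

    /* After heap_page_prune_and_freeze() returns, presult.old_vmbits and
     * presult.new_vmbits drive the vm_new_visible_pages /
     * vm_new_frozen_pages counters. */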
---
src/backend/access/heap/pruneheap.c | 309 +++++++++++++++++++++++----
src/backend/access/heap/vacuumlazy.c | 150 +------------
src/include/access/heapam.h | 21 ++
3 files changed, 294 insertions(+), 186 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 07aa08cfe14..62404768bef 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
#include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool attempt_freeze;
+ /* whether or not to attempt updating the VM */
+ bool attempt_update_vm;
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -140,16 +142,17 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon when setting the VM or when
+ * freezing all the tuples on the page. It is only valid when all the live
+ * tuples on the page are all-visible.
*
* NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
* That's convenient for heap_page_prune_and_freeze() to use them to
- * decide whether to freeze the page or not. The all_visible and
- * all_frozen values returned to the caller are adjusted to include
- * LP_DEAD items after we determine whether to opportunistically freeze.
+ * decide whether to opportunistically freeze the page or not. The
+ * all_visible and all_frozen values ultimately used to set the VM are
+ * adjusted to include LP_DEAD items after we determine whether or not to
+ * opportunistically freeze.
*/
bool all_visible;
bool all_frozen;
@@ -191,6 +194,17 @@ static void page_verify_redirects(Page page);
static bool heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
PruneState *prstate);
+static bool identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page, int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits);
+static bool heap_page_will_set_vm(PruneState *prstate,
+ Relation relation,
+ BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+ Buffer vmbuffer,
+ int nlpdead_items,
+ uint8 *old_vmbits,
+ uint8 *new_vmbits);
/*
@@ -280,6 +294,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
PruneFreezeParams params = {
.relation = relation,
.buffer = buffer,
+ .vmbuffer = InvalidBuffer,
.reason = PRUNE_ON_ACCESS,
.options = 0,
.vistest = vistest,
@@ -341,6 +356,8 @@ prune_freeze_setup(PruneFreezeParams *params,
/* cutoffs must be provided if we will attempt freezing */
Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate->attempt_update_vm =
+ (params->options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate->cutoffs = params->cutoffs;
/*
@@ -396,51 +413,54 @@ prune_freeze_setup(PruneFreezeParams *params,
prstate->frz_conflict_horizon = InvalidTransactionId;
/*
- * Vacuum may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Track whether the page could be marked all-visible and/or all-frozen.
+ * This information is used for opportunistic freezing and for updating
+ * the visibility map (VM) if requested by the caller.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * Currently, only VACUUM performs freezing, but other callers may in the
+ * future. Visibility bookkeeping is required not just for setting the VM
+ * bits, but also for opportunistic freezing: we only consider freezing if
+ * the page would become all-frozen, or if it would be all-frozen except
+ * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+ * we will not set the VM bit even if the page is found to be all-visible.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not immediately
- * clear all_visible and all_frozen when we see LP_DEAD items. We fix
- * that after scanning the line pointers. We must correct all_visible and
- * all_frozen before we return them to the caller, so that the caller
- * doesn't set the VM bits incorrectly.
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is passed without HEAP_PAGE_PRUNE_FREEZE,
+ * prstate.all_frozen must be initialized to false, since we will not call
+ * heap_prepare_freeze_tuple() for each tuple.
+ *
+ * Dead tuples that will be removed by the end of vacuum should not
+ * prevent opportunistic freezing. Therefore, we do not clear all_visible
+ * and all_frozen when we encounter LP_DEAD items. Instead, we correct
+ * them after deciding whether to freeze, but before updating the VM, to
+ * avoid setting the VM bits incorrectly.
+ *
+ * If neither freezing nor VM updates are requested, we skip the extra
+ * bookkeeping. In this case, initializing all_visible to false allows
+ * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
*/
if (prstate->attempt_freeze)
{
prstate->all_visible = true;
prstate->all_frozen = true;
}
+ else if (prstate->attempt_update_vm)
+ {
+ prstate->all_visible = true;
+ prstate->all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate->all_visible = false;
prstate->all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is unmaintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon
+ * when updating the VM and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -775,10 +795,141 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Helper to correct any corruption detected on an heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+ *
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ *
+ * Returns true if it cleared corruption and false otherwise.
+ */
+static bool
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits)
+{
+ Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
+
+ Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (!PageIsAllVisible(heap_page) &&
+ ((vmbits & VISIBILITYMAP_VALID_BITS) != 0))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * However, if it finds that the page-level visibility hint or VM is
+ * corrupted, it will fix them by clearing the VM bits and visibility hint.
+ * This does not need to be done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with returning the
+ * current value of the VM bits in *old_vmbits and the desired new value of
+ * the VM bits in *new_vmbits.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate,
+ Relation relation,
+ BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+ Buffer vmbuffer,
+ int nlpdead_items,
+ uint8 *old_vmbits,
+ uint8 *new_vmbits)
+{
+ if (!prstate->attempt_update_vm)
+ return false;
+
+ *old_vmbits = visibilitymap_get_status(relation, heap_blk,
+ &vmbuffer);
+
+ /* We do this even if not all-visible */
+ if (identify_and_fix_vm_corruption(relation, heap_buffer, heap_blk, heap_page,
+ nlpdead_items, vmbuffer,
+ *old_vmbits))
+ *old_vmbits = 0;
+
+ if (!prstate->all_visible)
+ return false;
+
+ *new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+ if (prstate->all_frozen)
+ *new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+ if (*new_vmbits == *old_vmbits)
+ {
+ *new_vmbits = 0;
+ return false;
+ }
+
+ return true;
+}
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -793,12 +944,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* tuples if it's required in order to advance relfrozenxid / relminmxid, or
* if it's considered advantageous for overall system performance to do so
* now. The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing. When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opportunistically freeze, to indicate if the VM bits can be set. They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
@@ -823,13 +975,19 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
MultiXactId *new_relmin_mxid)
{
Buffer buffer = params->buffer;
+ Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
+ BlockNumber blockno = BufferGetBlockNumber(buffer);
PruneState prstate;
bool do_freeze;
bool do_prune;
bool do_hint_prune;
+ bool do_set_vm;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ uint8 new_vmbits = 0;
+ uint8 old_vmbits = 0;
+
/* Initialize prstate */
prune_freeze_setup(params,
@@ -1011,6 +1169,65 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
}
}
+
+ /* Now update the visibility map and PD_ALL_VISIBLE hint */
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ do_set_vm = heap_page_will_set_vm(&prstate,
+ params->relation,
+ blockno,
+ buffer,
+ page,
+ vmbuffer,
+ prstate.lpdead_items,
+ &old_vmbits,
+ &new_vmbits);
+
+ /*
+ * new_vmbits should be 0 regardless of whether or not the page is
+ * all-visible if we do not intend to set the VM.
+ */
+ Assert(do_set_vm || new_vmbits == 0);
+
+ /* Set the visibility map and page visibility hint, if relevant */
+ if (do_set_vm)
+ {
+ Assert(prstate.all_visible);
+
+ /*
+ * It should never be the case that the visibility map page is set
+ * while the page-level bit is clear, but the reverse is allowed (if
+ * checksums are not enabled). Regardless, set both bits so that we
+ * get back in sync.
+ *
+ * The heap buffer must be marked dirty before adding it to the WAL
+ * chain when setting the VM. We don't worry about unnecessarily
+ * dirtying the heap buffer if PD_ALL_VISIBLE is already set, though.
+ * It is extremely rare to have a clean heap buffer with
+ * PD_ALL_VISIBLE already set and the VM bits clear, so there is no
+ * point in optimizing it.
+ */
+ PageSetAllVisible(page);
+ MarkBufferDirty(buffer);
+
+ /*
+ * If the page is being set all-frozen, we pass InvalidTransactionId
+ * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ Assert(!prstate.all_frozen ||
+ !TransactionIdIsValid(presult->vm_conflict_horizon));
+
+ visibilitymap_set(params->relation, blockno, buffer,
+ InvalidXLogRecPtr,
+ vmbuffer, presult->vm_conflict_horizon,
+ new_vmbits);
+ }
+
+ /* Save the vmbits for caller */
+ presult->old_vmbits = old_vmbits;
+ presult->new_vmbits = new_vmbits;
}
@@ -1485,6 +1702,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
{
TransactionId xmin;
+ Assert(prstate->attempt_update_vm);
+
if (!HeapTupleHeaderXminCommitted(htup))
{
prstate->all_visible = false;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2a027828891..8b489349312 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -422,11 +422,7 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
-static bool identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
- BlockNumber heap_blk, Page heap_page,
- int nlpdead_items,
- Buffer vmbuffer,
- uint8 vmbits);
+
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer,
@@ -1933,83 +1929,6 @@ cmpOffsetNumbers(const void *a, const void *b)
return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
}
-/*
- * Helper to correct any corruption detected on an heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
- *
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- *
- * Returns true if it cleared corruption and false otherwise.
- */
-static bool
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
- BlockNumber heap_blk, Page heap_page,
- int nlpdead_items,
- Buffer vmbuffer,
- uint8 vmbits)
-{
- Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
-
- Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (!PageIsAllVisible(heap_page) &&
- ((vmbits & VISIBILITYMAP_VALID_BITS) != 0))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- RelationGetRelationName(rel), heap_blk)));
-
- visibilitymap_clear(rel, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- RelationGetRelationName(rel), heap_blk)));
-
- PageClearAllVisible(heap_page);
- MarkBufferDirty(heap_buffer);
- visibilitymap_clear(rel, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- return false;
-}
-
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -2041,13 +1960,12 @@ lazy_scan_prune(LVRelState *vacrel,
PruneFreezeParams params = {
.relation = rel,
.buffer = buf,
+ .vmbuffer = vmbuffer,
.reason = PRUNE_VACUUM_SCAN,
- .options = HEAP_PAGE_PRUNE_FREEZE,
+ .options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM,
.vistest = vacrel->vistest,
.cutoffs = &vacrel->cutoffs,
};
- uint8 old_vmbits = 0;
- uint8 new_vmbits = 0;
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -2147,75 +2065,25 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
- Assert(!presult.all_frozen || presult.all_visible);
-
- old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
- if (identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
- presult.lpdead_items, vmbuffer,
- old_vmbits))
- old_vmbits = 0;
-
- if (!presult.all_visible)
- return presult.ndeleted;
-
- /* Set the visibility map and page visibility hint */
- new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
- if (presult.all_frozen)
- new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
- /* Nothing to do */
- if (old_vmbits == new_vmbits)
- return presult.ndeleted;
-
- Assert(presult.all_visible);
-
- /*
- * It should never be the case that the visibility map page is set while
- * the page-level bit is clear, but the reverse is allowed (if checksums
- * are not enabled). Regardless, set both bits so that we get back in
- * sync.
- *
- * The heap buffer must be marked dirty before adding it to the WAL chain
- * when setting the VM. We don't worry about unnecessarily dirtying the
- * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
- * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
- * the VM bits clear, so there is no point in optimizing it.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
-
- /*
- * If the page is being set all-frozen, we pass InvalidTransactionId as
- * the cutoff_xid, since a snapshot conflict horizon sufficient to make
- * everything safe for REDO was logged when the page's tuples were frozen.
- */
- Assert(!presult.all_frozen ||
- !TransactionIdIsValid(presult.vm_conflict_horizon));
-
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- new_vmbits);
/*
* If the page wasn't already set all-visible and/or all-frozen in the VM,
* count it as newly set for logging.
*/
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
vacrel->vm_new_frozen_pages++;
*vm_page_frozen = true;
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f7e4ae3843c..ad2af13ec39 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,12 @@ typedef struct PruneFreezeParams
Relation relation; /* relation containing buffer to be pruned */
Buffer buffer; /* buffer to be pruned */
+ /*
+ * If we will consider updating the visibility map, vmbuffer should
+ * contain the correct block of the visibility map and be pinned.
+ */
+ Buffer vmbuffer;
+
/*
* The reason pruning was performed. It is used to set the WAL record
* opcode which is used for debugging and analysis purposes.
@@ -252,6 +259,9 @@ typedef struct PruneFreezeParams
*
* HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
* will return 'all_visible', 'all_frozen' flags to the caller.
+ *
+ * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
+ * in the VM.
*/
int options;
@@ -299,6 +309,17 @@ typedef struct PruneFreezeResult
bool all_frozen;
TransactionId vm_conflict_horizon;
+ /*
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
+ *
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set and
+ * we have attempted to update the VM.
+ */
+ uint8 new_vmbits;
+ uint8 old_vmbits;
+
/*
* Whether or not the page makes rel truncation unsafe. This is set to
* 'true', even if the page contains LP_DEAD items. VACUUM will remove
--
2.43.0
v29-0005-Move-VM-assert-into-prune-freeze-code.patch (text/x-patch; charset=US-ASCII)
From f801dd86f6b7b49b2d2aa747d1c42a11efbc53ab Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 15:57:34 -0500
Subject: [PATCH v29 05/15] Move VM assert into prune/freeze code
This is a step toward setting the VM in the same WAL record as pruning
and freezing. It moves the assertion cross-check of the heap page's
visibility into the prune/freeze code, before the VM is set. This allows
the all_visible, all_frozen, and vm_conflict_horizon fields to be
removed from PruneFreezeResult.
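Abridged from the hunk below, the moved cross-check now runs inside
heap_page_prune_and_freeze():

#ifdef USE_ASSERT_CHECKING
    if (prstate.all_visible)
    {
        TransactionId debug_cutoff;
        bool        debug_all_frozen;

        Assert(presult->lpdead_items == 0);
        Assert(heap_page_is_all_visible(params->relation, buffer,
                                        prstate.cutoffs->OldestXmin,
                                        &debug_all_frozen,
                                        &debug_cutoff, off_loc));
        Assert(prstate.all_frozen == debug_all_frozen);
    }
#endif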
---
src/backend/access/heap/pruneheap.c | 86 ++++++++++++++++++++++------
src/backend/access/heap/vacuumlazy.c | 68 +---------------------
src/include/access/heapam.h | 25 +++-----
3 files changed, 77 insertions(+), 102 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 62404768bef..7f38d815de4 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -925,6 +925,31 @@ heap_page_will_set_vm(PruneState *prstate,
return true;
}
+#ifdef USE_ASSERT_CHECKING
+
+/*
+ * Wrapper for heap_page_would_be_all_visible() which can be used for callers
+ * that expect no LP_DEAD on the page. Currently assert-only, but there is no
+ * reason not to use it outside of asserts.
+ */
+static bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
+{
+
+ return heap_page_would_be_all_visible(rel, buf,
+ OldestXmin,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid,
+ logging_offnum);
+}
+#endif
+
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
@@ -978,6 +1003,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
+ TransactionId vm_conflict_horizon = InvalidTransactionId;
PruneState prstate;
bool do_freeze;
bool do_prune;
@@ -1136,23 +1162,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
- /*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
-
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
@@ -1170,6 +1181,46 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
}
}
+ /*
+ * If updating the visibility map, the conflict horizon for that record
+ * must be the newest xmin on the page. However, if the page is
+ * completely frozen, there can be no conflict and the vm_conflict_horizon
+ * should remain InvalidTransactionId. This includes the case that we
+ * just froze all the tuples; the prune-freeze record included the
+ * conflict XID already so we don't need to again.
+ */
+ if (prstate.all_frozen)
+ vm_conflict_horizon = InvalidTransactionId;
+ else
+ vm_conflict_horizon = prstate.visibility_cutoff_xid;
+
+ /*
+ * During its second pass over the heap, VACUUM calls
+ * heap_page_would_be_all_visible() to determine whether a page is
+ * all-visible and all-frozen. The logic here is similar. After completing
+ * pruning and freezing, use an assertion to verify that our results
+ * remain consistent with heap_page_would_be_all_visible().
+ */
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
+ {
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
+
+ Assert(presult->lpdead_items == 0);
+
+ Assert(heap_page_is_all_visible(params->relation, buffer,
+ prstate.cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc));
+
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == vm_conflict_horizon);
+ }
+#endif
+
/* Now update the visibility map and PD_ALL_VISIBLE hint */
Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
@@ -1216,12 +1267,11 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* make everything safe for REDO was logged when the page's tuples
* were frozen.
*/
- Assert(!prstate.all_frozen ||
- !TransactionIdIsValid(presult->vm_conflict_horizon));
+ Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
visibilitymap_set(params->relation, blockno, buffer,
InvalidXLogRecPtr,
- vmbuffer, presult->vm_conflict_horizon,
+ vmbuffer, vm_conflict_horizon,
new_vmbits);
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8b489349312..f56a02a3d46 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -457,20 +457,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
-#endif
-static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- OffsetNumber *deadoffsets,
- int ndeadoffsets,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2006,32 +1992,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- Assert(heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum));
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -3489,29 +3449,6 @@ dead_items_cleanup(LVRelState *vacrel)
vacrel->pvs = NULL;
}
-#ifdef USE_ASSERT_CHECKING
-
-/*
- * Wrapper for heap_page_would_be_all_visible() which can be used for callers
- * that expect no LP_DEAD on the page. Currently assert-only, but there is no
- * reason not to use it outside of asserts.
- */
-static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum)
-{
-
- return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
- NULL, 0,
- all_frozen,
- visibility_cutoff_xid,
- logging_offnum);
-}
-#endif
/*
* Check whether the heap page in buf is all-visible except for the dead
@@ -3535,15 +3472,12 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
* - *logging_offnum: OffsetNumber of current tuple being processed;
* used by vacuum's error callback system.
*
- * Callers looking to verify that the page is already all-visible can call
- * heap_page_is_all_visible().
- *
* This logic is closely related to heap_prune_record_unchanged_lp_normal().
* If you modify this function, ensure consistency with that code. An
* assertion cross-checks that both remain in agreement. Do not introduce new
* side-effects.
*/
-static bool
+bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
TransactionId OldestXmin,
OffsetNumber *deadoffsets,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ad2af13ec39..bec2f840102 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -257,8 +257,7 @@ typedef struct PruneFreezeParams
* HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
* LP_UNUSED during pruning.
*
- * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
- * will return 'all_visible', 'all_frozen' flags to the caller.
+ * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples
*
* HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
* in the VM.
@@ -294,21 +293,6 @@ typedef struct PruneFreezeResult
int live_tuples;
int recently_dead_tuples;
- /*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
- *
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
- *
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
- */
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
-
/*
* old_vmbits are the state of the all-visible and all-frozen bits in the
* visibility map before updating it during phase I of vacuuming.
@@ -454,6 +438,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
/* in heap/vacuumlazy.c */
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
--
2.43.0
v29-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (text/x-patch; charset=US-ASCII)
From 53b26c1cf6bf1e37fb5de4576aefee04a50a1f2f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 16:16:22 -0500
Subject: [PATCH v29 06/15] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
prune/freeze
Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.
Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.
This change applies only to vacuum phase I, not to pruning performed
during normal page access.
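In outline, heap_page_prune_and_freeze() now proceeds as follows (a
condensed sketch of the diff below; most detail is omitted):

    do_set_vm = heap_page_will_set_vm(&prstate, params->relation, blockno,
                                      buffer, page, vmbuffer,
                                      prstate.lpdead_items,
                                      &old_vmbits, &new_vmbits);

    conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
                                    old_vmbits, new_vmbits,
                                    prstate.latest_xid_removed,
                                    prstate.frz_conflict_horizon,
                                    prstate.visibility_cutoff_xid);

    /* lock the VM page before entering the critical section */
    if (do_set_vm)
        LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);

    START_CRIT_SECTION();
    /* prune, freeze, set PD_ALL_VISIBLE and the VM bits, and emit a single
     * XLOG_HEAP2_PRUNE_VACUUM_SCAN record covering all of it */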
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/pruneheap.c | 276 ++++++++++++++++------------
1 file changed, 157 insertions(+), 119 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7f38d815de4..b66fc6c17e6 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -205,6 +205,11 @@ static bool heap_page_will_set_vm(PruneState *prstate,
int nlpdead_items,
uint8 *old_vmbits,
uint8 *new_vmbits);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ uint8 old_vmbits, uint8 new_vmbits,
+ TransactionId latest_xid_removed,
+ TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid);
/*
@@ -795,6 +800,68 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ uint8 old_vmbits, uint8 new_vmbits,
+ TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid)
+{
+ TransactionId conflict_xid;
+
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning or
+ * freezing any tuples and are setting an already all-visible page
+ * all-frozen in the VM. In this case, all of the tuples on the page must
+ * already be visible to all MVCC snapshots on the standby.
+ */
+ if (!do_prune &&
+ !do_freeze &&
+ do_set_vm &&
+ (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0 &&
+ (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ return InvalidTransactionId;
+
+ /*
+ * The snapshotConflictHorizon for the whole record should be the most
+ * conservative of all the horizons calculated for any of the possible
+ * modifications. If this record will prune tuples, any transactions on
+ * the standby older than the youngest xmax of the most recently removed
+ * tuple this record will prune will conflict. If this record will freeze
+ * tuples, any transactions on the standby with xids older than the
+ * youngest tuple this record will freeze will conflict.
+ */
+ conflict_xid = InvalidTransactionId;
+
+ /*
+ * If we are updating the VM, the conflict horizon is almost always the
+ * visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization, we can
+ * use the visibility_cutoff_xid as the conflict horizon if the page will
+ * be all-frozen. This is true even if there are LP_DEAD line pointers
+ * because we ignored those when maintaining the visibility_cutoff_xid.
+ * This will have been calculated earlier as the frz_conflict_horizon when
+ * we determined we would freeze.
+ */
+ if (do_set_vm)
+ conflict_xid = visibility_cutoff_xid;
+ else if (do_freeze)
+ conflict_xid = frz_conflict_horizon;
+
+ /*
+ * If we are removing tuples with a younger xmax than our so far
+ * calculated conflict_xid, we must use this as our horizon.
+ */
+ if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+ conflict_xid = latest_xid_removed;
+
+ return conflict_xid;
+}
+
/*
 * Helper to correct any corruption detected on a heap page and its
* corresponding visibility map page after pruning but before setting the
@@ -1003,7 +1070,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
- TransactionId vm_conflict_horizon = InvalidTransactionId;
PruneState prstate;
bool do_freeze;
bool do_prune;
@@ -1011,10 +1077,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool do_set_vm;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ TransactionId conflict_xid = InvalidTransactionId;
uint8 new_vmbits = 0;
uint8 old_vmbits = 0;
-
/* Initialize prstate */
prune_freeze_setup(params,
new_relfrozen_xid, new_relmin_mxid,
@@ -1075,6 +1141,37 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.all_visible = prstate.all_frozen = false;
Assert(!prstate.all_frozen || prstate.all_visible);
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ /*
+ * Decide whether to set the VM bits based on information from the VM and
+ * the all_visible/all_frozen flags.
+ */
+ do_set_vm = heap_page_will_set_vm(&prstate,
+ params->relation,
+ blockno,
+ buffer,
+ page,
+ vmbuffer,
+ prstate.lpdead_items,
+ &old_vmbits,
+ &new_vmbits);
+
+ /*
+ * new_vmbits should be 0 regardless of whether or not the page is
+ * all-visible if we do not intend to set the VM.
+ */
+ Assert(do_set_vm || new_vmbits == 0);
+
+ conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+ old_vmbits, new_vmbits,
+ prstate.latest_xid_removed,
+ prstate.frz_conflict_horizon,
+ prstate.visibility_cutoff_xid);
+
+ /* Lock vmbuffer before entering a critical section */
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -1096,14 +1193,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
/*
* If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * hint. If we are going to freeze or prune the page or set
+ * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+ *
+ * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+ * for the VM to be set and PD_ALL_VISIBLE to be clear.
*/
- if (!do_freeze && !do_prune)
+ if (!do_freeze && !do_prune && !do_set_vm)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -1117,6 +1217,26 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
+ /* Set the visibility map and page visibility hint */
+ if (do_set_vm)
+ {
+ /*
+ * While it is valid for PD_ALL_VISIBLE to be set when the
+ * corresponding VM bit is clear, we strongly prefer to keep them
+ * in sync.
+ *
+ * The heap buffer must be marked dirty before adding it to the
+ * WAL chain when setting the VM. We don't worry about
+ * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
+ * already set, though. It is extremely rare to have a clean heap
+ * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
+ * so there is no point in optimizing it.
+ */
+ PageSetAllVisible(page);
+ visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
+ params->relation->rd_locator);
+ }
+
MarkBufferDirty(buffer);
/*
@@ -1124,29 +1244,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
if (RelationNeedsWAL(params->relation))
{
- /*
- * The snapshotConflictHorizon for the whole record should be the
- * most conservative of all the horizons calculated for any of the
- * possible modifications. If this record will prune tuples, any
- * transactions on the standby older than the youngest xmax of the
- * most recently removed tuple this record will prune will
- * conflict. If this record will freeze tuples, any transactions
- * on the standby with xids older than the youngest tuple this
- * record will freeze will conflict.
- */
- TransactionId conflict_xid;
-
- if (TransactionIdFollows(prstate.frz_conflict_horizon,
- prstate.latest_xid_removed))
- conflict_xid = prstate.frz_conflict_horizon;
- else
- conflict_xid = prstate.latest_xid_removed;
-
log_heap_prune_and_freeze(params->relation, buffer,
- InvalidBuffer, /* vmbuffer */
- 0, /* vmflags */
+ do_set_vm ? vmbuffer : InvalidBuffer,
+ do_set_vm ? new_vmbits : 0,
conflict_xid,
- true, params->reason,
+ true, /* cleanup lock */
+ params->reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
@@ -1156,43 +1259,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
END_CRIT_SECTION();
- /* Copy information back for caller */
- presult->ndeleted = prstate.ndeleted;
- presult->nnewlpdead = prstate.ndead;
- presult->nfrozen = prstate.nfrozen;
- presult->live_tuples = prstate.live_tuples;
- presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->hastup = prstate.hastup;
-
- presult->lpdead_items = prstate.lpdead_items;
- /* the presult->deadoffsets array was already filled in */
-
- if (prstate.attempt_freeze)
- {
- if (presult->nfrozen > 0)
- {
- *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
- *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
- }
- else
- {
- *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
- *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
- }
- }
-
- /*
- * If updating the visibility map, the conflict horizon for that record
- * must be the newest xmin on the page. However, if the page is
- * completely frozen, there can be no conflict and the vm_conflict_horizon
- * should remain InvalidTransactionId. This includes the case that we
- * just froze all the tuples; the prune-freeze record included the
- * conflict XID already so we don't need to again.
- */
- if (prstate.all_frozen)
- vm_conflict_horizon = InvalidTransactionId;
- else
- vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
/*
* During its second pass over the heap, VACUUM calls
@@ -1207,7 +1275,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(presult->lpdead_items == 0);
+ Assert(prstate.lpdead_items == 0);
+ Assert(prstate.cutoffs);
Assert(heap_page_is_all_visible(params->relation, buffer,
prstate.cutoffs->OldestXmin,
@@ -1217,67 +1286,36 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Assert(prstate.all_frozen == debug_all_frozen);
Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == vm_conflict_horizon);
+ debug_cutoff == prstate.visibility_cutoff_xid);
}
#endif
- /* Now update the visibility map and PD_ALL_VISIBLE hint */
- Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
-
- do_set_vm = heap_page_will_set_vm(&prstate,
- params->relation,
- blockno,
- buffer,
- page,
- vmbuffer,
- prstate.lpdead_items,
- &old_vmbits,
- &new_vmbits);
+ /* Copy information back for caller */
+ presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+ presult->hastup = prstate.hastup;
+ presult->new_vmbits = new_vmbits;
+ presult->old_vmbits = old_vmbits;
- /*
- * new_vmbits should be 0 regardless of whether or not the page is
- * all-visible if we do not intend to set the VM.
- */
- Assert(do_set_vm || new_vmbits == 0);
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
- /* Set the visibility map and page visibility hint, if relevant */
- if (do_set_vm)
+ if (prstate.attempt_freeze)
{
- Assert(prstate.all_visible);
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * The heap buffer must be marked dirty before adding it to the WAL
- * chain when setting the VM. We don't worry about unnecessarily
- * dirtying the heap buffer if PD_ALL_VISIBLE is already set, though.
- * It is extremely rare to have a clean heap buffer with
- * PD_ALL_VISIBLE already set and the VM bits clear, so there is no
- * point in optimizing it.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buffer);
-
- /*
- * If the page is being set all-frozen, we pass InvalidTransactionId
- * as the cutoff_xid, since a snapshot conflict horizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
-
- visibilitymap_set(params->relation, blockno, buffer,
- InvalidXLogRecPtr,
- vmbuffer, vm_conflict_horizon,
- new_vmbits);
+ if (presult->nfrozen > 0)
+ {
+ *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
+ }
+ else
+ {
+ *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
+ }
}
-
- /* Save the vmbits for caller */
- presult->old_vmbits = old_vmbits;
- presult->new_vmbits = new_vmbits;
}
--
2.43.0
v29-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (text/x-patch)
From 6f085fbf06d63c7427946cba50917c7fdf058fae Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v29 07/15] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum
As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
1 file changed, 29 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f56a02a3d46..d22d2a86ed0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1867,9 +1867,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ /* Lock vmbuffer before entering critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ /* Mark buffer dirty before writing any WAL records */
MarkBufferDirty(buf);
/*
@@ -1886,13 +1889,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set_vmbits(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
+
+ /*
+ * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+ * setting the VM.
+ */
+ if (RelationNeedsWAL(vacrel->rel))
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ InvalidTransactionId, /* conflict xid */
+ false, /* cleanup lock */
+ PRUNE_VACUUM_SCAN, /* reason */
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+
END_CRIT_SECTION();
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
/* Count the newly all-frozen pages for logging */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
--
2.43.0
v29-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (text/x-patch)
From edb532823e2eebae3edf7c68e7dc5dfa8bd3f509 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v29 08/15] Remove XLOG_HEAP2_VISIBLE entirely
No remaining users emit XLOG_HEAP2_VISIBLE records, so remove the record type entirely.
This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.
This changes the visibility map API, so any external users/consumers of
the VM-only WAL record will need to change.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/common/bufmask.c | 4 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 155 ++---------------------
src/backend/access/heap/pruneheap.c | 4 +-
src/backend/access/heap/vacuumlazy.c | 16 +--
src/backend/access/heap/visibilitymap.c | 112 +---------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +-
src/include/access/heapam_xlog.h | 20 ---
src/include/access/visibilitymap.h | 13 +-
src/include/access/visibilitymapdefs.h | 9 --
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 38 insertions(+), 373 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
+ * for more details.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 6daf4a87dec..fb7a7548aa0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2539,11 +2539,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
else if (all_frozen_set)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- relation->rd_locator);
+ visibilitymap_set(BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ relation->rd_locator);
}
/*
@@ -8813,50 +8813,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
-
/*
* Perform XLogInsert for a heap-update operation. Caller must already
* have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1823feff298..47d2479415e 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -236,7 +236,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+ visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -249,142 +249,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -762,8 +626,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
*
* During recovery, however, no concurrent writers exist. Therefore,
* updating the VM without holding the heap page lock is safe enough. This
- * same approach is taken when replaying xl_heap_visible records (see
- * heap_xlog_visible()).
+ * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+ * heap_xlog_prune_and_freeze()).
*/
if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -775,11 +639,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- rlocator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -1360,9 +1224,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b66fc6c17e6..538d06f8449 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1233,8 +1233,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* so there is no point in optimizing it.
*/
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
- params->relation->rd_locator);
+ visibilitymap_set(blockno, vmbuffer, new_vmbits,
+ params->relation->rd_locator);
}
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d22d2a86ed0..93f0f39c5f0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1889,11 +1889,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
/*
* Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2771,9 +2771,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* set PD_ALL_VISIBLE.
*/
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer, vmflags,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer, vmflags,
+ vacrel->rel->rd_locator);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index d14588e92ae..7997e926872 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set bit(s) in a previously pinned page and log
- * visibilitymap_set_vmbits - set bit(s) in a pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,109 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
- flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || !XLogRecPtrIsValid(recptr));
- Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- Assert(!BufferIsValid(heapBuf) ||
- BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (!XLogRecPtrIsValid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
-}
-
/*
* Set VM (visibility map) flags in the VM block in vmBuf.
*
@@ -344,9 +240,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* rlocator is used only for debugging messages.
*/
uint8
-visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ca26d1f0ed1..08461fdf593 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 5e15cb1825e..c0cac7ea1c3 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index fc45d72c79b..3655358ed6b 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..b27fcdfb345 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -443,20 +442,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -500,11 +485,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index c6fa37be968..05ba6786b47 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator);
+extern uint8 visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 04845d5e680..6505628120c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4331,7 +4331,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
v29-0009-Simplify-heap_page_would_be_all_visible-visibili.patch (text/x-patch)
From eee8545874d8553cc74042fd6e3110cc38a71be4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 17 Dec 2025 13:57:16 -0500
Subject: [PATCH v29 09/15] Simplify heap_page_would_be_all_visible visibility
check
heap_page_would_be_all_visible() doesn't care about the distinction
between HEAPTUPLE_RECENTLY_DEAD and HEAPTUPLE_DEAD tuples -- any tuple
that is not HEAPTUPLE_LIVE means the page is not all-visible and causes
us to return false.
Therefore, we don't need to call HeapTupleSatisfiesVacuum(), which
includes an extra step to distinguish between dead and recently dead
tuples using OldestXmin. Replace it with the more minimal
HeapTupleSatisfiesVacuumHorizon().
This has the added benefit of making it easier to replace uses of
OldestXmin in heap_page_would_be_all_visible() in the future.
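For reference, a minimal sketch of why the horizon variant is sufficient here
(caller shape only; the actual hunk below keeps the full switch, and
HeapTupleSatisfiesVacuum() is itself a wrapper that compares dead_after
against OldestXmin to split DEAD from RECENTLY_DEAD):

    TransactionId dead_after = InvalidTransactionId;

    switch (HeapTupleSatisfiesVacuumHorizon(&tuple, buf, &dead_after))
    {
        case HEAPTUPLE_LIVE:
            /* still a candidate for all-visible; go on to check xmin */
            break;
        default:
            /* RECENTLY_DEAD, DEAD, or in-progress: the page is not
             * all-visible, so the RECENTLY_DEAD vs. DEAD distinction
             * (which needs OldestXmin) never has to be made */
            all_visible = false;
            break;
    }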
---
src/backend/access/heap/vacuumlazy.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 93f0f39c5f0..ff297b0b025 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3537,6 +3537,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
{
ItemId itemid;
HeapTupleData tuple;
+ TransactionId dead_after = InvalidTransactionId;
/*
* Set the offset number so that we can display it along with any
@@ -3576,12 +3577,14 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
/* Visibility checks may do IO or allocate memory */
Assert(CritSectionCount == 0);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+ switch (HeapTupleSatisfiesVacuumHorizon(&tuple, buf, &dead_after))
{
case HEAPTUPLE_LIVE:
{
TransactionId xmin;
+ Assert(!TransactionIdIsValid(dead_after));
+
/* Check comments in lazy_scan_prune. */
if (!HeapTupleHeaderXminCommitted(tuple.t_data))
{
@@ -3614,8 +3617,10 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
}
break;
- case HEAPTUPLE_DEAD:
case HEAPTUPLE_RECENTLY_DEAD:
+ Assert(TransactionIdIsValid(dead_after));
+ /* FALLTHROUGH */
+ case HEAPTUPLE_DEAD:
case HEAPTUPLE_INSERT_IN_PROGRESS:
case HEAPTUPLE_DELETE_IN_PROGRESS:
{
--
2.43.0
v29-0010-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (text/x-patch)
From ca02669803f9af01e4e7e3767a3e1a08d931bd0f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v29 10/15] Use GlobalVisState in vacuum to determine page
level visibility
During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.
Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.
This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to be considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.
OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare occurrences where GlobalVisState moves
backwards.
Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.
This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.
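Concretely, the check moves from per-tuple to per-page, roughly like this (a
sketch mirroring the hunks below):

    /* before: each live tuple's xmin compared against a single XID */
    if (!TransactionIdPrecedes(xmin, OldestXmin))
        all_visible = all_frozen = false;

    /* after: one check per page, made once all tuples have been processed,
     * against the newest xmin tracked while scanning */
    if (all_visible &&
        TransactionIdIsNormal(visibility_cutoff_xid) &&
        GlobalVisTestXidMaybeRunning(vistest, visibility_cutoff_xid))
        all_visible = all_frozen = false;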
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/heapam_visibility.c | 22 +++++++++
src/backend/access/heap/pruneheap.c | 53 ++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 38 ++++++++++-----
src/include/access/heapam.h | 4 +-
4 files changed, 76 insertions(+), 41 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index bf899c2d2c6..7d9bd28d8f0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1053,6 +1053,28 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for “transaction still running” are identical to
+ * those used for removal XIDs.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid)
+{
+ return !GlobalVisTestIsRemovableXid(state, xid);
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 538d06f8449..54e60e2c635 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -461,11 +461,12 @@ prune_freeze_setup(PruneFreezeParams *params,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon
- * when updating the VM and/or freezing all the tuples on the page.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState. This field is only kept up-to-date if the page is
+ * all-visible. As soon as a tuple is encountered that is not visible to
+ * all, this field is unmaintained. As long as it is maintained, it can be
+ * used to calculate the snapshot conflict horizon when updating the VM
+ * and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -1001,14 +1002,14 @@ heap_page_will_set_vm(PruneState *prstate,
*/
static bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
+ vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -1095,6 +1096,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prune_freeze_plan(RelationGetRelid(params->relation),
buffer, &prstate, off_loc);
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them may be considered running by any snapshot, the page cannot
+ * be all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ GlobalVisTestXidMaybeRunning(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* If checksums are enabled, calling heap_prune_satisfies_vacuum() while
* checking tuple visibility information in prune_freeze_plan() may have
@@ -1276,10 +1287,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool debug_all_frozen;
Assert(prstate.lpdead_items == 0);
- Assert(prstate.cutoffs);
Assert(heap_page_is_all_visible(params->relation, buffer,
- prstate.cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc));
@@ -1800,28 +1810,15 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
}
/*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed? A FrozenTransactionId
- * is seen as committed to everyone. Otherwise, we check if
- * there is a snapshot that considers this xid to still be
- * running, and if so, we don't consider the page all-visible.
+ * The inserter definitely committed. But we don't know if it
+ * is old enough that everyone sees it as committed. Later,
+ * after processing all the tuples on the page, we'll check if
+			 * there is any snapshot that still considers the newest xid on
+ * the page to be running. If so, we don't consider the page
+ * all-visible.
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' and 'all_frozen' when freezing
- * is requested. We could use GlobalVisTestIsRemovableXid
- * instead, if a non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = false;
- prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ff297b0b025..94f8546be95 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2725,7 +2725,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* done outside the critical section.
*/
if (heap_page_would_be_all_visible(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3486,7 +3486,7 @@ dead_items_cleanup(LVRelState *vacrel)
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
*
* Output parameters:
*
@@ -3502,7 +3502,7 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -3585,7 +3585,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
Assert(!TransactionIdIsValid(dead_after));
- /* Check comments in lazy_scan_prune. */
+ /* Check heap_prune_record_unchanged_lp_normal comments */
if (!HeapTupleHeaderXminCommitted(tuple.t_data))
{
all_visible = false;
@@ -3594,16 +3594,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
}
/*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed?
+ * The inserter definitely committed. But we don't know if
+ * it is old enough that everyone sees it as committed.
+ * Don't check that now.
+ *
+ * If we scan all tuples without finding one that prevents
+ * the page from being all-visible, we then check whether
+ * any snapshot still considers the newest XID on the page
+ * to be running. In that case, the page is not considered
+ * all-visible.
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin, OldestXmin))
- {
- all_visible = false;
- *all_frozen = false;
- break;
- }
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3634,6 +3635,19 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
}
} /* scan along page */
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * among them may still be considered running by any snapshot, the page
+ * cannot be all-visible.
+ */
+ if (all_visible &&
+ TransactionIdIsNormal(*visibility_cutoff_xid) &&
+ GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+ {
+ all_visible = false;
+ *all_frozen = false;
+ }
+
/* Clear the offset information once we have processed the given page. */
*logging_offnum = InvalidOffsetNumber;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index bec2f840102..1625b107575 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -439,7 +439,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -453,6 +453,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+
+extern bool GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
v29-0011-Unset-all_visible-sooner-if-not-freezing.patch (text/x-patch)
From 097858a0e31bc307169aee8f54c1c693fb8cdc23 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v29 11/15] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.
However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/pruneheap.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 54e60e2c635..8df81833179 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1675,8 +1675,13 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible and all_frozen until later
* during pruning. Removable dead tuples shouldn't preclude freezing the
- * page.
+ * page. If we won't attempt freezing, just unset all-visible now, though.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1936,8 +1941,14 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible and all_frozen until later, at the
* end of heap_page_prune_and_freeze(). This will allow us to attempt to
* freeze the page after pruning. As long as we unset it before updating
- * the visibility map, this will be correct.
+ * the visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all-visible now.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
--
2.43.0
v29-0012-Track-which-relations-are-modified-by-a-query.patch (text/x-patch)
From ea4f0d5c34c303e3cf99f9e240e5c0a1db088cf0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v29 12/15] Track which relations are modified by a query
Save the relids in a bitmap in the estate. A later commit will pass this
information down to scan nodes to control whether or not the scan allows
setting the visibility map while on-access pruning. We don't want to set
the visibility map if the query is just going to modify the page
immediately after.
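A later patch in the series consults this set when a scan is initialized; a
hypothetical consumer might look like this (sketch only -- scanrelid and the
flag handling are illustrative, though SO_HINT_REL_READ_ONLY is introduced in
the next patch):

    /* hypothetical: only hint that the relation is read-only if this
     * query never modifies it */
    if (!bms_is_member(scanrelid, estate->es_modified_relids))
        flags |= SO_HINT_REL_READ_ONLY;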
---
src/backend/executor/execMain.c | 4 ++++
src/backend/executor/execUtils.c | 2 ++
src/include/nodes/execnodes.h | 6 ++++++
3 files changed, 12 insertions(+)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 797d8b1ca1c..5b299ef81aa 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation is modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 3968429f991..d8c385216e0 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query either through
+ * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
--
2.43.0
v29-0013-Pass-down-information-on-table-modification-to-s.patch (text/x-patch)
From 21c288f212887e37433155ca80093ab3893ea1f6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:12:18 -0500
Subject: [PATCH v29 13/15] Pass down information on table modification to scan
node
Pass down information to sequential scan, index scan, and bitmap table
scan nodes on whether or not the query modifies the relation being
scanned. A later commit will use this information to update the VM
during on-access pruning only if the relation is not modified by the
query.
---
contrib/pgrowlocks/pgrowlocks.c | 2 +-
src/backend/access/brin/brin.c | 3 ++-
src/backend/access/gin/gininsert.c | 3 ++-
src/backend/access/heap/heapam_handler.c | 7 +++---
src/backend/access/index/genam.c | 4 ++--
src/backend/access/index/indexam.c | 6 +++---
src/backend/access/nbtree/nbtsort.c | 2 +-
src/backend/access/table/tableam.c | 7 +++---
src/backend/commands/constraint.c | 2 +-
src/backend/commands/copyto.c | 2 +-
src/backend/commands/tablecmds.c | 8 +++----
src/backend/commands/typecmds.c | 4 ++--
src/backend/executor/execIndexing.c | 2 +-
src/backend/executor/execReplication.c | 8 +++----
src/backend/executor/nodeBitmapHeapscan.c | 9 +++++++-
src/backend/executor/nodeIndexonlyscan.c | 2 +-
src/backend/executor/nodeIndexscan.c | 11 ++++++++--
src/backend/executor/nodeSeqscan.c | 26 ++++++++++++++++++++---
src/backend/partitioning/partbounds.c | 2 +-
src/backend/utils/adt/selfuncs.c | 2 +-
src/include/access/genam.h | 2 +-
src/include/access/heapam.h | 6 ++++++
src/include/access/tableam.h | 19 ++++++++++-------
23 files changed, 93 insertions(+), 46 deletions(-)
diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
RelationGetRelationName(rel));
/* Scan the relation */
- scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
hscan = (HeapScanDesc) scan;
attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 45d306037a4..5c4bf5f0c6e 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2842,7 +2842,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
indexInfo->ii_Concurrent = brinshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromBrinShared(brinshared));
+ ParallelTableScanFromBrinShared(brinshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index 88246071c4b..b63bd24ebfb 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2058,7 +2058,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
indexInfo->ii_Concurrent = ginshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromGinBuildShared(ginshared));
+ ParallelTableScanFromGinBuildShared(ginshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index dd4fe6bf62f..6c2e4e08b16 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,12 +79,13 @@ heapam_slot_callbacks(Relation relation)
*/
static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
{
IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
return &hscan->xs_base;
}
@@ -753,7 +754,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
tableScan = NULL;
heapScan = NULL;
- indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+ indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
index_rescan(indexScan, NULL, 0, NULL, 0);
}
else
@@ -762,7 +763,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
- tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+ tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
heapScan = (HeapScanDesc) tableScan;
indexScan = NULL;
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index b7f10a1aed0..15f9cc11582 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -447,7 +447,7 @@ systable_beginscan(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, irel,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
@@ -708,7 +708,7 @@ systable_beginscan_ordered(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, indexRelation,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..b5523cf2ab1 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys)
+ int nkeys, int norderbys, uint32 flags)
{
IndexScanDesc scan;
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+ scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
return scan;
}
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+ scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
return scan;
}
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index d7695dc1108..7bdbc7e5fa7 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1925,7 +1925,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
indexInfo = BuildIndexInfo(btspool->index);
indexInfo->ii_Concurrent = btshared->isconcurrent;
scan = table_beginscan_parallel(btspool->heap,
- ParallelTableScanFromBTShared(btshared));
+ ParallelTableScanFromBTShared(btshared), 0);
reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
true, progress, _bt_build_callback,
&buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 73ebc01a08f..a00bdfdf822 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
+
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
bool found;
slot = table_slot_create(rel, NULL);
- scan = table_index_fetch_begin(rel);
+ scan = table_index_fetch_begin(rel, 0);
found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
all_dead);
table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index 3497a8221f2..97c8278e36d 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
*/
tmptid = checktid;
{
- IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+ IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
bool call_again = false;
if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index dae91630ac3..1957bb0f1a2 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1156,7 +1156,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
AttrMap *map = NULL;
TupleTableSlot *root_slot = NULL;
- scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
slot = table_slot_create(rel, NULL);
/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6b1a00ed477..130a670d266 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6377,7 +6377,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
* checking all the constraints.
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(oldrel, snapshot, 0, NULL);
+ scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -13766,7 +13766,7 @@ validateForeignKeyConstraint(char *conname,
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
slot = table_slot_create(rel, NULL);
- scan = table_beginscan(rel, snapshot, 0, NULL);
+ scan = table_beginscan(rel, snapshot, 0, NULL, 0);
perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
"validateForeignKeyConstraint",
@@ -22623,7 +22623,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
/* Scan through the rows. */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+ scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -23087,7 +23087,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
/* Scan through the rows. */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(splitRel, snapshot, 0, NULL);
+ scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index be6ffd6ddb0..2921f68c1c3 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3157,7 +3157,7 @@ validateDomainNotNullConstraint(Oid domainoid)
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
@@ -3238,7 +3238,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 0b3a31f1703..74262a34819 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -816,7 +816,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
retry:
conflict = false;
found_self = false;
- index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+ index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 860f79f9cc1..6e49ea5c5d8 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -204,7 +204,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
/* Start an index scan. */
- scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
retry:
found = false;
@@ -382,7 +382,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
/* Start a heap scan. */
InitDirtySnapshot(snap);
- scan = table_beginscan(rel, &snap, 0, NULL);
+ scan = table_beginscan(rel, &snap, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
retry:
@@ -601,7 +601,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+ scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
table_rescan(scan, NULL);
@@ -665,7 +665,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
index_rescan(scan, skey, skey_attoff, NULL, 0);
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..0d854db51a1 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ flags);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 6bea42f128f..2c87ba5f767 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
estate->es_snapshot,
&node->ioss_Instrument,
node->ioss_NumScanKeys,
- node->ioss_NumOrderByKeys);
+ node->ioss_NumOrderByKeys, 0);
node->ioss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 72b135e5dcf..92674441c6d 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys,
+ flags);
node->iss_ScanDesc = scandesc;
@@ -207,7 +214,7 @@ IndexNextWithReorder(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys, 0);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..4d0cbb9dee4 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
scandesc = table_beginscan(node->ss.ss_currentRelation,
estate->es_snapshot,
- 0, NULL);
+ 0, NULL, flags);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
{
EState *estate = node->ss.ps.state;
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
table_parallelscan_initialize(node->ss.ss_currentRelation,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+ flags);
}
/* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation,
+ pscan,
+ flags);
}
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 16b0adc172c..91acf1ee2d7 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
econtext = GetPerTupleExprContext(estate);
snapshot = RegisterSnapshot(GetLatestSnapshot());
tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
- scan = table_beginscan(part_rel, snapshot, 0, NULL);
+ scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index c760b19db55..ec0def0d1e2 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7100,7 +7100,7 @@ get_actual_variable_endpoint(Relation heapRel,
index_scan = index_beginscan(heapRel, indexRel,
&SnapshotNonVacuumable, NULL,
- 1, 0);
+ 1, 0, 0);
/* Set it up for index-only scan */
index_scan->xs_want_itup = true;
index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..d29d9e905fc 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -177,7 +177,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys);
+ int nkeys, int norderbys, uint32 flags);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 1625b107575..0bfe2366e1a 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -119,6 +119,12 @@ typedef struct IndexFetchHeapData
Buffer xs_cbuf; /* current heap buffer in scan, if any */
/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 2fa790b6bf5..d10b1b03cdb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* set if the query doesn't modify the rel */
+ SO_HINT_REL_READ_ONLY = 1 << 10,
} ScanOptions;
/*
@@ -420,7 +422,7 @@ typedef struct TableAmRoutine
*
* Tuples for an index scan can then be fetched via index_fetch_tuple.
*/
- struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+ struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
/*
* Reset index fetch. Typically this will release cross index fetch
@@ -874,9 +876,9 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
*/
static inline TableScanDesc
table_beginscan(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_SEQSCAN |
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
@@ -919,9 +921,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
@@ -1128,7 +1130,8 @@ extern void table_parallelscan_initialize(Relation rel,
* Caller must hold a suitable lock on the relation.
*/
extern TableScanDesc table_beginscan_parallel(Relation relation,
- ParallelTableScanDesc pscan);
+ ParallelTableScanDesc pscan,
+ uint32 flags);
/*
* Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1164,9 +1167,9 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
* Tuples for an index scan can then be fetched via table_index_fetch_tuple().
*/
static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
{
- return rel->rd_tableam->index_fetch_begin(rel);
+ return rel->rd_tableam->index_fetch_begin(rel, flags);
}
/*
--
2.43.0
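As a quick orientation on the interface change above: the new flags argument threads through the table AM layer so that executor nodes can pass SO_HINT_REL_READ_ONLY down to the heap code. A minimal, illustrative sketch (the wrapper below is invented and not part of the patch):
```
#include "postgres.h"
#include "access/tableam.h"

/*
 * Begin a sequential scan with the read-only hint from the 0013 patch.
 * table_beginscan() ORs in the usual seqscan options and passes the
 * combined flags to the AM's scan_begin() callback.
 */
static TableScanDesc
begin_readonly_seqscan(Relation rel, Snapshot snapshot)
{
	return table_beginscan(rel, snapshot, 0, NULL, SO_HINT_REL_READ_ONLY);
}
```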
Attachment: v29-0014-Allow-on-access-pruning-to-set-pages-all-visible.patch (text/x-patch)
From 7eab00259868aaa07ce3b80f1a01c379eb7f8905 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:24:08 -0500
Subject: [PATCH v29 14/15] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam.c | 15 ++++++-
src/backend/access/heap/heapam_handler.c | 15 ++++++-
src/backend/access/heap/pruneheap.c | 40 ++++++++++++++++++-
src/include/access/heapam.h | 24 +++++++++--
.../t/035_standby_logical_decoding.pl | 3 +-
5 files changed, 89 insertions(+), 8 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fb7a7548aa0..d9dc79f4a96 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -570,6 +570,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -584,7 +585,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1261,6 +1264,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1299,6 +1303,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1331,6 +1341,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 6c2e4e08b16..2cb98e58956 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
return &hscan->xs_base;
@@ -100,6 +101,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -139,7 +146,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ hscan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2472,6 +2480,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2518,7 +2527,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8df81833179..3ddb1b396b4 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -202,6 +202,8 @@ static bool heap_page_will_set_vm(PruneState *prstate,
Relation relation,
BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
Buffer vmbuffer,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
int nlpdead_items,
uint8 *old_vmbits,
uint8 *new_vmbits);
@@ -223,9 +225,13 @@ static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -306,6 +312,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
.cutoffs = NULL,
};
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ params.options |= HEAP_PAGE_PRUNE_UPDATE_VM;
+ params.vmbuffer = *vmbuffer;
+ }
+
heap_page_prune_and_freeze(¶ms, &presult, &dummy_off_loc,
NULL, NULL);
@@ -951,6 +964,9 @@ identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
* corrupted, it will fix them by clearing the VM bits and visibility hint.
* This does not need to be done in a critical section.
*
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
* Returns true if one or both VM bits should be set, along with returning the
* current value of the VM bits in *old_vmbits and the desired new value of
* the VM bits in *new_vmbits.
@@ -960,6 +976,8 @@ heap_page_will_set_vm(PruneState *prstate,
Relation relation,
BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
Buffer vmbuffer,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
int nlpdead_items,
uint8 *old_vmbits,
uint8 *new_vmbits)
@@ -967,6 +985,24 @@ heap_page_will_set_vm(PruneState *prstate,
if (!prstate->attempt_update_vm)
return false;
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(heap_buffer) || XLogCheckBufferNeedsBackup(heap_buffer)))
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ return false;
+ }
+
*old_vmbits = visibilitymap_get_status(relation, heap_blk,
&vmbuffer);
@@ -1164,6 +1200,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
buffer,
page,
vmbuffer,
+ params->reason,
+ do_prune, do_freeze,
prstate.lpdead_items,
&old_vmbits,
&new_vmbits);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0bfe2366e1a..3328f56c101 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read in the current heap page's
+ * corresponding VM block to this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read in the current heap page's corresponding VM block to
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
/*
* Some optimizations can only be performed if the query does not modify
@@ -420,7 +437,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
TM_IndexDeleteOp *delstate);
/* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index ebe2fae1789..bdd9f0a62cd 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
--
2.43.0
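To summarize the interface change in the patch just above: heap_page_prune_opt() now takes a pointer to a visibility map buffer; passing NULL preserves the old behavior, while a non-NULL pointer allows on-access pruning to also set the VM (it takes care of pinning the VM page itself). A minimal, illustrative caller sketch (the wrapper name is invented):
```
#include "postgres.h"
#include "access/heapam.h"
#include "storage/bufmgr.h"

static void
prune_on_access(Relation rel, Buffer buf, Buffer *vmbuffer, bool rel_read_only)
{
	/* Only scans of relations the query does not modify may touch the VM. */
	heap_page_prune_opt(rel, buf, rel_read_only ? vmbuffer : NULL);
}
```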
Attachment: v29-0015-Set-pd_prune_xid-on-insert.patch (text/x-patch)
From c0e6bb1a30761705645110c426e7aa3759a18298 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v29 15/15] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.
This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.
Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affetcting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.
---
src/backend/access/heap/heapam.c | 25 +++++++++++++------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++-
.../modules/index/expected/killtuples.out | 6 ++---
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d9dc79f4a96..ccebc1f244b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2119,6 +2119,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2178,15 +2179,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2196,7 +2201,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2560,8 +2564,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 47d2479415e..ab2db931aac 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -447,6 +447,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -596,9 +602,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
--
2.43.0
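One way to see why 0015 matters: on-access pruning gives up immediately when the page carries no prune hint, so pages filled purely by inserts would never reach the new VM-setting path. A rough sketch of that early exit (the helper name is invented; the real check in heap_page_prune_opt() also consults the visibility horizon):
```
#include "postgres.h"
#include "access/transam.h"
#include "storage/bufpage.h"

static bool
page_has_prune_hint(Page page)
{
	/* pd_prune_xid is the hint that heap_insert() now sets. */
	return TransactionIdIsValid(((PageHeader) page)->pd_prune_xid);
}
```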
On Sat, 20 Dec 2025 at 02:10, Melanie Plageman
<melanieplageman@gmail.com> wrote:
Attached v29 addresses some feedback and also corrects a small error
with the assertion I had added in the previous version's 0009.
On Thu, Dec 18, 2025 at 10:38 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
I’ve done a basic review of patches 1 and 2. Here are some comments
which may be somewhat immature, as this is a fairly large change set
and I'm new to some parts of the code.
1) Potential stale old_vmbits after VM repair in v2
Good catch! I've fixed this in attached v29.
2) Add Assert(BufferIsDirty(buf))
Since the patch's core claim is "buffer must be dirty before WAL
registration", an assertion encodes this invariant. Should we add:
Assert(BufferIsValid(buf));
Assert(BufferIsDirty(buf));
right before the visibilitymap_set() call?
There are already assertions that will trip in various places -- most
importantly in XLogRegisterBuffer(), which is the one that inspired
this refactor.
The comment at lines:
"The only scenario where it is not already dirty is if the VM was removed…"
This phrasing could become misleading after future refactors. Can we
make it more direct like:
"We must mark the heap buffer dirty before calling visibilitymap_set(), because it may WAL-log the buffer and XLogRegisterBuffer() requires it."
I see your point about future refactors missing updating comments like
this. But, I don't think we are going to refactor the code such that
we can have PD_ALL_VISIBLE set without the VM bits set more often.
Also, it is common practice in Postgres to describe very specific edge
cases or odd scenarios in order to explain code that may seem
confusing without the comment. It does risk that comment later
becoming stale, but it is better that future developers understand why
the code is there.
That being said, I take your point that the comment is confusing. I
have updated it in a different way.
"Even if PD_ALL_VISIBLE is already set, we don't need to worry about unnecessarily dirtying the heap buffer, as it must be marked dirty before adding it to the WAL chain. The only scenario where it is not already dirty is if the VM was removed..."
In this test we now call MarkBufferDirty() on the heap page even when
only setting the VM, so the comments claiming “does not need to modify
the heap buffer”/“no heap page modification” might be misleading. It
might be better to say the test doesn’t need to modify heap
tuples/page contents or doesn't need to prune/freeze.
The point I'm trying to make is that we have to dirty the buffer even
if we don't modify the page because of the XLOG sub-system
requirements. And, it may seem like a waste to do that if not
modifying the page, but the page will rarely be clean anyway. I've
tried to make this more clear in attached v29.
- Melanie
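(For context on the XLOG requirement referenced here: the buffer has to be marked dirty before it is registered for WAL, which is why even a VM-only change dirties the heap buffer. A rough outline of the ordering; the function below is invented for illustration and is not the patch code.)
```
#include "postgres.h"
#include "access/xloginsert.h"
#include "miscadmin.h"
#include "storage/bufmgr.h"

static void
wal_log_vm_only_change(Buffer heap_buffer)
{
	START_CRIT_SECTION();

	/* Must precede registration: XLogRegisterBuffer() asserts dirtiness. */
	MarkBufferDirty(heap_buffer);

	XLogBeginInsert();
	XLogRegisterBuffer(0, heap_buffer, REGBUF_STANDARD);
	/* ... register the VM buffer and record data, then XLogInsert() ... */

	END_CRIT_SECTION();
}
```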
Hi! I checked v29-0009, about HeapTupleSatisfiesVacuumHorizon. The origins
of this code trace back to fdf9e21196a6, which was committed as part of
[0]. At that time, there was no HeapTupleSatisfiesVacuumHorizon function.
I guess this is the reason this optimization was not performed earlier.
I also think this patch is correct, because we do similar things for
HEAPTUPLE_DEAD & HEAPTUPLE_RECENTLY_DEAD, and
HeapTupleSatisfiesVacuum is just a proxy to
HeapTupleSatisfiesVacuumHorizon, with the only difference being the DEAD
vs. RECENTLY_DEAD handling.
A similar change could be made in heapam_scan_analyze_next_tuple:
...
case HEAPTUPLE_DEAD:
case HEAPTUPLE_RECENTLY_DEAD:
/* Count dead and recently-dead rows */
*deadrows += 1;
break;
...
[0]: /messages/by-id/CABOikdP0meGuXPPWuYrP=vDvoqUdshF2xJAzZHWSKg03Rz_+9Q@mail.gmail.com
--
Best regards,
Kirill Reshke
On Dec 20, 2025, at 05:09, Melanie Plageman <melanieplageman@gmail.com> wrote:
A few more comments on v29:
1 - 0002 - Looks like since 0002, visibilitymap_set()’s return value is no longer used, so do we need to update the function and change return type to void? I remember in some patches, to address Coverity alerts, people had to do “(void) function_with_a_return_value()”.
2 - 0003
```
+ * Helper to correct any corruption detected on an heap page and its
```
Nit: “an” -> “a”
3 - 0003
```
+static bool
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits)
+{
+ Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
```
Right before this function is called:
```
old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
+ if (identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
+ presult.lpdead_items, vmbuffer,
+ old_vmbits))
```
So, the Assert() is checking that old_vmbits was just returned from visibilitymap_get_status(). In that case, identify_and_fix_vm_corruption() could take vmbits as a pointer, call visibilitymap_get_status() itself, and return vmbits via the pointer, so that we don't need to call visibilitymap_get_status() twice.
4 - 0004
```
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set and
+ * we have attempted to update the VM.
+ */
+ uint8 new_vmbits;
+ uint8 old_vmbits;
```
The comment feels a little confusing to me. "HEAP_PAGE_PRUNE_UPDATE_VM option is set" is a clear indication, but how do we decide that "we have attempted to update the VM"? By reading the code:
```
+ prstate->attempt_update_vm =
+ (params->options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
```
It's just the result of HEAP_PAGE_PRUNE_UPDATE_VM being set. So, maybe we don't need the "and" part.
5 - 0004
```
+ * Returns true if one or both VM bits should be set, along with returning the
+ * current value of the VM bits in *old_vmbits and the desired new value of
+ * the VM bits in *new_vmbits.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate,
+ Relation relation,
+ BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+ Buffer vmbuffer,
+ int nlpdead_items,
+ uint8 *old_vmbits,
+ uint8 *new_vmbits)
+{
+ if (!prstate->attempt_update_vm)
+ return false;
```
old_vmbits and new_vmbits are purely output parameters. So, maybe we should set them to 0 inside this function instead of relying on callers to initialize them.
I think this is a similar case where I raised a comment earlier about initializing presult to {0} in the callers, and you only wanted to set presult in heap_page_prune_and_freeze().
6 - 0004
```
@@ -823,13 +975,19 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
MultiXactId *new_relmin_mxid)
{
Buffer buffer = params->buffer;
+ Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
+ BlockNumber blockno = BufferGetBlockNumber(buffer);
PruneState prstate;
bool do_freeze;
bool do_prune;
bool do_hint_prune;
+ bool do_set_vm;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ uint8 new_vmbits = 0;
+ uint8 old_vmbits = 0;
+
/* Initialize prstate */
```
Nit: an extra empty line is added.
7 - 0005
```
- * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
- * will return 'all_visible', 'all_frozen' flags to the caller.
+ * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples
```
Nit: a trailing period is needed at the end of the comment line.
8 - 0005
```
@@ -978,6 +1003,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
+ TransactionId vm_conflict_horizon = InvalidTransactionId;
```
I guess the variable name "vm_conflict_horizon" comes from the old "presult->vm_conflict_horizon". But in the new logic, this variable is used more generically, for example in Assert(debug_cutoff == vm_conflict_horizon). I see 0006 has renamed it to "conflict_xid", so it's up to you whether or not to rename it here. But to make the commit self-contained, I'd suggest renaming it.
9 - 0006
```
@@ -3537,6 +3537,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
{
ItemId itemid;
HeapTupleData tuple;
+ TransactionId dead_after = InvalidTransactionId;
```
This initialization seems not to be needed, as HeapTupleSatisfiesVacuumHorizon() will always set a value for it.
10 - 0010
```
+ * there is any snapshot that still consider the newest xid on
```
Nit: consider -> considers
11 - 0011
```
+ * page. If we won't attempt freezing, just unset all-visible now, though.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
```
The comment says "just unset all-visible", but the code actually also unsets all_frozen.
12 - 0012
```
+ /*
+ * RT indexes of relations modified by the query either through
+ * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+ */
+ Bitmapset *es_modified_relids;
```
As we intentionally only want RT indexes, does it make sense to just name the field es_modified_rtindexes to make it more explicit?
13 - 0012
```
+ /* If it has a rowmark, the relation is modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
```
I think this comment is a little misleading, because SELECT FOR UPDATE/SHARE doesn't always modify tuples of the relation. A reader who doesn't associate this code with this patch may think the comment is wrong. So, I think we should make the comment more explicit. Maybe rephrase it like "If it has a rowmark, the relation may modify or lock heap pages".
14 - 0015 - commit message
```
Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affetcting the
```
Typo: affetcting -> affecting
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Mon, Dec 22, 2025 at 2:20 AM Chao Li <li.evan.chao@gmail.com> wrote:
A few more comments on v29:
Thanks for the continued review! I've attached v30.
1 - 0002 - Looks like since 0002, visibilitymap_set()’s return value is no longer used, so do we need to update the function and change return type to void? I remember in some patches, to address Coverity alerts, people had to do “(void) function_with_a_return_value()”.
I was torn about whether or not to change the return value. Coverity
doesn't always warn about unused return values. Usually it warns if it
perceives the return value as needed for error checking or if it
thinks not using the return value is incorrect. It may still warn in
this case, but it's not obvious to me which way it would go.
I have changed the function signature as you suggested in v30.
My hesitation is that visibilitymap_set() is in a header file and
could be used by extensions/forks, etc. Adding more information by
changing a return value from void to non-void doesn't have any
negative effect on those potential callers. But taking away a return
value is more likely to affect them in a potentially negative way.
However, I'm significantly changing the signature in this release, so
everybody that used it will have to change their code completely
anyway. Also, I just added a return value for visibilitymap_set() in
the previous release (18). Historically, it returned void. So, I've
gone with your suggestion.
```
+static bool
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits)
+{
+ Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
```
Right before this function is called:
```
old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
+ if (identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
+ presult.lpdead_items, vmbuffer,
+ old_vmbits))
```
So, the Assert() is checking that old_vmbits was just returned from visibilitymap_get_status(). In that case, identify_and_fix_vm_corruption() could take vmbits as a pointer, call visibilitymap_get_status() itself, and return vmbits via the pointer, so that we don't need to call visibilitymap_get_status() twice.
I see what you are saying, and I did consider this.
visibilitymap_get_status() is only called the second time in assert
builds, and it isn't so expensive to do it that it is worth worrying
about. I added the assertion to prevent other callers from calling
identify_and_fix_vm_corruption() with random VM bits unassociated with
the vmbuffer passed in.
The reason I don't think identify_and_fix_vm_corruption() should be
the one to call visibilitymap_get_status() and initialize old_vmbits
is that it shouldn't be a required step to setting the VM.
identify_and_fix_vm_corruption()'s job is to identify and fix
corruption -- not get the VM bits for when we set them. In fact, it
may make sense someday to check that the VM and PD_ALL_VISIBLE are in
sync before pruning and freezing is even started. (Of course, we can't
check the number of lpdead items until after).
Regarding having *old_vmbits as a return value. I thought about
directly returning the result of visibilitymap_clear() from
identify_and_fix_vm_corruption(). The reason I didn't is that if
PD_ALL_VISIBLE is set and nlpdead_items > 0 but the VM is clear,
visibilitymap_clear() will return false -- because it didn't need to
clear the VM bits. And I think we want
identify_and_fix_vm_corruption() to return true if it cleared
corruption at all.
I don't think we should have identify_and_fix_vm_corruption() reset
old_vmbits to 0 (and pass it by reference), because the caller may
want to know the value of old_vmbits before we cleared corruption.
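To make the return-value argument above concrete, here is a schematic (an invented helper, not the patch code) of why returning visibilitymap_clear()'s result directly would under-report: in the PD_ALL_VISIBLE-with-dead-items case the VM is already clear, so visibilitymap_clear() has nothing to do and returns false even though corruption was repaired.
```
#include "postgres.h"

static bool
fixed_any_corruption(bool pd_all_visible, uint8 vmbits, int nlpdead_items)
{
	bool		repaired = false;

	/* VM bit(s) set although the page still has LP_DEAD items. */
	if (vmbits != 0 && nlpdead_items > 0)
		repaired = true;		/* here visibilitymap_clear() returns true */

	/*
	 * PD_ALL_VISIBLE set with LP_DEAD items while the VM is already clear:
	 * the page hint must be cleared, but visibilitymap_clear() returns
	 * false because no VM bits needed clearing.
	 */
	if (pd_all_visible && nlpdead_items > 0 && vmbits == 0)
		repaired = true;

	/* Report whether any corruption was repaired, not just VM-bit clears. */
	return repaired;
}
```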
```
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set and
+ * we have attempted to update the VM.
+ */
+ uint8 new_vmbits;
+ uint8 old_vmbits;
```
The comment feels a little confusing to me. "HEAP_PAGE_PRUNE_UPDATE_VM option is set" is a clear indication, but how do we decide that "we have attempted to update the VM"? By reading the code:
```
+ prstate->attempt_update_vm =
+ (params->options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
```
It's just the result of HEAP_PAGE_PRUNE_UPDATE_VM being set. So, maybe we don't need the "and" part.
Good point. Fixed.
```
+static bool
+heap_page_will_set_vm(PruneState *prstate,
+ Relation relation,
+ BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+ Buffer vmbuffer,
+ int nlpdead_items,
+ uint8 *old_vmbits,
+ uint8 *new_vmbits)
+{
+ if (!prstate->attempt_update_vm)
+ return false;
```
old_vmbits and new_vmbits are purely output parameters. So, maybe we should set them to 0 inside this function instead of relying on callers to initialize them.
I think this is a similar case where I raised a comment earlier about initializing presult to {0} in the callers, and you only wanted to set presult in heap_page_prune_and_freeze().
I see your point. It does feel a little bit different to me since they
are local variables and Coverity may not actually be able to tell they
are being unconditionally initialized by heap_page_will_set_vm(). The
other local variables that are not initialized at the top are all
unconditionally set by helper return values. But my decision to
initialize them was more instinct than rationality. I've changed it as
you suggested.
```
- * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
- * will return 'all_visible', 'all_frozen' flags to the caller.
+ * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples
```
Nit: a trailing period is needed at the end of the comment line.
I've changed it. One interesting thing is that our "policy" for
periods in comments is that we don't put periods at the end of
one-line comments and we do put them at the end of multi-line comment
sentences. This is a one-line comment inside a comment block, so I
wasn't sure what to do. If you noticed it, and it bothered you, it's
easy enough to change, though.
```
@@ -978,6 +1003,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Buffer		vmbuffer = params->vmbuffer;
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
+	TransactionId vm_conflict_horizon = InvalidTransactionId;
```
I guess the variable name "vm_conflict_horizon" comes from the old "presult->vm_conflict_horizon". But in the new logic, this variable is used more generically, for example Assert(debug_cutoff == vm_conflict_horizon). I see 0006 has renamed it to "conflict_xid", so it's up to you whether or not to rename it here. But to make the commit self-contained, I'd suggest renaming it.
As of this patch, it is still being exclusively used as the conflict
XID for setting the visibility map. And it still is the visibility
horizon. I rename it to conflict_xid once it includes more than just
the visibility horizon for an all-visible page. In that assertion, it
is also the visibility horizon for an all-visible page.
9 - 0006
```
@@ -3537,6 +3537,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	{
 		ItemId		itemid;
 		HeapTupleData tuple;
+		TransactionId dead_after = InvalidTransactionId;
```
This initialization seems unnecessary, as HeapTupleSatisfiesVacuumHorizon() will always assign a value to it.
I think this is a comment for a later patch in the set (you originally
said it was from 0006), but I've changed dead_after to not be
initialized like this.
```
+	/*
+	 * RT indexes of relations modified by the query either through
+	 * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+	 */
+	Bitmapset  *es_modified_relids;
```
As we intentionally only want indexes, does it make sense to just name the field es_modified_rtindexes to make it more explicit?
I'm torn about this. I named it like this partially because the struct
member two above it in the estate, es_unpruned_relids, is also a
bitmapset of range table indexes and yet is called x_relids. Though
the bitmapset contains indexes into the range table, they are the
indexes of the relation IDs in that range table. I think this could go
either way, so I've left it as is for now and will think more about it
once this patch is closer to being committed.
```
+			/* If it has a rowmark, the relation is modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
```
I think this comment is a little misleading, because SELECT FOR UPDATE/SHARE doesn't always modify tuples of the relation. A reader who doesn't associate this code with this patch may think the comment is wrong. So, I think we should make the comment more explicit, maybe rephrasing it like "If it has a rowmark, the relation may modify or lock heap pages".
I see what you are saying. It's a good point. However, the reason we
don't want to set the VM for SELECT FOR UPDATE is not because the
SELECT FOR UPDATE will lock the relation but because it usually
indicates that we intend to modify the relation (when we do the
update). As such, I've updated the comment to say "If it has a
rowmark, the relation may be modified" -- which leaves it more open.
- Melanie
Attachments:
v30-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patch
From ec1755a3055229d5bef9cc963f8f6b7edb2a1cd3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 8 Dec 2025 15:49:54 -0500
Subject: [PATCH v30 01/16] Combine visibilitymap_set() cases in
lazy_scan_prune()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
lazy_scan_prune() previously had two separate cases that called
visibilitymap_set() after pruning and freezing. These branches were
nearly identical except that one attempted to avoid dirtying the heap
buffer. However, that situation can never occur — the heap buffer cannot
be clean at that point (and we would hit an assertion if it were).
In lazy_scan_prune(), when we change a previously all-visible page to
all-frozen and the page was recorded as all-visible in the visibility
map by find_next_unskippable_block(), the heap buffer will always be
dirty. Either we have just frozen a tuple and already dirtied the
buffer, or the buffer was modified between find_next_unskippable_block()
and heap_page_prune_and_freeze() and then pruned in
heap_page_prune_and_freeze().
Additionally, XLogRegisterBuffer() asserts that the buffer is dirty, so
attempting to add a clean heap buffer to the WAL chain would assert out
anyway.
Since the “clean heap buffer with already set VM” case is impossible,
the two visibilitymap_set() branches in lazy_scan_prune() can be merged.
Doing so makes the intent clearer and emphasizes that the heap buffer
must always be marked dirty before being added to the WAL chain.
This commit also adds a test case for vacuuming when no heap
modifications are required. Currently this ensures that the heap buffer
is marked dirty before it is added to the WAL chain, but if we later
remove the heap buffer from the VM-set WAL chain or pass it with the
REGBUF_NO_CHANGES flag, this test would guard that behavior.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
Discussion: https://postgr.es/m/flat/CAAKRu_ZWx5gCbeCf7PWCv8p5%3D%3Db7EEws0VD2wksDxpXCvCyHvQ%40mail.gmail.com
---
.../pg_visibility/expected/pg_visibility.out | 44 ++++++++++
contrib/pg_visibility/sql/pg_visibility.sql | 20 +++++
src/backend/access/heap/vacuumlazy.c | 87 ++++---------------
3 files changed, 82 insertions(+), 69 deletions(-)
diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index 09fa5933a35..e10f1706015 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -1,4 +1,5 @@
CREATE EXTENSION pg_visibility;
+CREATE EXTENSION pageinspect;
--
-- recently-dropped table
--
@@ -204,6 +205,49 @@ select pg_truncate_visibility_map('test_partition');
(1 row)
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary
+---------------------------
+ (1,1)
+(1 row)
+
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+ pg_truncate_visibility_map
+----------------------------
+
+(1 row)
+
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary
+---------------------------
+ (0,0)
+(1 row)
+
+-- though the VM is truncated, the heap page-level visibility hint,
+-- PD_ALL_VISIBLE should still be set
+SELECT (flags & x'0004'::int) <> 0
+ FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));
+ ?column?
+----------
+ t
+(1 row)
+
+-- vacuum sets the VM
+vacuum test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary
+---------------------------
+ (1,1)
+(1 row)
+
-- test copy freeze
create table copyfreeze (a int, b char(1500));
-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index 5af06ec5b76..57af8a0c5b6 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -1,4 +1,5 @@
CREATE EXTENSION pg_visibility;
+CREATE EXTENSION pageinspect;
--
-- recently-dropped table
@@ -94,6 +95,25 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
select * from pg_check_frozen('test_partition'); -- hopefully none
select pg_truncate_visibility_map('test_partition');
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+-- though the VM is truncated, the heap page-level visibility hint,
+-- PD_ALL_VISIBLE should still be set
+SELECT (flags & x'0004'::int) <> 0
+ FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));
+-- vacuum sets the VM
+vacuum test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+
-- test copy freeze
create table copyfreeze (a int, b char(1500));
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 30778a15639..cecba2146ea 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2093,16 +2093,14 @@ lazy_scan_prune(LVRelState *vacrel,
* of last heap_vac_scan_next_block() call), and from all_visible and
* all_frozen variables
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if ((presult.all_visible && !all_visible_according_to_vm) ||
+ (presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
- }
/*
* It should never be the case that the visibility map page is set
@@ -2110,15 +2108,25 @@ lazy_scan_prune(LVRelState *vacrel,
* checksums are not enabled). Regardless, set both bits so that we
* get back in sync.
*
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
+ * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
+ * unnecessarily dirtying the heap buffer. Nearly the only scenario
+ * where PD_ALL_VISIBLE is set but the VM is not is if the VM was
+ * removed -- and that isn't worth optimizing for. And if we add the
+ * heap buffer to the WAL chain (without passing REGBUF_NO_CHANGES),
+ * it must be marked dirty.
*/
PageSetAllVisible(page);
MarkBufferDirty(buf);
+
+ /*
+ * If the page is being set all-frozen, we pass InvalidTransactionId
+ * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ Assert(!presult.all_frozen ||
+ !TransactionIdIsValid(presult.vm_conflict_horizon));
+
old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
InvalidXLogRecPtr,
vmbuffer, presult.vm_conflict_horizon,
@@ -2190,65 +2198,6 @@ lazy_scan_prune(LVRelState *vacrel,
VISIBILITYMAP_VALID_BITS);
}
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen.
- */
- else if (all_visible_according_to_vm && presult.all_frozen &&
- !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
-
return presult.ndeleted;
}
--
2.43.0
v30-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patch
From f55678510379299ca66cf78fbf6e08ec8ecda0d2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 11 Dec 2025 10:48:13 -0500
Subject: [PATCH v30 02/16] Eliminate use of cached VM value in
lazy_scan_prune()
lazy_scan_prune() takes a parameter from lazy_scan_heap() indicating
whether the page was marked all-visible in the VM at the time it was
last checked in find_next_unskippable_block(). This behavior is
historical, dating back to commit 608195a3a365, when we did not pin the
VM page until confirming it was not all-visible. Now that the VM page is
already pinned, there is no meaningful benefit to relying on a cached VM
status.
Removing this cached value simplifies the logic in both lazy_scan_heap()
and lazy_scan_prune(). It also clarifies future work that will set the
visibility map on-access: such paths will not have a cached value
available which would make the logic harder to reason about. Eliminating
it also enables us to detect and repair VM corruption on-access.
Along with removing the cached value and unconditionally checking the
visibility status of the heap page, this commit also moves the VM
corruption handling to occur first. This reordering should have no
performance impact, since the checks are inexpensive and performed only
once per page. It does, however, make the control flow easier to
understand. The new restructuring also makes it possible that after
fixing corruption, the VM could be newly set, if pruning found the page
all-visible.
Now that no callers of visibilitymap_set() use its return value, change
its (and visibilitymap_set_vmbits()) return type to void.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
---
src/backend/access/heap/vacuumlazy.c | 182 +++++++++++-------------
src/backend/access/heap/visibilitymap.c | 9 +-
src/include/access/visibilitymap.h | 18 +--
3 files changed, 94 insertions(+), 115 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index cecba2146ea..d47ed7814c8 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -248,13 +248,6 @@ typedef enum
*/
#define EAGER_SCAN_REGION_SIZE 4096
-/*
- * heap_vac_scan_next_block() sets these flags to communicate information
- * about the block it read to the caller.
- */
-#define VAC_BLK_WAS_EAGER_SCANNED (1 << 0)
-#define VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM (1 << 1)
-
typedef struct LVRelState
{
/* Target heap relation and its indexes */
@@ -358,7 +351,6 @@ typedef struct LVRelState
/* State maintained by heap_vac_scan_next_block() */
BlockNumber current_block; /* last block returned */
BlockNumber next_unskippable_block; /* next unskippable block */
- bool next_unskippable_allvis; /* its visibility status */
bool next_unskippable_eager_scanned; /* if it was eagerly scanned */
Buffer next_unskippable_vmbuffer; /* buffer containing its VM bit */
@@ -432,7 +424,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
bool sharelock, Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
- Buffer vmbuffer, bool all_visible_according_to_vm,
+ Buffer vmbuffer,
bool *has_lpdead_items, bool *vm_page_frozen);
static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
@@ -1248,7 +1240,6 @@ lazy_scan_heap(LVRelState *vacrel)
/* Initialize for the first heap_vac_scan_next_block() call */
vacrel->current_block = InvalidBlockNumber;
vacrel->next_unskippable_block = InvalidBlockNumber;
- vacrel->next_unskippable_allvis = false;
vacrel->next_unskippable_eager_scanned = false;
vacrel->next_unskippable_vmbuffer = InvalidBuffer;
@@ -1264,13 +1255,13 @@ lazy_scan_heap(LVRelState *vacrel)
MAIN_FORKNUM,
heap_vac_scan_next_block,
vacrel,
- sizeof(uint8));
+ sizeof(bool));
while (true)
{
Buffer buf;
Page page;
- uint8 blk_info = 0;
+ bool was_eager_scanned = false;
int ndeleted = 0;
bool has_lpdead_items;
void *per_buffer_data = NULL;
@@ -1339,13 +1330,13 @@ lazy_scan_heap(LVRelState *vacrel)
if (!BufferIsValid(buf))
break;
- blk_info = *((uint8 *) per_buffer_data);
+ was_eager_scanned = *((bool *) per_buffer_data);
CheckBufferIsPinnedOnce(buf);
page = BufferGetPage(buf);
blkno = BufferGetBlockNumber(buf);
vacrel->scanned_pages++;
- if (blk_info & VAC_BLK_WAS_EAGER_SCANNED)
+ if (was_eager_scanned)
vacrel->eager_scanned_pages++;
/* Report as block scanned, update error traceback information */
@@ -1416,7 +1407,6 @@ lazy_scan_heap(LVRelState *vacrel)
if (got_cleanup_lock)
ndeleted = lazy_scan_prune(vacrel, buf, blkno, page,
vmbuffer,
- blk_info & VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM,
&has_lpdead_items, &vm_page_frozen);
/*
@@ -1433,8 +1423,7 @@ lazy_scan_heap(LVRelState *vacrel)
* exclude pages skipped due to cleanup lock contention from eager
* freeze algorithm caps.
*/
- if (got_cleanup_lock &&
- (blk_info & VAC_BLK_WAS_EAGER_SCANNED))
+ if (got_cleanup_lock && was_eager_scanned)
{
/* Aggressive vacuums do not eager scan. */
Assert(!vacrel->aggressive);
@@ -1601,7 +1590,6 @@ heap_vac_scan_next_block(ReadStream *stream,
{
BlockNumber next_block;
LVRelState *vacrel = callback_private_data;
- uint8 blk_info = 0;
/* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
next_block = vacrel->current_block + 1;
@@ -1664,8 +1652,8 @@ heap_vac_scan_next_block(ReadStream *stream,
* otherwise they would've been unskippable.
*/
vacrel->current_block = next_block;
- blk_info |= VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM;
- *((uint8 *) per_buffer_data) = blk_info;
+ /* Block was not eager scanned */
+ *((bool *) per_buffer_data) = false;
return vacrel->current_block;
}
else
@@ -1677,11 +1665,7 @@ heap_vac_scan_next_block(ReadStream *stream,
Assert(next_block == vacrel->next_unskippable_block);
vacrel->current_block = next_block;
- if (vacrel->next_unskippable_allvis)
- blk_info |= VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM;
- if (vacrel->next_unskippable_eager_scanned)
- blk_info |= VAC_BLK_WAS_EAGER_SCANNED;
- *((uint8 *) per_buffer_data) = blk_info;
+ *((bool *) per_buffer_data) = vacrel->next_unskippable_eager_scanned;
return vacrel->current_block;
}
}
@@ -1706,7 +1690,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
BlockNumber next_unskippable_block = vacrel->next_unskippable_block + 1;
Buffer next_unskippable_vmbuffer = vacrel->next_unskippable_vmbuffer;
bool next_unskippable_eager_scanned = false;
- bool next_unskippable_allvis;
*skipsallvis = false;
@@ -1716,7 +1699,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
next_unskippable_block,
&next_unskippable_vmbuffer);
- next_unskippable_allvis = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
/*
* At the start of each eager scan region, normal vacuums with eager
@@ -1735,7 +1717,7 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
* A block is unskippable if it is not all visible according to the
* visibility map.
*/
- if (!next_unskippable_allvis)
+ if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
{
Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
break;
@@ -1792,7 +1774,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
/* write the local variables back to vacrel */
vacrel->next_unskippable_block = next_unskippable_block;
- vacrel->next_unskippable_allvis = next_unskippable_allvis;
vacrel->next_unskippable_eager_scanned = next_unskippable_eager_scanned;
vacrel->next_unskippable_vmbuffer = next_unskippable_vmbuffer;
}
@@ -1953,9 +1934,7 @@ cmpOffsetNumbers(const void *a, const void *b)
* Caller must hold pin and buffer cleanup lock on the buffer.
*
* vmbuffer is the buffer containing the VM block with visibility information
- * for the heap block, blkno. all_visible_according_to_vm is the saved
- * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * for the heap block, blkno.
*
* *has_lpdead_items is set to true or false depending on whether, upon return
* from this function, any LP_DEAD items are still present on the page.
@@ -1972,7 +1951,6 @@ lazy_scan_prune(LVRelState *vacrel,
BlockNumber blkno,
Page page,
Buffer vmbuffer,
- bool all_visible_according_to_vm,
bool *has_lpdead_items,
bool *vm_page_frozen)
{
@@ -1986,6 +1964,8 @@ lazy_scan_prune(LVRelState *vacrel,
.vistest = vacrel->vistest,
.cutoffs = &vacrel->cutoffs,
};
+ uint8 old_vmbits = 0;
+ uint8 new_vmbits = 0;
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -2088,70 +2068,7 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(!presult.all_visible || !(*has_lpdead_items));
Assert(!presult.all_frozen || presult.all_visible);
- /*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
- */
- if ((presult.all_visible && !all_visible_according_to_vm) ||
- (presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
- {
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (presult.all_frozen)
- flags |= VISIBILITYMAP_ALL_FROZEN;
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
- * unnecessarily dirtying the heap buffer. Nearly the only scenario
- * where PD_ALL_VISIBLE is set but the VM is not is if the VM was
- * removed -- and that isn't worth optimizing for. And if we add the
- * heap buffer to the WAL chain (without passing REGBUF_NO_CHANGES),
- * it must be marked dirty.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
-
- /*
- * If the page is being set all-frozen, we pass InvalidTransactionId
- * as the cutoff_xid, since a snapshot conflict horizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- Assert(!presult.all_frozen ||
- !TransactionIdIsValid(presult.vm_conflict_horizon));
-
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
+ old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
/*
* As of PostgreSQL 9.2, the visibility map bit should never be set if the
@@ -2159,8 +2076,8 @@ lazy_scan_prune(LVRelState *vacrel,
* cleared after heap_vac_scan_next_block() was called, so we must recheck
* with buffer lock before concluding that the VM is corrupt.
*/
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+ if (!PageIsAllVisible(page) &&
+ (old_vmbits & VISIBILITYMAP_VALID_BITS) != 0)
{
ereport(WARNING,
(errcode(ERRCODE_DATA_CORRUPTED),
@@ -2169,6 +2086,8 @@ lazy_scan_prune(LVRelState *vacrel,
visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
VISIBILITYMAP_VALID_BITS);
+ /* VM bits are now clear */
+ old_vmbits = 0;
}
/*
@@ -2196,6 +2115,71 @@ lazy_scan_prune(LVRelState *vacrel,
MarkBufferDirty(buf);
visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
VISIBILITYMAP_VALID_BITS);
+ /* VM bits are now clear */
+ old_vmbits = 0;
+ }
+
+ if (!presult.all_visible)
+ return presult.ndeleted;
+
+ /* Set the visibility map and page visibility hint */
+ new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+ if (presult.all_frozen)
+ new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+ /* Nothing to do */
+ if (old_vmbits == new_vmbits)
+ return presult.ndeleted;
+
+ Assert(presult.all_visible);
+
+ /*
+ * It should never be the case that the visibility map page is set while
+ * the page-level bit is clear, but the reverse is allowed (if checksums
+ * are not enabled). Regardless, set both bits so that we get back in
+ * sync.
+ *
+ * The heap buffer must be marked dirty before adding it to the WAL chain
+ * when setting the VM. We don't worry about unnecessarily dirtying the
+ * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
+ * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
+ * the VM bits clear, so there is no point in optimizing it.
+ */
+ PageSetAllVisible(page);
+ MarkBufferDirty(buf);
+
+ /*
+ * If the page is being set all-frozen, we pass InvalidTransactionId as
+ * the cutoff_xid, since a snapshot conflict horizon sufficient to make
+ * everything safe for REDO was logged when the page's tuples were frozen.
+ */
+ Assert(!presult.all_frozen ||
+ !TransactionIdIsValid(presult.vm_conflict_horizon));
+
+ visibilitymap_set(vacrel->rel, blkno, buf,
+ InvalidXLogRecPtr,
+ vmbuffer, presult.vm_conflict_horizon,
+ new_vmbits);
+
+ /*
+ * If the page wasn't already set all-visible and/or all-frozen in the VM,
+ * count it as newly set for logging.
+ */
+ if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ {
+ vacrel->vm_new_visible_pages++;
+ if (presult.all_frozen)
+ {
+ vacrel->vm_new_visible_frozen_pages++;
+ *vm_page_frozen = true;
+ }
+ }
+ else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ presult.all_frozen)
+ {
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
}
return presult.ndeleted;
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index d14588e92ae..cdcb475e501 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -240,10 +240,8 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
* You must pass a buffer containing the correct map page to this function.
* Call visibilitymap_pin first to pin the right one. This function doesn't do
* any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
*/
-uint8
+void
visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
uint8 flags)
@@ -320,7 +318,6 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
}
LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
}
/*
@@ -343,7 +340,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
*
* rlocator is used only for debugging messages.
*/
-uint8
+void
visibilitymap_set_vmbits(BlockNumber heapBlk,
Buffer vmBuf, uint8 flags,
const RelFileLocator rlocator)
@@ -386,8 +383,6 @@ visibilitymap_set_vmbits(BlockNumber heapBlk,
map[mapByte] |= (flags << mapOffset);
MarkBufferDirty(vmBuf);
}
-
- return status;
}
/*
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index c6fa37be968..787c19e5fef 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -32,15 +32,15 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator);
+extern void visibilitymap_set(Relation rel,
+ BlockNumber heapBlk, Buffer heapBuf,
+ XLogRecPtr recptr,
+ Buffer vmBuf,
+ TransactionId cutoff_xid,
+ uint8 flags);
+extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
--
2.43.0
v30-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patch
From d196bdeefae2b14ca3b7abf22b6d6cffca116cd4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 13:36:39 -0500
Subject: [PATCH v30 03/16] Refactor lazy_scan_prune() VM clear logic into
helper
Encapsulating them in a helper makes the whole function clearer. There
is no functional change other than moving it into a helper.
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/vacuumlazy.c | 132 +++++++++++++++++----------
1 file changed, 85 insertions(+), 47 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d47ed7814c8..c5fc5b71f94 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -422,6 +422,11 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
+static bool identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer,
@@ -1928,6 +1933,83 @@ cmpOffsetNumbers(const void *a, const void *b)
return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
}
+/*
+ * Helper to correct any corruption detected on a heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ *
+ * Returns true if it cleared corruption and false otherwise.
+ */
+static bool
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits)
+{
+ Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
+
+ Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (!PageIsAllVisible(heap_page) &&
+ ((vmbits & VISIBILITYMAP_VALID_BITS) != 0))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
+
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -2070,54 +2152,10 @@ lazy_scan_prune(LVRelState *vacrel,
old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (!PageIsAllVisible(page) &&
- (old_vmbits & VISIBILITYMAP_VALID_BITS) != 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- /* VM bits are now clear */
+ if (identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
+ presult.lpdead_items, vmbuffer,
+ old_vmbits))
old_vmbits = 0;
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- /* VM bits are now clear */
- old_vmbits = 0;
- }
if (!presult.all_visible)
return presult.ndeleted;
--
2.43.0
v30-0004-Set-the-VM-in-heap_page_prune_and_freeze.patch
From 566794eed6786868a1147e6a0436d74c0603ccdf Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v30 04/16] Set the VM in heap_page_prune_and_freeze()
This has no independent benefit. It is meant for ease of review. As of
this commit, there is still a separate WAL record emitted for setting
the VM after pruning and freezing. But it is easier to review if moving
the logic into pruneheap.c is separate from setting the VM in the same
WAL record.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/pruneheap.c | 315 +++++++++++++++++++++++----
src/backend/access/heap/vacuumlazy.c | 150 +------------
src/include/access/heapam.h | 20 ++
3 files changed, 299 insertions(+), 186 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 07aa08cfe14..1c1446058a7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
#include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool attempt_freeze;
+ /* whether or not to attempt updating the VM */
+ bool attempt_update_vm;
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -140,16 +142,17 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon when setting the VM or when
+ * freezing all the tuples on the page. It is only valid when all the live
+ * tuples on the page are all-visible.
*
* NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
* That's convenient for heap_page_prune_and_freeze() to use them to
- * decide whether to freeze the page or not. The all_visible and
- * all_frozen values returned to the caller are adjusted to include
- * LP_DEAD items after we determine whether to opportunistically freeze.
+ * decide whether to opportunistically freeze the page or not. The
+ * all_visible and all_frozen values ultimately used to set the VM are
+ * adjusted to include LP_DEAD items after we determine whether or not to
+ * opportunistically freeze.
*/
bool all_visible;
bool all_frozen;
@@ -191,6 +194,17 @@ static void page_verify_redirects(Page page);
static bool heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
PruneState *prstate);
+static bool identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page, int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits);
+static bool heap_page_will_set_vm(PruneState *prstate,
+ Relation relation,
+ BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+ Buffer vmbuffer,
+ int nlpdead_items,
+ uint8 *old_vmbits,
+ uint8 *new_vmbits);
/*
@@ -280,6 +294,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
PruneFreezeParams params = {
.relation = relation,
.buffer = buffer,
+ .vmbuffer = InvalidBuffer,
.reason = PRUNE_ON_ACCESS,
.options = 0,
.vistest = vistest,
@@ -341,6 +356,8 @@ prune_freeze_setup(PruneFreezeParams *params,
/* cutoffs must be provided if we will attempt freezing */
Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate->attempt_update_vm =
+ (params->options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate->cutoffs = params->cutoffs;
/*
@@ -396,51 +413,54 @@ prune_freeze_setup(PruneFreezeParams *params,
prstate->frz_conflict_horizon = InvalidTransactionId;
/*
- * Vacuum may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Track whether the page could be marked all-visible and/or all-frozen.
+ * This information is used for opportunistic freezing and for updating
+ * the visibility map (VM) if requested by the caller.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * Currently, only VACUUM performs freezing, but other callers may in the
+ * future. Visibility bookkeeping is required not just for setting the VM
+ * bits, but also for opportunistic freezing: we only consider freezing if
+ * the page would become all-frozen, or if it would be all-frozen except
+ * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+ * we will not set the VM bit even if the page is found to be all-visible.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not immediately
- * clear all_visible and all_frozen when we see LP_DEAD items. We fix
- * that after scanning the line pointers. We must correct all_visible and
- * all_frozen before we return them to the caller, so that the caller
- * doesn't set the VM bits incorrectly.
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is passed without HEAP_PAGE_PRUNE_FREEZE,
+ * prstate.all_frozen must be initialized to false, since we will not call
+ * heap_prepare_freeze_tuple() for each tuple.
+ *
+ * Dead tuples that will be removed by the end of vacuum should not
+ * prevent opportunistic freezing. Therefore, we do not clear all_visible
+ * and all_frozen when we encounter LP_DEAD items. Instead, we correct
+ * them after deciding whether to freeze, but before updating the VM, to
+ * avoid setting the VM bits incorrectly.
+ *
+ * If neither freezing nor VM updates are requested, we skip the extra
+ * bookkeeping. In this case, initializing all_visible to false allows
+ * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
*/
if (prstate->attempt_freeze)
{
prstate->all_visible = true;
prstate->all_frozen = true;
}
+ else if (prstate->attempt_update_vm)
+ {
+ prstate->all_visible = true;
+ prstate->all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate->all_visible = false;
prstate->all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is unmaintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon
+ * when updating the VM and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -775,10 +795,148 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Helper to correct any corruption detected on a heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+ *
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ *
+ * Returns true if it cleared corruption and false otherwise.
+ */
+static bool
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits)
+{
+ Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
+
+ Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (!PageIsAllVisible(heap_page) &&
+ ((vmbits & VISIBILITYMAP_VALID_BITS) != 0))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * However, if it finds that the page-level visibility hint or VM is
+ * corrupted, it will fix them by clearing the VM bits and visibility hint.
+ * This does not need to be done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with returning the
+ * current value of the VM bits in *old_vmbits and the desired new value of
+ * the VM bits in *new_vmbits.
+ *
+ * If the VM should not be set, it returns false. If we won't consider
+ * updating the VM, *old_vmbits will be 0, regardless of the current value of
+ * the VM bits.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate,
+ Relation relation,
+ BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+ Buffer vmbuffer,
+ int nlpdead_items,
+ uint8 *old_vmbits,
+ uint8 *new_vmbits)
+{
+ *old_vmbits = 0;
+ *new_vmbits = 0;
+
+ if (!prstate->attempt_update_vm)
+ return false;
+
+ *old_vmbits = visibilitymap_get_status(relation, heap_blk,
+ &vmbuffer);
+
+ /* We do this even if not all-visible */
+ if (identify_and_fix_vm_corruption(relation, heap_buffer, heap_blk, heap_page,
+ nlpdead_items, vmbuffer,
+ *old_vmbits))
+ *old_vmbits = 0;
+
+ if (!prstate->all_visible)
+ return false;
+
+ *new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+ if (prstate->all_frozen)
+ *new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+ if (*new_vmbits == *old_vmbits)
+ {
+ *new_vmbits = 0;
+ return false;
+ }
+
+ return true;
+}
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -793,12 +951,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* tuples if it's required in order to advance relfrozenxid / relminmxid, or
* if it's considered advantageous for overall system performance to do so
* now. The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing. When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opportunistically freeze, to indicate if the VM bits can be set. They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
@@ -823,13 +982,18 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
MultiXactId *new_relmin_mxid)
{
Buffer buffer = params->buffer;
+ Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
+ BlockNumber blockno = BufferGetBlockNumber(buffer);
PruneState prstate;
bool do_freeze;
bool do_prune;
bool do_hint_prune;
+ bool do_set_vm;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ uint8 new_vmbits;
+ uint8 old_vmbits;
/* Initialize prstate */
prune_freeze_setup(params,
@@ -1011,6 +1175,65 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
}
}
+
+ /* Now update the visibility map and PD_ALL_VISIBLE hint */
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ do_set_vm = heap_page_will_set_vm(&prstate,
+ params->relation,
+ blockno,
+ buffer,
+ page,
+ vmbuffer,
+ prstate.lpdead_items,
+ &old_vmbits,
+ &new_vmbits);
+
+ /*
+ * new_vmbits should be 0 regardless of whether or not the page is
+ * all-visible if we do not intend to set the VM.
+ */
+ Assert(do_set_vm || new_vmbits == 0);
+
+ /* Set the visibility map and page visibility hint, if relevant */
+ if (do_set_vm)
+ {
+ Assert(prstate.all_visible);
+
+ /*
+ * It should never be the case that the visibility map page is set
+ * while the page-level bit is clear, but the reverse is allowed (if
+ * checksums are not enabled). Regardless, set both bits so that we
+ * get back in sync.
+ *
+ * The heap buffer must be marked dirty before adding it to the WAL
+ * chain when setting the VM. We don't worry about unnecessarily
+ * dirtying the heap buffer if PD_ALL_VISIBLE is already set, though.
+ * It is extremely rare to have a clean heap buffer with
+ * PD_ALL_VISIBLE already set and the VM bits clear, so there is no
+ * point in optimizing it.
+ */
+ PageSetAllVisible(page);
+ MarkBufferDirty(buffer);
+
+ /*
+ * If the page is being set all-frozen, we pass InvalidTransactionId
+ * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ Assert(!prstate.all_frozen ||
+ !TransactionIdIsValid(presult->vm_conflict_horizon));
+
+ visibilitymap_set(params->relation, blockno, buffer,
+ InvalidXLogRecPtr,
+ vmbuffer, presult->vm_conflict_horizon,
+ new_vmbits);
+ }
+
+ /* Save the vmbits for caller */
+ presult->old_vmbits = old_vmbits;
+ presult->new_vmbits = new_vmbits;
}
@@ -1485,6 +1708,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
{
TransactionId xmin;
+ Assert(prstate->attempt_update_vm);
+
if (!HeapTupleHeaderXminCommitted(htup))
{
prstate->all_visible = false;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c5fc5b71f94..8b489349312 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -422,11 +422,7 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
-static bool identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
- BlockNumber heap_blk, Page heap_page,
- int nlpdead_items,
- Buffer vmbuffer,
- uint8 vmbits);
+
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer,
@@ -1933,83 +1929,6 @@ cmpOffsetNumbers(const void *a, const void *b)
return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
}
-/*
- * Helper to correct any corruption detected on a heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- *
- * Returns true if it cleared corruption and false otherwise.
- */
-static bool
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
- BlockNumber heap_blk, Page heap_page,
- int nlpdead_items,
- Buffer vmbuffer,
- uint8 vmbits)
-{
- Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
-
- Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (!PageIsAllVisible(heap_page) &&
- ((vmbits & VISIBILITYMAP_VALID_BITS) != 0))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- RelationGetRelationName(rel), heap_blk)));
-
- visibilitymap_clear(rel, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- RelationGetRelationName(rel), heap_blk)));
-
- PageClearAllVisible(heap_page);
- MarkBufferDirty(heap_buffer);
- visibilitymap_clear(rel, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- return false;
-}
-
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -2041,13 +1960,12 @@ lazy_scan_prune(LVRelState *vacrel,
PruneFreezeParams params = {
.relation = rel,
.buffer = buf,
+ .vmbuffer = vmbuffer,
.reason = PRUNE_VACUUM_SCAN,
- .options = HEAP_PAGE_PRUNE_FREEZE,
+ .options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM,
.vistest = vacrel->vistest,
.cutoffs = &vacrel->cutoffs,
};
- uint8 old_vmbits = 0;
- uint8 new_vmbits = 0;
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -2147,75 +2065,25 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
- Assert(!presult.all_frozen || presult.all_visible);
-
- old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
- if (identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
- presult.lpdead_items, vmbuffer,
- old_vmbits))
- old_vmbits = 0;
-
- if (!presult.all_visible)
- return presult.ndeleted;
-
- /* Set the visibility map and page visibility hint */
- new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
- if (presult.all_frozen)
- new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
- /* Nothing to do */
- if (old_vmbits == new_vmbits)
- return presult.ndeleted;
-
- Assert(presult.all_visible);
-
- /*
- * It should never be the case that the visibility map page is set while
- * the page-level bit is clear, but the reverse is allowed (if checksums
- * are not enabled). Regardless, set both bits so that we get back in
- * sync.
- *
- * The heap buffer must be marked dirty before adding it to the WAL chain
- * when setting the VM. We don't worry about unnecessarily dirtying the
- * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
- * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
- * the VM bits clear, so there is no point in optimizing it.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
-
- /*
- * If the page is being set all-frozen, we pass InvalidTransactionId as
- * the cutoff_xid, since a snapshot conflict horizon sufficient to make
- * everything safe for REDO was logged when the page's tuples were frozen.
- */
- Assert(!presult.all_frozen ||
- !TransactionIdIsValid(presult.vm_conflict_horizon));
-
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- new_vmbits);
/*
* If the page wasn't already set all-visible and/or all-frozen in the VM,
* count it as newly set for logging.
*/
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
vacrel->vm_new_frozen_pages++;
*vm_page_frozen = true;
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f7e4ae3843c..0913759219c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,12 @@ typedef struct PruneFreezeParams
Relation relation; /* relation containing buffer to be pruned */
Buffer buffer; /* buffer to be pruned */
+ /*
+ * If we will consider updating the visibility map, vmbuffer should
+ * contain the correct block of the visibility map and be pinned.
+ */
+ Buffer vmbuffer;
+
/*
* The reason pruning was performed. It is used to set the WAL record
* opcode which is used for debugging and analysis purposes.
@@ -252,6 +259,9 @@ typedef struct PruneFreezeParams
*
* HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
* will return 'all_visible', 'all_frozen' flags to the caller.
+ *
+ * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
+ * in the VM.
*/
int options;
@@ -299,6 +309,16 @@ typedef struct PruneFreezeResult
bool all_frozen;
TransactionId vm_conflict_horizon;
+ /*
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
+ *
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
+ */
+ uint8 new_vmbits;
+ uint8 old_vmbits;
+
/*
* Whether or not the page makes rel truncation unsafe. This is set to
* 'true', even if the page contains LP_DEAD items. VACUUM will remove
--
2.43.0
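To summarize the calling convention this patch establishes: vacuum's first
pass now hands the pinned VM buffer to the prune/freeze code and asks it to
update the VM, then only reads back the before/after bits for its counters.
A condensed sketch of the lazy_scan_prune() side (paraphrasing the hunks
above, with unrelated arguments elided; not meant to compile on its own):

    PruneFreezeParams params = {
        .relation = rel,
        .buffer = buf,
        .vmbuffer = vmbuffer,       /* pinned VM page covering blkno */
        .reason = PRUNE_VACUUM_SCAN,
        .options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM,
        .vistest = vacrel->vistest,
        .cutoffs = &vacrel->cutoffs,
    };

    heap_page_prune_and_freeze(&params, &presult /* , other out-args elided */);

    /* The VM was already set inside heap_page_prune_and_freeze(); the
     * caller only inspects the before/after bits for its bookkeeping. */
    if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
        (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
        vacrel->vm_new_visible_pages++;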
Attachment: v30-0005-Move-VM-assert-into-prune-freeze-code.patch (text/x-patch; charset=US-ASCII)
From 9f5072500e2a3bc2f2a8490f1ca11bf60a81515a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 15:57:34 -0500
Subject: [PATCH v30 05/16] Move VM assert into prune/freeze code
This is a step toward setting the VM in the same WAL record as pruning
and freezing. It moves the assertion that re-checks the heap page's
all-visible status into the prune/freeze code, before the VM is set. This
allows us to remove some fields of the
PruneFreezeResult.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/pruneheap.c | 86 ++++++++++++++++++++++------
src/backend/access/heap/vacuumlazy.c | 68 +---------------------
src/include/access/heapam.h | 25 +++-----
3 files changed, 77 insertions(+), 102 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 1c1446058a7..7af6aea2d0e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -932,6 +932,31 @@ heap_page_will_set_vm(PruneState *prstate,
return true;
}
+#ifdef USE_ASSERT_CHECKING
+
+/*
+ * Wrapper for heap_page_would_be_all_visible() which can be used for callers
+ * that expect no LP_DEAD on the page. Currently assert-only, but there is no
+ * reason not to use it outside of asserts.
+ */
+static bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
+{
+
+ return heap_page_would_be_all_visible(rel, buf,
+ OldestXmin,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid,
+ logging_offnum);
+}
+#endif
+
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
@@ -985,6 +1010,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
+ TransactionId vm_conflict_horizon = InvalidTransactionId;
PruneState prstate;
bool do_freeze;
bool do_prune;
@@ -1142,23 +1168,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
- /*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
-
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
@@ -1176,6 +1187,46 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
}
}
+ /*
+ * If updating the visibility map, the conflict horizon for that record
+ * must be the newest xmin on the page. However, if the page is
+ * completely frozen, there can be no conflict and the vm_conflict_horizon
+ * should remain InvalidTransactionId. This includes the case that we
+ * just froze all the tuples; the prune-freeze record included the
+ * conflict XID already so we don't need to again.
+ */
+ if (prstate.all_frozen)
+ vm_conflict_horizon = InvalidTransactionId;
+ else
+ vm_conflict_horizon = prstate.visibility_cutoff_xid;
+
+ /*
+ * During its second pass over the heap, VACUUM calls
+ * heap_page_would_be_all_visible() to determine whether a page is
+ * all-visible and all-frozen. The logic here is similar. After completing
+ * pruning and freezing, use an assertion to verify that our results
+ * remain consistent with heap_page_would_be_all_visible().
+ */
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
+ {
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
+
+ Assert(presult->lpdead_items == 0);
+
+ Assert(heap_page_is_all_visible(params->relation, buffer,
+ prstate.cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc));
+
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == vm_conflict_horizon);
+ }
+#endif
+
/* Now update the visibility map and PD_ALL_VISIBLE hint */
Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
@@ -1222,12 +1273,11 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* make everything safe for REDO was logged when the page's tuples
* were frozen.
*/
- Assert(!prstate.all_frozen ||
- !TransactionIdIsValid(presult->vm_conflict_horizon));
+ Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
visibilitymap_set(params->relation, blockno, buffer,
InvalidXLogRecPtr,
- vmbuffer, presult->vm_conflict_horizon,
+ vmbuffer, vm_conflict_horizon,
new_vmbits);
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8b489349312..f56a02a3d46 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -457,20 +457,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
-#endif
-static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- OffsetNumber *deadoffsets,
- int ndeadoffsets,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2006,32 +1992,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- Assert(heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum));
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -3489,29 +3449,6 @@ dead_items_cleanup(LVRelState *vacrel)
vacrel->pvs = NULL;
}
-#ifdef USE_ASSERT_CHECKING
-
-/*
- * Wrapper for heap_page_would_be_all_visible() which can be used for callers
- * that expect no LP_DEAD on the page. Currently assert-only, but there is no
- * reason not to use it outside of asserts.
- */
-static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum)
-{
-
- return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
- NULL, 0,
- all_frozen,
- visibility_cutoff_xid,
- logging_offnum);
-}
-#endif
/*
* Check whether the heap page in buf is all-visible except for the dead
@@ -3535,15 +3472,12 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
* - *logging_offnum: OffsetNumber of current tuple being processed;
* used by vacuum's error callback system.
*
- * Callers looking to verify that the page is already all-visible can call
- * heap_page_is_all_visible().
- *
* This logic is closely related to heap_prune_record_unchanged_lp_normal().
* If you modify this function, ensure consistency with that code. An
* assertion cross-checks that both remain in agreement. Do not introduce new
* side-effects.
*/
-static bool
+bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
TransactionId OldestXmin,
OffsetNumber *deadoffsets,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0913759219c..88e79c58a10 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -257,8 +257,7 @@ typedef struct PruneFreezeParams
* HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
* LP_UNUSED during pruning.
*
- * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
- * will return 'all_visible', 'all_frozen' flags to the caller.
+ * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
*
* HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
* in the VM.
@@ -294,21 +293,6 @@ typedef struct PruneFreezeResult
int live_tuples;
int recently_dead_tuples;
- /*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
- *
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
- *
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
- */
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
-
/*
* old_vmbits are the state of the all-visible and all-frozen bits in the
* visibility map before updating it during phase I of vacuuming.
@@ -453,6 +437,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
/* in heap/vacuumlazy.c */
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
--
2.43.0
Attachment: v30-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (text/x-patch; charset=US-ASCII)
From eb94a7df040b6250d3ea3e0d1a79f24a3dc4fd6a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 16:16:22 -0500
Subject: [PATCH v30 06/16] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
prune/freeze
Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.
Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.
This change applies only to vacuum phase I, not to pruning performed
during normal page access.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/pruneheap.c | 275 ++++++++++++++++------------
1 file changed, 157 insertions(+), 118 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7af6aea2d0e..49d3ebb0063 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -205,6 +205,11 @@ static bool heap_page_will_set_vm(PruneState *prstate,
int nlpdead_items,
uint8 *old_vmbits,
uint8 *new_vmbits);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ uint8 old_vmbits, uint8 new_vmbits,
+ TransactionId latest_xid_removed,
+ TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid);
/*
@@ -795,6 +800,68 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ uint8 old_vmbits, uint8 new_vmbits,
+ TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid)
+{
+ TransactionId conflict_xid;
+
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning or
+ * freezing any tuples and are setting an already all-visible page
+ * all-frozen in the VM. In this case, all of the tuples on the page must
+ * already be visible to all MVCC snapshots on the standby.
+ */
+ if (!do_prune &&
+ !do_freeze &&
+ do_set_vm &&
+ (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0 &&
+ (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ return InvalidTransactionId;
+
+ /*
+ * The snapshotConflictHorizon for the whole record should be the most
+ * conservative of all the horizons calculated for any of the possible
+ * modifications. If this record will prune tuples, any transactions on
+ * the standby older than the youngest xmax of the most recently removed
+ * tuple this record will prune will conflict. If this record will freeze
+ * tuples, any transactions on the standby with xids older than the
+ * youngest tuple this record will freeze will conflict.
+ */
+ conflict_xid = InvalidTransactionId;
+
+ /*
+ * If we are updating the VM, the conflict horizon is almost always the
+ * visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization, we can
+ * use the visibility_cutoff_xid as the conflict horizon if the page will
+ * be all-frozen. This is true even if there are LP_DEAD line pointers
+ * because we ignored those when maintaining the visibility_cutoff_xid.
+ * This will have been calculated earlier as the frz_conflict_horizon when
+ * we determined we would freeze.
+ */
+ if (do_set_vm)
+ conflict_xid = visibility_cutoff_xid;
+ else if (do_freeze)
+ conflict_xid = frz_conflict_horizon;
+
+ /*
+ * If we are removing tuples with a younger xmax than our so far
+ * calculated conflict_xid, we must use this as our horizon.
+ */
+ if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+ conflict_xid = latest_xid_removed;
+
+ return conflict_xid;
+}
+
/*
* Helper to correct any corruption detected on a heap page and its
* corresponding visibility map page after pruning but before setting the
@@ -1010,7 +1077,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
- TransactionId vm_conflict_horizon = InvalidTransactionId;
PruneState prstate;
bool do_freeze;
bool do_prune;
@@ -1018,6 +1084,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool do_set_vm;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ TransactionId conflict_xid;
uint8 new_vmbits;
uint8 old_vmbits;
@@ -1081,6 +1148,37 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.all_visible = prstate.all_frozen = false;
Assert(!prstate.all_frozen || prstate.all_visible);
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ /*
+ * Decide whether to set the VM bits based on information from the VM and
+ * the all_visible/all_frozen flags.
+ */
+ do_set_vm = heap_page_will_set_vm(&prstate,
+ params->relation,
+ blockno,
+ buffer,
+ page,
+ vmbuffer,
+ prstate.lpdead_items,
+ &old_vmbits,
+ &new_vmbits);
+
+ /*
+ * new_vmbits should be 0 regardless of whether or not the page is
+ * all-visible if we do not intend to set the VM.
+ */
+ Assert(do_set_vm || new_vmbits == 0);
+
+ conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+ old_vmbits, new_vmbits,
+ prstate.latest_xid_removed,
+ prstate.frz_conflict_horizon,
+ prstate.visibility_cutoff_xid);
+
+ /* Lock vmbuffer before entering a critical section */
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -1102,14 +1200,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
/*
* If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * hint. If we are going to freeze or prune the page or set
+ * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+ *
+ * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+ * for the VM to be set and PD_ALL_VISIBLE to be clear.
*/
- if (!do_freeze && !do_prune)
+ if (!do_freeze && !do_prune && !do_set_vm)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -1123,6 +1224,26 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
+ /* Set the visibility map and page visibility hint */
+ if (do_set_vm)
+ {
+ /*
+ * While it is valid for PD_ALL_VISIBLE to be set when the
+ * corresponding VM bit is clear, we strongly prefer to keep them
+ * in sync.
+ *
+ * The heap buffer must be marked dirty before adding it to the
+ * WAL chain when setting the VM. We don't worry about
+ * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
+ * already set, though. It is extremely rare to have a clean heap
+ * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
+ * so there is no point in optimizing it.
+ */
+ PageSetAllVisible(page);
+ visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
+ params->relation->rd_locator);
+ }
+
MarkBufferDirty(buffer);
/*
@@ -1130,29 +1251,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
if (RelationNeedsWAL(params->relation))
{
- /*
- * The snapshotConflictHorizon for the whole record should be the
- * most conservative of all the horizons calculated for any of the
- * possible modifications. If this record will prune tuples, any
- * transactions on the standby older than the youngest xmax of the
- * most recently removed tuple this record will prune will
- * conflict. If this record will freeze tuples, any transactions
- * on the standby with xids older than the youngest tuple this
- * record will freeze will conflict.
- */
- TransactionId conflict_xid;
-
- if (TransactionIdFollows(prstate.frz_conflict_horizon,
- prstate.latest_xid_removed))
- conflict_xid = prstate.frz_conflict_horizon;
- else
- conflict_xid = prstate.latest_xid_removed;
-
log_heap_prune_and_freeze(params->relation, buffer,
- InvalidBuffer, /* vmbuffer */
- 0, /* vmflags */
+ do_set_vm ? vmbuffer : InvalidBuffer,
+ do_set_vm ? new_vmbits : 0,
conflict_xid,
- true, params->reason,
+ true, /* cleanup lock */
+ params->reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
@@ -1162,43 +1266,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
END_CRIT_SECTION();
- /* Copy information back for caller */
- presult->ndeleted = prstate.ndeleted;
- presult->nnewlpdead = prstate.ndead;
- presult->nfrozen = prstate.nfrozen;
- presult->live_tuples = prstate.live_tuples;
- presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->hastup = prstate.hastup;
-
- presult->lpdead_items = prstate.lpdead_items;
- /* the presult->deadoffsets array was already filled in */
-
- if (prstate.attempt_freeze)
- {
- if (presult->nfrozen > 0)
- {
- *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
- *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
- }
- else
- {
- *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
- *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
- }
- }
-
- /*
- * If updating the visibility map, the conflict horizon for that record
- * must be the newest xmin on the page. However, if the page is
- * completely frozen, there can be no conflict and the vm_conflict_horizon
- * should remain InvalidTransactionId. This includes the case that we
- * just froze all the tuples; the prune-freeze record included the
- * conflict XID already so we don't need to again.
- */
- if (prstate.all_frozen)
- vm_conflict_horizon = InvalidTransactionId;
- else
- vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
/*
* During its second pass over the heap, VACUUM calls
@@ -1213,7 +1282,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(presult->lpdead_items == 0);
+ Assert(prstate.lpdead_items == 0);
+ Assert(prstate.cutoffs);
Assert(heap_page_is_all_visible(params->relation, buffer,
prstate.cutoffs->OldestXmin,
@@ -1223,67 +1293,36 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Assert(prstate.all_frozen == debug_all_frozen);
Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == vm_conflict_horizon);
+ debug_cutoff == prstate.visibility_cutoff_xid);
}
#endif
- /* Now update the visibility map and PD_ALL_VISIBLE hint */
- Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
-
- do_set_vm = heap_page_will_set_vm(&prstate,
- params->relation,
- blockno,
- buffer,
- page,
- vmbuffer,
- prstate.lpdead_items,
- &old_vmbits,
- &new_vmbits);
+ /* Copy information back for caller */
+ presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+ presult->hastup = prstate.hastup;
+ presult->new_vmbits = new_vmbits;
+ presult->old_vmbits = old_vmbits;
- /*
- * new_vmbits should be 0 regardless of whether or not the page is
- * all-visible if we do not intend to set the VM.
- */
- Assert(do_set_vm || new_vmbits == 0);
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
- /* Set the visibility map and page visibility hint, if relevant */
- if (do_set_vm)
+ if (prstate.attempt_freeze)
{
- Assert(prstate.all_visible);
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * The heap buffer must be marked dirty before adding it to the WAL
- * chain when setting the VM. We don't worry about unnecessarily
- * dirtying the heap buffer if PD_ALL_VISIBLE is already set, though.
- * It is extremely rare to have a clean heap buffer with
- * PD_ALL_VISIBLE already set and the VM bits clear, so there is no
- * point in optimizing it.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buffer);
-
- /*
- * If the page is being set all-frozen, we pass InvalidTransactionId
- * as the cutoff_xid, since a snapshot conflict horizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
-
- visibilitymap_set(params->relation, blockno, buffer,
- InvalidXLogRecPtr,
- vmbuffer, vm_conflict_horizon,
- new_vmbits);
+ if (presult->nfrozen > 0)
+ {
+ *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
+ }
+ else
+ {
+ *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
+ }
}
-
- /* Save the vmbits for caller */
- presult->old_vmbits = old_vmbits;
- presult->new_vmbits = new_vmbits;
}
--
2.43.0
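The reordering above is easier to see as a skeleton of
heap_page_prune_and_freeze() after this patch (a sketch: error handling, the
hint-bit-only path, and result bookkeeping omitted; function names are the
ones used in the diff):

    do_set_vm = heap_page_will_set_vm(&prstate, params->relation, blockno,
                                      buffer, page, vmbuffer,
                                      prstate.lpdead_items,
                                      &old_vmbits, &new_vmbits);
    conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
                                    old_vmbits, new_vmbits,
                                    prstate.latest_xid_removed,
                                    prstate.frz_conflict_horizon,
                                    prstate.visibility_cutoff_xid);

    if (do_set_vm)              /* take the VM lock outside the crit section */
        LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);

    START_CRIT_SECTION();
    if (do_prune)
        /* apply planned item changes and repair fragmentation */ ;
    if (do_freeze)
        heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
    if (do_set_vm)
    {
        PageSetAllVisible(page);        /* keep heap hint and VM in sync */
        visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
                                 params->relation->rd_locator);
    }
    MarkBufferDirty(buffer);
    if (RelationNeedsWAL(params->relation))
        log_heap_prune_and_freeze(params->relation, buffer,
                                  do_set_vm ? vmbuffer : InvalidBuffer,
                                  do_set_vm ? new_vmbits : 0,
                                  conflict_xid, true, params->reason,
                                  /* prune/freeze arrays as before */ ...);
    END_CRIT_SECTION();

    if (do_set_vm)
        LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);

The point is that the heap changes, the PD_ALL_VISIBLE hint, and the VM bits
are all covered by the one XLOG_HEAP2_PRUNE_VACUUM_SCAN record, under a
single critical section.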
Attachment: v30-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (text/x-patch; charset=US-ASCII)
From b30b92789f9b62e60348bd1441f03031e1bf7309 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v30 07/16] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum
As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
1 file changed, 29 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f56a02a3d46..d22d2a86ed0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1867,9 +1867,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ /* Lock vmbuffer before entering critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ /* Mark buffer dirty before writing any WAL records */
MarkBufferDirty(buf);
/*
@@ -1886,13 +1889,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set_vmbits(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
+
+ /*
+ * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+ * setting the VM.
+ */
+ if (RelationNeedsWAL(vacrel->rel))
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ InvalidTransactionId, /* conflict xid */
+ false, /* cleanup lock */
+ PRUNE_VACUUM_SCAN, /* reason */
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+
END_CRIT_SECTION();
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
/* Count the newly all-frozen pages for logging */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
--
2.43.0
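In other words, for an empty page the prune record degenerates to a pure VM
update: there is nothing to prune or freeze, and the conflict horizon is
InvalidTransactionId because nothing on an empty page can become invisible to
a standby snapshot. Spelled out, the call shape used above looks roughly like
this (a sketch; the trailing NULL/0 pairs are the usual prune/freeze arrays):

    log_heap_prune_and_freeze(vacrel->rel, buf,
                              vmbuffer,
                              VISIBILITYMAP_ALL_VISIBLE |
                              VISIBILITYMAP_ALL_FROZEN,
                              InvalidTransactionId,     /* conflict xid */
                              false,                    /* no cleanup lock */
                              PRUNE_VACUUM_SCAN,
                              NULL, 0, NULL, 0,
                              NULL, 0, NULL, 0);        /* nothing frozen,
                                                         * redirected, dead,
                                                         * or unused */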
Attachment: v30-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (text/x-patch; charset=US-ASCII)
From fb26088478e331440a2747031ba259e2adc9808e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v30 08/16] Remove XLOG_HEAP2_VISIBLE entirely
No remaining code emits XLOG_HEAP2_VISIBLE records, so the record type can be
removed.
This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.
This changes the visibility map API, so any external users/consumers of
the VM-only WAL record will need to change.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/common/bufmask.c | 4 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 155 ++---------------------
src/backend/access/heap/pruneheap.c | 4 +-
src/backend/access/heap/vacuumlazy.c | 16 +--
src/backend/access/heap/visibilitymap.c | 109 +---------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +-
src/include/access/heapam_xlog.h | 20 ---
src/include/access/visibilitymap.h | 13 +-
src/include/access/visibilitymapdefs.h | 9 --
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 38 insertions(+), 370 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
+ * for more details.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 6daf4a87dec..fb7a7548aa0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2539,11 +2539,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
else if (all_frozen_set)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- relation->rd_locator);
+ visibilitymap_set(BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ relation->rd_locator);
}
/*
@@ -8813,50 +8813,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
-
/*
* Perform XLogInsert for a heap-update operation. Caller must already
* have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1823feff298..47d2479415e 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -236,7 +236,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+ visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -249,142 +249,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -762,8 +626,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
*
* During recovery, however, no concurrent writers exist. Therefore,
* updating the VM without holding the heap page lock is safe enough. This
- * same approach is taken when replaying xl_heap_visible records (see
- * heap_xlog_visible()).
+ * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+ * heap_xlog_prune_and_freeze()).
*/
if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -775,11 +639,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- rlocator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -1360,9 +1224,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 49d3ebb0063..b099483051a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1240,8 +1240,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* so there is no point in optimizing it.
*/
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
- params->relation->rd_locator);
+ visibilitymap_set(blockno, vmbuffer, new_vmbits,
+ params->relation->rd_locator);
}
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d22d2a86ed0..93f0f39c5f0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1889,11 +1889,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
/*
* Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2771,9 +2771,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* set PD_ALL_VISIBLE.
*/
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer, vmflags,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer, vmflags,
+ vacrel->rel->rd_locator);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index cdcb475e501..d30fee3a488 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set bit(s) in a previously pinned page and log
- * visibilitymap_set_vmbits - set bit(s) in a pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,106 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- */
-void
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
- flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || !XLogRecPtrIsValid(recptr));
- Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- Assert(!BufferIsValid(heapBuf) ||
- BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (!XLogRecPtrIsValid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-}
-
/*
* Set VM (visibility map) flags in the VM block in vmBuf.
*
@@ -341,9 +240,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* rlocator is used only for debugging messages.
*/
void
-visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ca26d1f0ed1..08461fdf593 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 5e15cb1825e..c0cac7ea1c3 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index fc45d72c79b..3655358ed6b 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..b27fcdfb345 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -443,20 +442,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -500,11 +485,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 787c19e5fef..a6580ea6188 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern void visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator);
+extern void visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 04845d5e680..6505628120c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4331,7 +4331,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
Attachment: v30-0009-Simplify-heap_page_would_be_all_visible-visibili.patch (text/x-patch; charset=US-ASCII)
From 667b2e7c19c70694912223bc35d8f286a439dacd Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 17 Dec 2025 13:57:16 -0500
Subject: [PATCH v30 09/16] Simplify heap_page_would_be_all_visible visibility
check
heap_page_would_be_all_visible() doesn't care about the distinction
between HEAPTUPLE_RECENTLY_DEAD and HEAPTUPLE_DEAD tuples -- any tuple
that is not HEAPTUPLE_LIVE means the page is not all-visible and causes
us to return false.
Therefore, we don't need to call HeapTupleSatisfiesVacuum(), which
includes an extra step to distinguish between dead and recently dead
tuples using OldestXmin. Replace it with the more minimal
HeapTupleSatisfiesVacuumHorizon().
This has the added benefit of making it easier to replace uses of
OldestXmin in heap_page_would_be_all_visible() in the future.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/CALdSSPjvhGXihT_9f-GJabYU%3D_PjrFDUxYaURuTbfLyQM6TErg%40mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 93f0f39c5f0..e827ca21c68 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3537,6 +3537,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
{
ItemId itemid;
HeapTupleData tuple;
+ TransactionId dead_after;
/*
* Set the offset number so that we can display it along with any
@@ -3576,12 +3577,14 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
/* Visibility checks may do IO or allocate memory */
Assert(CritSectionCount == 0);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+ switch (HeapTupleSatisfiesVacuumHorizon(&tuple, buf, &dead_after))
{
case HEAPTUPLE_LIVE:
{
TransactionId xmin;
+ Assert(!TransactionIdIsValid(dead_after));
+
/* Check comments in lazy_scan_prune. */
if (!HeapTupleHeaderXminCommitted(tuple.t_data))
{
@@ -3614,8 +3617,10 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
}
break;
- case HEAPTUPLE_DEAD:
case HEAPTUPLE_RECENTLY_DEAD:
+ Assert(TransactionIdIsValid(dead_after));
+ /* FALLTHROUGH */
+ case HEAPTUPLE_DEAD:
case HEAPTUPLE_INSERT_IN_PROGRESS:
case HEAPTUPLE_DELETE_IN_PROGRESS:
{
--
2.43.0
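
For context on why dropping OldestXmin here loses nothing: on master,
HeapTupleSatisfiesVacuum() is itself a thin wrapper around
HeapTupleSatisfiesVacuumHorizon() whose only use of OldestXmin is to
promote HEAPTUPLE_RECENTLY_DEAD to HEAPTUPLE_DEAD. Roughly (paraphrased
from memory of heapam_visibility.c, so treat it as a sketch rather than
the exact source):

    HTSV_Result
    HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
                             Buffer buffer)
    {
        TransactionId dead_after = InvalidTransactionId;
        HTSV_Result   res;

        res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);

        if (res == HEAPTUPLE_RECENTLY_DEAD)
        {
            /* dead_after is only set for recently-dead tuples */
            Assert(TransactionIdIsValid(dead_after));

            /* the only use of OldestXmin: recently dead -> dead */
            if (TransactionIdPrecedes(dead_after, OldestXmin))
                res = HEAPTUPLE_DEAD;
        }
        else
            Assert(!TransactionIdIsValid(dead_after));

        return res;
    }

Since heap_page_would_be_all_visible() treats DEAD and RECENTLY_DEAD
identically, calling the horizon variant directly is strictly simpler.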
Attachment: v30-0010-Remove-table_scan_analyze_next_tuple-unneeded-pa.patch (text/x-patch; charset=US-ASCII)
From 8025146e100c0433670acb8dafa722b743842e2a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 22 Dec 2025 10:46:45 -0500
Subject: [PATCH v30 10/16] Remove table_scan_analyze_next_tuple unneeded
parameter OldestXmin
heapam_scan_analyze_next_tuple() doesn't distinguish between dead and
recently dead tuples when counting them, so it doesn't need OldestXmin.
Looking at other table AMs implementing table_scan_analyze_next_tuple(),
it appears most do not use OldestXmin either.
Suggested-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/CALdSSPjvhGXihT_9f-GJabYU%3D_PjrFDUxYaURuTbfLyQM6TErg%40mail.gmail.com
---
src/backend/access/heap/heapam_handler.c | 13 +++++++++----
src/include/access/tableam.h | 3 +--
2 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index dd4fe6bf62f..8707d1aab4a 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1026,7 +1026,7 @@ heapam_scan_analyze_next_block(TableScanDesc scan, ReadStream *stream)
}
static bool
-heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
+heapam_scan_analyze_next_tuple(TableScanDesc scan,
double *liverows, double *deadrows,
TupleTableSlot *slot)
{
@@ -1047,6 +1047,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
ItemId itemid;
HeapTuple targtuple = &hslot->base.tupdata;
bool sample_it = false;
+ TransactionId dead_after;
itemid = PageGetItemId(targpage, hscan->rs_cindex);
@@ -1069,16 +1070,20 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
targtuple->t_data = (HeapTupleHeader) PageGetItem(targpage, itemid);
targtuple->t_len = ItemIdGetLength(itemid);
- switch (HeapTupleSatisfiesVacuum(targtuple, OldestXmin,
- hscan->rs_cbuf))
+ switch (HeapTupleSatisfiesVacuumHorizon(targtuple,
+ hscan->rs_cbuf,
+ &dead_after))
{
case HEAPTUPLE_LIVE:
sample_it = true;
*liverows += 1;
break;
- case HEAPTUPLE_DEAD:
case HEAPTUPLE_RECENTLY_DEAD:
+ Assert(TransactionIdIsValid(dead_after));
+ /* FALLTHROUGH */
+
+ case HEAPTUPLE_DEAD:
/* Count dead and recently-dead rows */
*deadrows += 1;
break;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 2fa790b6bf5..767f5be838a 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -683,7 +683,6 @@ typedef struct TableAmRoutine
* callback).
*/
bool (*scan_analyze_next_tuple) (TableScanDesc scan,
- TransactionId OldestXmin,
double *liverows,
double *deadrows,
TupleTableSlot *slot);
@@ -1718,7 +1717,7 @@ table_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
double *liverows, double *deadrows,
TupleTableSlot *slot)
{
- return scan->rs_rd->rd_tableam->scan_analyze_next_tuple(scan, OldestXmin,
+ return scan->rs_rd->rd_tableam->scan_analyze_next_tuple(scan,
liverows, deadrows,
slot);
}
--
2.43.0
Attachment: v30-0011-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (text/x-patch; charset=UTF-8)
From de44d947223fa6c56fb3c75f8c32517068cf05ac Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v30 11/16] Use GlobalVisState in vacuum to determine page
level visibility
During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.
Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.
This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to be considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.
OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare occurrences where GlobalVisState moves
backwards.
Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.
This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/heapam_visibility.c | 22 +++++++++
src/backend/access/heap/pruneheap.c | 53 ++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 38 ++++++++++-----
src/include/access/heapam.h | 4 +-
4 files changed, 76 insertions(+), 41 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index bf899c2d2c6..7d9bd28d8f0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1053,6 +1053,28 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for “transaction still running” are identical to
+ * those used for removal XIDs.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid)
+{
+ return !GlobalVisTestIsRemovableXid(state, xid);
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b099483051a..c507231d2a4 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -461,11 +461,12 @@ prune_freeze_setup(PruneFreezeParams *params,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon
- * when updating the VM and/or freezing all the tuples on the page.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState. This field is only kept up-to-date if the page is
+ * all-visible. As soon as a tuple is encountered that is not visible to
+ * all, this field is unmaintained. As long as it is maintained, it can be
+ * used to calculate the snapshot conflict horizon when updating the VM
+ * and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -1008,14 +1009,14 @@ heap_page_will_set_vm(PruneState *prstate,
*/
static bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
+ vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -1102,6 +1103,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prune_freeze_plan(RelationGetRelid(params->relation),
buffer, &prstate, off_loc);
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them may be considered running by any snapshot, the page cannot
+ * be all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ GlobalVisTestXidMaybeRunning(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* If checksums are enabled, calling heap_prune_satisfies_vacuum() while
* checking tuple visibility information in prune_freeze_plan() may have
@@ -1283,10 +1294,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool debug_all_frozen;
Assert(prstate.lpdead_items == 0);
- Assert(prstate.cutoffs);
Assert(heap_page_is_all_visible(params->relation, buffer,
- prstate.cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc));
@@ -1807,28 +1817,15 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
}
/*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed? A FrozenTransactionId
- * is seen as committed to everyone. Otherwise, we check if
- * there is a snapshot that considers this xid to still be
- * running, and if so, we don't consider the page all-visible.
+ * The inserter definitely committed. But we don't know if it
+ * is old enough that everyone sees it as committed. Later,
+ * after processing all the tuples on the page, we'll check if
+ * there is any snapshot that still considers the newest xid
+ * on the page to be running. If so, we don't consider the
+ * page all-visible.
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' and 'all_frozen' when freezing
- * is requested. We could use GlobalVisTestIsRemovableXid
- * instead, if a non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = false;
- prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e827ca21c68..7463d46891b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2725,7 +2725,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* done outside the critical section.
*/
if (heap_page_would_be_all_visible(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3486,7 +3486,7 @@ dead_items_cleanup(LVRelState *vacrel)
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
*
* Output parameters:
*
@@ -3502,7 +3502,7 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -3585,7 +3585,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
Assert(!TransactionIdIsValid(dead_after));
- /* Check comments in lazy_scan_prune. */
+ /* Check heap_prune_record_unchanged_lp_normal comments */
if (!HeapTupleHeaderXminCommitted(tuple.t_data))
{
all_visible = false;
@@ -3594,16 +3594,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
}
/*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed?
+ * The inserter definitely committed. But we don't know if
+ * it is old enough that everyone sees it as committed.
+ * Don't check that now.
+ *
+ * If we scan all tuples without finding one that prevents
+ * the page from being all-visible, we then check whether
+ * any snapshot still considers the newest XID on the page
+ * to be running. In that case, the page is not considered
+ * all-visible.
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin, OldestXmin))
- {
- all_visible = false;
- *all_frozen = false;
- break;
- }
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3634,6 +3635,19 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
}
} /* scan along page */
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * among them may still be considered running by any snapshot, the page
+ * cannot be all-visible.
+ */
+ if (all_visible &&
+ TransactionIdIsNormal(*visibility_cutoff_xid) &&
+ GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+ {
+ all_visible = false;
+ *all_frozen = false;
+ }
+
/* Clear the offset information once we have processed the given page. */
*logging_offnum = InvalidOffsetNumber;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 88e79c58a10..5657b1df46b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -438,7 +438,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -452,6 +452,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+
+extern bool GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
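
To make the once-per-page check concrete, this is the shape of the
pattern the patch introduces in both pruneheap.c and vacuumlazy.c
(condensed from the hunks above, using the same names; not standalone
code):

    /* While walking the line pointers, only track the newest committed xmin. */
    if (TransactionIdIsNormal(xmin) &&
        TransactionIdFollows(xmin, visibility_cutoff_xid))
        visibility_cutoff_xid = xmin;

    /* After the walk, one horizon test covers every live tuple on the page. */
    if (all_visible &&
        TransactionIdIsNormal(visibility_cutoff_xid) &&
        GlobalVisTestXidMaybeRunning(vistest, visibility_cutoff_xid))
        all_visible = all_frozen = false;

Note that the horizon test is skipped entirely when visibility_cutoff_xid
was never set to a normal XID (e.g. every live xmin on the page is
already frozen).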
Attachment: v30-0012-Unset-all_visible-sooner-if-not-freezing.patch (text/x-patch; charset=UTF-8)
From d7205c9eb70670edafd5098a5a712f2b3f8ff919 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v30 12/16] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.
However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/pruneheap.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c507231d2a4..8e59e7692c1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1682,8 +1682,13 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible and all_frozen until later
* during pruning. Removable dead tuples shouldn't preclude freezing the
- * page.
+ * page. If we won't attempt freezing, just unset all-visible now, though.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1943,8 +1948,14 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible and all_frozen until later, at the
* end of heap_page_prune_and_freeze(). This will allow us to attempt to
* freeze the page after pruning. As long as we unset it before updating
- * the visibility map, this will be correct.
+ * the visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all_visible and all_frozen now.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
--
2.43.0
Attachment: v30-0013-Track-which-relations-are-modified-by-a-query.patch (text/x-patch; charset=US-ASCII)
From d8acbf1885d6cae2a8954ff466e482fbad82ee9c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v30 13/16] Track which relations are modified by a query
Save the relids in a bitmap in the estate. A later commit will pass this
information down to scan nodes to control whether the scan may set the
visibility map during on-access pruning. We don't want to set
the visibility map if the query is just going to modify the page
immediately after.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/executor/execMain.c | 4 ++++
src/backend/executor/execUtils.c | 2 ++
src/include/nodes/execnodes.h | 6 ++++++
3 files changed, 12 insertions(+)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 797d8b1ca1c..9df7df17e96 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation may be modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 3968429f991..13b42b5e6d1 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query through a
+ * UPDATE/DELETE/INSERT/MERGE or targeted by a SELECT FOR UPDATE.
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
--
2.43.0
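
As a hint of how this bitmapset gets consumed (the next patch in the
series does exactly this in the seq scan, index scan, and bitmap heap
scan nodes), a scan node tests its own RT index against the set when it
builds its scan flags; SO_HINT_REL_READ_ONLY is only introduced in that
later patch:

    uint32  flags = 0;
    Index   scanrelid = ((Scan *) node->ss.ps.plan)->scanrelid;

    /* Only pass the read-only hint if this query never writes the rel. */
    if (!bms_is_member(scanrelid, estate->es_modified_relids))
        flags = SO_HINT_REL_READ_ONLY;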
Attachment: v30-0014-Pass-down-information-on-table-modification-to-s.patch (text/x-patch; charset=US-ASCII)
From 51cd27b100af830038376794ad72b15b315551af Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:12:18 -0500
Subject: [PATCH v30 14/16] Pass down information on table modification to scan
node
Pass down information to sequential scan, index scan, and bitmap table
scan nodes on whether or not the query modifies the relation being
scanned. A later commit will use this information to update the VM
during on-access pruning only if the relation is not modified by the
query.
---
contrib/pgrowlocks/pgrowlocks.c | 2 +-
src/backend/access/brin/brin.c | 3 ++-
src/backend/access/gin/gininsert.c | 3 ++-
src/backend/access/heap/heapam_handler.c | 7 +++---
src/backend/access/index/genam.c | 4 ++--
src/backend/access/index/indexam.c | 6 +++---
src/backend/access/nbtree/nbtsort.c | 2 +-
src/backend/access/table/tableam.c | 7 +++---
src/backend/commands/constraint.c | 2 +-
src/backend/commands/copyto.c | 2 +-
src/backend/commands/tablecmds.c | 8 +++----
src/backend/commands/typecmds.c | 4 ++--
src/backend/executor/execIndexing.c | 2 +-
src/backend/executor/execReplication.c | 8 +++----
src/backend/executor/nodeBitmapHeapscan.c | 9 +++++++-
src/backend/executor/nodeIndexonlyscan.c | 2 +-
src/backend/executor/nodeIndexscan.c | 11 ++++++++--
src/backend/executor/nodeSeqscan.c | 26 ++++++++++++++++++++---
src/backend/partitioning/partbounds.c | 2 +-
src/backend/utils/adt/selfuncs.c | 2 +-
src/include/access/genam.h | 2 +-
src/include/access/heapam.h | 6 ++++++
src/include/access/tableam.h | 19 ++++++++++-------
23 files changed, 93 insertions(+), 46 deletions(-)
diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
RelationGetRelationName(rel));
/* Scan the relation */
- scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
hscan = (HeapScanDesc) scan;
attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 45d306037a4..5c4bf5f0c6e 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2842,7 +2842,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
indexInfo->ii_Concurrent = brinshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromBrinShared(brinshared));
+ ParallelTableScanFromBrinShared(brinshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index fc6af7c751b..b2457b96dcc 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2058,7 +2058,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
indexInfo->ii_Concurrent = ginshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromGinBuildShared(ginshared));
+ ParallelTableScanFromGinBuildShared(ginshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 8707d1aab4a..fc251e11f8a 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,12 +79,13 @@ heapam_slot_callbacks(Relation relation)
*/
static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
{
IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
return &hscan->xs_base;
}
@@ -753,7 +754,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
tableScan = NULL;
heapScan = NULL;
- indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+ indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
index_rescan(indexScan, NULL, 0, NULL, 0);
}
else
@@ -762,7 +763,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
- tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+ tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
heapScan = (HeapScanDesc) tableScan;
indexScan = NULL;
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index b7f10a1aed0..15f9cc11582 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -447,7 +447,7 @@ systable_beginscan(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, irel,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
@@ -708,7 +708,7 @@ systable_beginscan_ordered(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, indexRelation,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..b5523cf2ab1 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys)
+ int nkeys, int norderbys, uint32 flags)
{
IndexScanDesc scan;
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+ scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
return scan;
}
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+ scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
return scan;
}
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index d7695dc1108..7bdbc7e5fa7 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1925,7 +1925,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
indexInfo = BuildIndexInfo(btspool->index);
indexInfo->ii_Concurrent = btshared->isconcurrent;
scan = table_beginscan_parallel(btspool->heap,
- ParallelTableScanFromBTShared(btshared));
+ ParallelTableScanFromBTShared(btshared), 0);
reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
true, progress, _bt_build_callback,
&buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 73ebc01a08f..a00bdfdf822 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
+
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
bool found;
slot = table_slot_create(rel, NULL);
- scan = table_index_fetch_begin(rel);
+ scan = table_index_fetch_begin(rel, 0);
found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
all_dead);
table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index 3497a8221f2..97c8278e36d 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
*/
tmptid = checktid;
{
- IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+ IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
bool call_again = false;
if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index dae91630ac3..1957bb0f1a2 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1156,7 +1156,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
AttrMap *map = NULL;
TupleTableSlot *root_slot = NULL;
- scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
slot = table_slot_create(rel, NULL);
/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6b1a00ed477..130a670d266 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6377,7 +6377,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
* checking all the constraints.
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(oldrel, snapshot, 0, NULL);
+ scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -13766,7 +13766,7 @@ validateForeignKeyConstraint(char *conname,
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
slot = table_slot_create(rel, NULL);
- scan = table_beginscan(rel, snapshot, 0, NULL);
+ scan = table_beginscan(rel, snapshot, 0, NULL, 0);
perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
"validateForeignKeyConstraint",
@@ -22623,7 +22623,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
/* Scan through the rows. */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+ scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -23087,7 +23087,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
/* Scan through the rows. */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(splitRel, snapshot, 0, NULL);
+ scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index be6ffd6ddb0..2921f68c1c3 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3157,7 +3157,7 @@ validateDomainNotNullConstraint(Oid domainoid)
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
@@ -3238,7 +3238,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 0b3a31f1703..74262a34819 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -816,7 +816,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
retry:
conflict = false;
found_self = false;
- index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+ index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 860f79f9cc1..6e49ea5c5d8 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -204,7 +204,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
/* Start an index scan. */
- scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
retry:
found = false;
@@ -382,7 +382,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
/* Start a heap scan. */
InitDirtySnapshot(snap);
- scan = table_beginscan(rel, &snap, 0, NULL);
+ scan = table_beginscan(rel, &snap, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
retry:
@@ -601,7 +601,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+ scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
table_rescan(scan, NULL);
@@ -665,7 +665,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
index_rescan(scan, skey, skey_attoff, NULL, 0);
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..0d854db51a1 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ flags);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 6bea42f128f..2c87ba5f767 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
estate->es_snapshot,
&node->ioss_Instrument,
node->ioss_NumScanKeys,
- node->ioss_NumOrderByKeys);
+ node->ioss_NumOrderByKeys, 0);
node->ioss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 72b135e5dcf..92674441c6d 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys,
+ flags);
node->iss_ScanDesc = scandesc;
@@ -207,7 +214,7 @@ IndexNextWithReorder(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys, 0);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..4d0cbb9dee4 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
scandesc = table_beginscan(node->ss.ss_currentRelation,
estate->es_snapshot,
- 0, NULL);
+ 0, NULL, flags);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
{
EState *estate = node->ss.ps.state;
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
table_parallelscan_initialize(node->ss.ss_currentRelation,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+ flags);
}
/* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation,
+ pscan,
+ flags);
}
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 16b0adc172c..91acf1ee2d7 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
econtext = GetPerTupleExprContext(estate);
snapshot = RegisterSnapshot(GetLatestSnapshot());
tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
- scan = table_beginscan(part_rel, snapshot, 0, NULL);
+ scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index c760b19db55..ec0def0d1e2 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7100,7 +7100,7 @@ get_actual_variable_endpoint(Relation heapRel,
index_scan = index_beginscan(heapRel, indexRel,
&SnapshotNonVacuumable, NULL,
- 1, 0);
+ 1, 0, 0);
/* Set it up for index-only scan */
index_scan->xs_want_itup = true;
index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..d29d9e905fc 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -177,7 +177,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys);
+ int nkeys, int norderbys, uint32 flags);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 5657b1df46b..ba62a4d4cba 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -119,6 +119,12 @@ typedef struct IndexFetchHeapData
Buffer xs_cbuf; /* current heap buffer in scan, if any */
/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 767f5be838a..a7cfb125a5d 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* set if the query doesn't modify the rel */
+ SO_HINT_REL_READ_ONLY = 1 << 10,
} ScanOptions;
/*
@@ -420,7 +422,7 @@ typedef struct TableAmRoutine
*
* Tuples for an index scan can then be fetched via index_fetch_tuple.
*/
- struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+ struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
/*
* Reset index fetch. Typically this will release cross index fetch
@@ -873,9 +875,9 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
*/
static inline TableScanDesc
table_beginscan(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_SEQSCAN |
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
@@ -918,9 +920,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
@@ -1127,7 +1129,8 @@ extern void table_parallelscan_initialize(Relation rel,
* Caller must hold a suitable lock on the relation.
*/
extern TableScanDesc table_beginscan_parallel(Relation relation,
- ParallelTableScanDesc pscan);
+ ParallelTableScanDesc pscan,
+ uint32 flags);
/*
* Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1163,9 +1166,9 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
* Tuples for an index scan can then be fetched via table_index_fetch_tuple().
*/
static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
{
- return rel->rd_tableam->index_fetch_begin(rel);
+ return rel->rd_tableam->index_fetch_begin(rel, flags);
}
/*
--
2.43.0
Attachment: v30-0015-Allow-on-access-pruning-to-set-pages-all-visible.patch (text/x-patch; charset=US-ASCII)
From 665f41020eeea237c5538d679ae248161257a87b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:24:08 -0500
Subject: [PATCH v30 15/16] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam.c | 15 ++++++-
src/backend/access/heap/heapam_handler.c | 15 ++++++-
src/backend/access/heap/pruneheap.c | 40 ++++++++++++++++++-
src/include/access/heapam.h | 24 +++++++++--
.../t/035_standby_logical_decoding.pl | 3 +-
5 files changed, 89 insertions(+), 8 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fb7a7548aa0..d9dc79f4a96 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -570,6 +570,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -584,7 +585,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1261,6 +1264,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1299,6 +1303,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1331,6 +1341,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index fc251e11f8a..6946da8c9d7 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
return &hscan->xs_base;
@@ -100,6 +101,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -139,7 +146,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ hscan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2477,6 +2485,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2523,7 +2532,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8e59e7692c1..f414f02964d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -202,6 +202,8 @@ static bool heap_page_will_set_vm(PruneState *prstate,
Relation relation,
BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
Buffer vmbuffer,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
int nlpdead_items,
uint8 *old_vmbits,
uint8 *new_vmbits);
@@ -223,9 +225,13 @@ static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -306,6 +312,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
.cutoffs = NULL,
};
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ params.options |= HEAP_PAGE_PRUNE_UPDATE_VM;
+ params.vmbuffer = *vmbuffer;
+ }
+
heap_page_prune_and_freeze(¶ms, &presult, &dummy_off_loc,
NULL, NULL);
@@ -951,6 +964,9 @@ identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
* corrupted, it will fix them by clearing the VM bits and visibility hint.
* This does not need to be done in a critical section.
*
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
* Returns true if one or both VM bits should be set, along with returning the
* current value of the VM bits in *old_vmbits and the desired new value of
* the VM bits in *new_vmbits.
@@ -964,6 +980,8 @@ heap_page_will_set_vm(PruneState *prstate,
Relation relation,
BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
Buffer vmbuffer,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
int nlpdead_items,
uint8 *old_vmbits,
uint8 *new_vmbits)
@@ -974,6 +992,24 @@ heap_page_will_set_vm(PruneState *prstate,
if (!prstate->attempt_update_vm)
return false;
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(heap_buffer) || XLogCheckBufferNeedsBackup(heap_buffer)))
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ return false;
+ }
+
*old_vmbits = visibilitymap_get_status(relation, heap_blk,
&vmbuffer);
@@ -1171,6 +1207,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
buffer,
page,
vmbuffer,
+ params->reason,
+ do_prune, do_freeze,
prstate.lpdead_items,
&old_vmbits,
&new_vmbits);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ba62a4d4cba..b0e7c71463c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read in the current heap page's
+ * corresponding VM block to this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read in the current heap page's corresponding VM block to
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
/*
* Some optimizations can only be performed if the query does not modify
@@ -419,7 +436,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
TM_IndexDeleteOp *delstate);
/* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index ebe2fae1789..bdd9f0a62cd 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
--
2.43.0
v30-0016-Set-pd_prune_xid-on-insert.patchtext/x-patch; charset=UTF-8; name=v30-0016-Set-pd_prune_xid-on-insert.patchDownload
From 6a843b21a4dd795004dfecbd0c321271beae8120 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v30 16/16] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.
This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.
Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/heapam.c | 25 +++++++++++++------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++-
.../modules/index/expected/killtuples.out | 6 ++---
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d9dc79f4a96..ccebc1f244b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2119,6 +2119,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2178,15 +2179,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2196,7 +2201,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2560,8 +2564,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 47d2479415e..ab2db931aac 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -447,6 +447,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -596,9 +602,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
--
2.43.0
On Sat, Dec 20, 2025 at 7:32 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
Hi! I checked v29-0009, about HeapTupleSatisfiesVacuumHorizon. The
origins of this code trace back to fdf9e21196a6, which was committed as
part of [0], at which point there was no HeapTupleSatisfiesVacuumHorizon
function. I guess this is the reason this optimization was not performed
earlier.
Thanks for taking a look into this!
I also think this patch is correct, because we do similar things for
HEAPTUPLE_DEAD & HEAPTUPLE_RECENTLY_DEAD, and HeapTupleSatisfiesVacuum is
just a proxy to HeapTupleSatisfiesVacuumHorizon, with the only difference
being the DEAD vs. RECENTLY_DEAD handling. A similar change could be made
in heapam_scan_analyze_next_tuple:
...
case HEAPTUPLE_DEAD:
case HEAPTUPLE_RECENTLY_DEAD:
    /* Count dead and recently-dead rows */
    *deadrows += 1;
    break;
In v30 sent here [1], I did end up making this change in 0010. I just
realized that I should have also changed
table_scan_analyze_next_tuple() and removed the call to
GetOldestNonRemovableTransactionId(). I've done that in the attached v31.
I'm not sure we should change the table AM API (by removing
OldestXmin), though. I looked for table AMs implementing
scan_analyze_next_tuple() to see if they use OldestXmin. I found two:
OrioleDB [2] and Citus columnar [3], which both implement
scan_analyze_next_tuple(), and neither of them uses OldestXmin. I
couldn't easily find other table AMs implementing
scan_analyze_next_tuple(). I don't have a strong sense of whether or
not I should make this change. Changing it is churn to a public API
and doesn't specifically enable us to do something.
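For reference, the callback in question currently looks like this in
struct TableAmRoutine (tableam.h); the change being discussed would just
drop the OldestXmin parameter. This is only a sketch to illustrate the
shape of the API change, not necessarily the exact form it takes in v31:

    bool        (*scan_analyze_next_tuple) (TableScanDesc scan,
                                            TransactionId OldestXmin,
                                            double *liverows,
                                            double *deadrows,
                                            TupleTableSlot *slot);

    /*
     * With OldestXmin removed, each AM would derive any horizon it
     * needs internally:
     */
    bool        (*scan_analyze_next_tuple) (TableScanDesc scan,
                                            double *liverows,
                                            double *deadrows,
                                            TupleTableSlot *slot);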
I could also just leave it unused by heapam's implementation. I
haven't checked which, if any, other table AM callbacks have
parameters that are completely unused by the heap implementation.
So, I'm on the fence about whether or not to make the change at all,
and, if I do, whether or not to change the table AM callback. That is
done in v31, though, so we can discuss.
- Melanie
[1]: /messages/by-id/CAAKRu_ZCjHoRPfQ8AbMrFY8TOMCPAvZ0_m9SX7yg0edfTk45-g@mail.gmail.com
[2]: https://github.com/orioledb/orioledb/blob/acff65984d106dabf708a179e2c6694297e08c02/src/tableam/handler.c#L978C68-L978C78
[3]: https://github.com/citusdata/citus/blob/ee3812d267db3ab007efb6f5f432c82c1f448695/src/backend/columnar/columnar_tableam.c#L1418
Attachments:
v31-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patchtext/x-patch; charset=UTF-8; name=v31-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patchDownload
From ec1755a3055229d5bef9cc963f8f6b7edb2a1cd3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 8 Dec 2025 15:49:54 -0500
Subject: [PATCH v31 01/16] Combine visibilitymap_set() cases in
lazy_scan_prune()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
lazy_scan_prune() previously had two separate cases that called
visibilitymap_set() after pruning and freezing. These branches were
nearly identical except that one attempted to avoid dirtying the heap
buffer. However, that situation can never occur — the heap buffer cannot
be clean at that point (and we would hit an assertion if it were).
In lazy_scan_prune(), when we change a previously all-visible page to
all-frozen and the page was recorded as all-visible in the visibility
map by find_next_unskippable_block(), the heap buffer will always be
dirty. Either we have just frozen a tuple and already dirtied the
buffer, or the buffer was modified between find_next_unskippable_block()
and heap_page_prune_and_freeze() and then pruned in
heap_page_prune_and_freeze().
Additionally, XLogRegisterBuffer() asserts that the buffer is dirty, so
attempting to add a clean heap buffer to the WAL chain would assert out
anyway.
Since the “clean heap buffer with already set VM” case is impossible,
the two visibilitymap_set() branches in lazy_scan_prune() can be merged.
Doing so makes the intent clearer and emphasizes that the heap buffer
must always be marked dirty before being added to the WAL chain.
This commit also adds a test case for vacuuming when no heap
modifications are required. Currently this ensures that the heap buffer
is marked dirty before it is added to the WAL chain, but if we later
remove the heap buffer from the VM-set WAL chain or pass it with the
REGBUF_NO_CHANGES flag, this test would guard that behavior.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
Discussion: https://postgr.es/m/flat/CAAKRu_ZWx5gCbeCf7PWCv8p5%3D%3Db7EEws0VD2wksDxpXCvCyHvQ%40mail.gmail.com
---
.../pg_visibility/expected/pg_visibility.out | 44 ++++++++++
contrib/pg_visibility/sql/pg_visibility.sql | 20 +++++
src/backend/access/heap/vacuumlazy.c | 87 ++++---------------
3 files changed, 82 insertions(+), 69 deletions(-)
diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index 09fa5933a35..e10f1706015 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -1,4 +1,5 @@
CREATE EXTENSION pg_visibility;
+CREATE EXTENSION pageinspect;
--
-- recently-dropped table
--
@@ -204,6 +205,49 @@ select pg_truncate_visibility_map('test_partition');
(1 row)
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary
+---------------------------
+ (1,1)
+(1 row)
+
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+ pg_truncate_visibility_map
+----------------------------
+
+(1 row)
+
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary
+---------------------------
+ (0,0)
+(1 row)
+
+-- though the VM is truncated, the heap page-level visibility hint,
+-- PD_ALL_VISIBLE should still be set
+SELECT (flags & x'0004'::int) <> 0
+ FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));
+ ?column?
+----------
+ t
+(1 row)
+
+-- vacuum sets the VM
+vacuum test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary
+---------------------------
+ (1,1)
+(1 row)
+
-- test copy freeze
create table copyfreeze (a int, b char(1500));
-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index 5af06ec5b76..57af8a0c5b6 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -1,4 +1,5 @@
CREATE EXTENSION pg_visibility;
+CREATE EXTENSION pageinspect;
--
-- recently-dropped table
@@ -94,6 +95,25 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
select * from pg_check_frozen('test_partition'); -- hopefully none
select pg_truncate_visibility_map('test_partition');
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+-- though the VM is truncated, the heap page-level visibility hint,
+-- PD_ALL_VISIBLE should still be set
+SELECT (flags & x'0004'::int) <> 0
+ FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));
+-- vacuum sets the VM
+vacuum test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+
-- test copy freeze
create table copyfreeze (a int, b char(1500));
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 30778a15639..cecba2146ea 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2093,16 +2093,14 @@ lazy_scan_prune(LVRelState *vacrel,
* of last heap_vac_scan_next_block() call), and from all_visible and
* all_frozen variables
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if ((presult.all_visible && !all_visible_according_to_vm) ||
+ (presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
- }
/*
* It should never be the case that the visibility map page is set
@@ -2110,15 +2108,25 @@ lazy_scan_prune(LVRelState *vacrel,
* checksums are not enabled). Regardless, set both bits so that we
* get back in sync.
*
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
+ * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
+ * unnecessarily dirtying the heap buffer. Nearly the only scenario
+ * where PD_ALL_VISIBLE is set but the VM is not is if the VM was
+ * removed -- and that isn't worth optimizing for. And if we add the
+ * heap buffer to the WAL chain (without passing REGBUF_NO_CHANGES),
+ * it must be marked dirty.
*/
PageSetAllVisible(page);
MarkBufferDirty(buf);
+
+ /*
+ * If the page is being set all-frozen, we pass InvalidTransactionId
+ * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ Assert(!presult.all_frozen ||
+ !TransactionIdIsValid(presult.vm_conflict_horizon));
+
old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
InvalidXLogRecPtr,
vmbuffer, presult.vm_conflict_horizon,
@@ -2190,65 +2198,6 @@ lazy_scan_prune(LVRelState *vacrel,
VISIBILITYMAP_VALID_BITS);
}
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen.
- */
- else if (all_visible_according_to_vm && presult.all_frozen &&
- !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
-
return presult.ndeleted;
}
--
2.43.0
v31-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patchtext/x-patch; charset=US-ASCII; name=v31-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patchDownload
From f55678510379299ca66cf78fbf6e08ec8ecda0d2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 11 Dec 2025 10:48:13 -0500
Subject: [PATCH v31 02/16] Eliminate use of cached VM value in
lazy_scan_prune()
lazy_scan_prune() takes a parameter from lazy_scan_heap() indicating
whether the page was marked all-visible in the VM at the time it was
last checked in find_next_unskippable_block(). This behavior is
historical, dating back to commit 608195a3a365, when we did not pin the
VM page until confirming it was not all-visible. Now that the VM page is
already pinned, there is no meaningful benefit to relying on a cached VM
status.
Removing this cached value simplifies the logic in both lazy_scan_heap()
and lazy_scan_prune(). It also clarifies future work that will set the
visibility map on-access: such paths will not have a cached value
available, and relying on one would make the logic harder to reason
about. Eliminating
it also enables us to detect and repair VM corruption on-access.
Along with removing the cached value and unconditionally checking the
visibility status of the heap page, this commit also moves the VM
corruption handling to occur first. This reordering should have no
performance impact, since the checks are inexpensive and performed only
once per page. It does, however, make the control flow easier to
understand. The restructuring also makes it possible for the VM to be
newly set after corruption is repaired, if pruning found the page
all-visible.
Now that no callers of visibilitymap_set() use its return value, change
its (and visibilitymap_set_vmbits()) return type to void.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
---
src/backend/access/heap/vacuumlazy.c | 182 +++++++++++-------------
src/backend/access/heap/visibilitymap.c | 9 +-
src/include/access/visibilitymap.h | 18 +--
3 files changed, 94 insertions(+), 115 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index cecba2146ea..d47ed7814c8 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -248,13 +248,6 @@ typedef enum
*/
#define EAGER_SCAN_REGION_SIZE 4096
-/*
- * heap_vac_scan_next_block() sets these flags to communicate information
- * about the block it read to the caller.
- */
-#define VAC_BLK_WAS_EAGER_SCANNED (1 << 0)
-#define VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM (1 << 1)
-
typedef struct LVRelState
{
/* Target heap relation and its indexes */
@@ -358,7 +351,6 @@ typedef struct LVRelState
/* State maintained by heap_vac_scan_next_block() */
BlockNumber current_block; /* last block returned */
BlockNumber next_unskippable_block; /* next unskippable block */
- bool next_unskippable_allvis; /* its visibility status */
bool next_unskippable_eager_scanned; /* if it was eagerly scanned */
Buffer next_unskippable_vmbuffer; /* buffer containing its VM bit */
@@ -432,7 +424,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
bool sharelock, Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
- Buffer vmbuffer, bool all_visible_according_to_vm,
+ Buffer vmbuffer,
bool *has_lpdead_items, bool *vm_page_frozen);
static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
@@ -1248,7 +1240,6 @@ lazy_scan_heap(LVRelState *vacrel)
/* Initialize for the first heap_vac_scan_next_block() call */
vacrel->current_block = InvalidBlockNumber;
vacrel->next_unskippable_block = InvalidBlockNumber;
- vacrel->next_unskippable_allvis = false;
vacrel->next_unskippable_eager_scanned = false;
vacrel->next_unskippable_vmbuffer = InvalidBuffer;
@@ -1264,13 +1255,13 @@ lazy_scan_heap(LVRelState *vacrel)
MAIN_FORKNUM,
heap_vac_scan_next_block,
vacrel,
- sizeof(uint8));
+ sizeof(bool));
while (true)
{
Buffer buf;
Page page;
- uint8 blk_info = 0;
+ bool was_eager_scanned = false;
int ndeleted = 0;
bool has_lpdead_items;
void *per_buffer_data = NULL;
@@ -1339,13 +1330,13 @@ lazy_scan_heap(LVRelState *vacrel)
if (!BufferIsValid(buf))
break;
- blk_info = *((uint8 *) per_buffer_data);
+ was_eager_scanned = *((bool *) per_buffer_data);
CheckBufferIsPinnedOnce(buf);
page = BufferGetPage(buf);
blkno = BufferGetBlockNumber(buf);
vacrel->scanned_pages++;
- if (blk_info & VAC_BLK_WAS_EAGER_SCANNED)
+ if (was_eager_scanned)
vacrel->eager_scanned_pages++;
/* Report as block scanned, update error traceback information */
@@ -1416,7 +1407,6 @@ lazy_scan_heap(LVRelState *vacrel)
if (got_cleanup_lock)
ndeleted = lazy_scan_prune(vacrel, buf, blkno, page,
vmbuffer,
- blk_info & VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM,
&has_lpdead_items, &vm_page_frozen);
/*
@@ -1433,8 +1423,7 @@ lazy_scan_heap(LVRelState *vacrel)
* exclude pages skipped due to cleanup lock contention from eager
* freeze algorithm caps.
*/
- if (got_cleanup_lock &&
- (blk_info & VAC_BLK_WAS_EAGER_SCANNED))
+ if (got_cleanup_lock && was_eager_scanned)
{
/* Aggressive vacuums do not eager scan. */
Assert(!vacrel->aggressive);
@@ -1601,7 +1590,6 @@ heap_vac_scan_next_block(ReadStream *stream,
{
BlockNumber next_block;
LVRelState *vacrel = callback_private_data;
- uint8 blk_info = 0;
/* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
next_block = vacrel->current_block + 1;
@@ -1664,8 +1652,8 @@ heap_vac_scan_next_block(ReadStream *stream,
* otherwise they would've been unskippable.
*/
vacrel->current_block = next_block;
- blk_info |= VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM;
- *((uint8 *) per_buffer_data) = blk_info;
+ /* Block was not eager scanned */
+ *((bool *) per_buffer_data) = false;
return vacrel->current_block;
}
else
@@ -1677,11 +1665,7 @@ heap_vac_scan_next_block(ReadStream *stream,
Assert(next_block == vacrel->next_unskippable_block);
vacrel->current_block = next_block;
- if (vacrel->next_unskippable_allvis)
- blk_info |= VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM;
- if (vacrel->next_unskippable_eager_scanned)
- blk_info |= VAC_BLK_WAS_EAGER_SCANNED;
- *((uint8 *) per_buffer_data) = blk_info;
+ *((bool *) per_buffer_data) = vacrel->next_unskippable_eager_scanned;
return vacrel->current_block;
}
}
@@ -1706,7 +1690,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
BlockNumber next_unskippable_block = vacrel->next_unskippable_block + 1;
Buffer next_unskippable_vmbuffer = vacrel->next_unskippable_vmbuffer;
bool next_unskippable_eager_scanned = false;
- bool next_unskippable_allvis;
*skipsallvis = false;
@@ -1716,7 +1699,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
next_unskippable_block,
&next_unskippable_vmbuffer);
- next_unskippable_allvis = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
/*
* At the start of each eager scan region, normal vacuums with eager
@@ -1735,7 +1717,7 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
* A block is unskippable if it is not all visible according to the
* visibility map.
*/
- if (!next_unskippable_allvis)
+ if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
{
Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
break;
@@ -1792,7 +1774,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
/* write the local variables back to vacrel */
vacrel->next_unskippable_block = next_unskippable_block;
- vacrel->next_unskippable_allvis = next_unskippable_allvis;
vacrel->next_unskippable_eager_scanned = next_unskippable_eager_scanned;
vacrel->next_unskippable_vmbuffer = next_unskippable_vmbuffer;
}
@@ -1953,9 +1934,7 @@ cmpOffsetNumbers(const void *a, const void *b)
* Caller must hold pin and buffer cleanup lock on the buffer.
*
* vmbuffer is the buffer containing the VM block with visibility information
- * for the heap block, blkno. all_visible_according_to_vm is the saved
- * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * for the heap block, blkno.
*
* *has_lpdead_items is set to true or false depending on whether, upon return
* from this function, any LP_DEAD items are still present on the page.
@@ -1972,7 +1951,6 @@ lazy_scan_prune(LVRelState *vacrel,
BlockNumber blkno,
Page page,
Buffer vmbuffer,
- bool all_visible_according_to_vm,
bool *has_lpdead_items,
bool *vm_page_frozen)
{
@@ -1986,6 +1964,8 @@ lazy_scan_prune(LVRelState *vacrel,
.vistest = vacrel->vistest,
.cutoffs = &vacrel->cutoffs,
};
+ uint8 old_vmbits = 0;
+ uint8 new_vmbits = 0;
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -2088,70 +2068,7 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(!presult.all_visible || !(*has_lpdead_items));
Assert(!presult.all_frozen || presult.all_visible);
- /*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
- */
- if ((presult.all_visible && !all_visible_according_to_vm) ||
- (presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
- {
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (presult.all_frozen)
- flags |= VISIBILITYMAP_ALL_FROZEN;
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
- * unnecessarily dirtying the heap buffer. Nearly the only scenario
- * where PD_ALL_VISIBLE is set but the VM is not is if the VM was
- * removed -- and that isn't worth optimizing for. And if we add the
- * heap buffer to the WAL chain (without passing REGBUF_NO_CHANGES),
- * it must be marked dirty.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
-
- /*
- * If the page is being set all-frozen, we pass InvalidTransactionId
- * as the cutoff_xid, since a snapshot conflict horizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- Assert(!presult.all_frozen ||
- !TransactionIdIsValid(presult.vm_conflict_horizon));
-
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
+ old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
/*
* As of PostgreSQL 9.2, the visibility map bit should never be set if the
@@ -2159,8 +2076,8 @@ lazy_scan_prune(LVRelState *vacrel,
* cleared after heap_vac_scan_next_block() was called, so we must recheck
* with buffer lock before concluding that the VM is corrupt.
*/
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+ if (!PageIsAllVisible(page) &&
+ (old_vmbits & VISIBILITYMAP_VALID_BITS) != 0)
{
ereport(WARNING,
(errcode(ERRCODE_DATA_CORRUPTED),
@@ -2169,6 +2086,8 @@ lazy_scan_prune(LVRelState *vacrel,
visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
VISIBILITYMAP_VALID_BITS);
+ /* VM bits are now clear */
+ old_vmbits = 0;
}
/*
@@ -2196,6 +2115,71 @@ lazy_scan_prune(LVRelState *vacrel,
MarkBufferDirty(buf);
visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
VISIBILITYMAP_VALID_BITS);
+ /* VM bits are now clear */
+ old_vmbits = 0;
+ }
+
+ if (!presult.all_visible)
+ return presult.ndeleted;
+
+ /* Set the visibility map and page visibility hint */
+ new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+ if (presult.all_frozen)
+ new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+ /* Nothing to do */
+ if (old_vmbits == new_vmbits)
+ return presult.ndeleted;
+
+ Assert(presult.all_visible);
+
+ /*
+ * It should never be the case that the visibility map page is set while
+ * the page-level bit is clear, but the reverse is allowed (if checksums
+ * are not enabled). Regardless, set both bits so that we get back in
+ * sync.
+ *
+ * The heap buffer must be marked dirty before adding it to the WAL chain
+ * when setting the VM. We don't worry about unnecessarily dirtying the
+ * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
+ * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
+ * the VM bits clear, so there is no point in optimizing it.
+ */
+ PageSetAllVisible(page);
+ MarkBufferDirty(buf);
+
+ /*
+ * If the page is being set all-frozen, we pass InvalidTransactionId as
+ * the cutoff_xid, since a snapshot conflict horizon sufficient to make
+ * everything safe for REDO was logged when the page's tuples were frozen.
+ */
+ Assert(!presult.all_frozen ||
+ !TransactionIdIsValid(presult.vm_conflict_horizon));
+
+ visibilitymap_set(vacrel->rel, blkno, buf,
+ InvalidXLogRecPtr,
+ vmbuffer, presult.vm_conflict_horizon,
+ new_vmbits);
+
+ /*
+ * If the page wasn't already set all-visible and/or all-frozen in the VM,
+ * count it as newly set for logging.
+ */
+ if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ {
+ vacrel->vm_new_visible_pages++;
+ if (presult.all_frozen)
+ {
+ vacrel->vm_new_visible_frozen_pages++;
+ *vm_page_frozen = true;
+ }
+ }
+ else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ presult.all_frozen)
+ {
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
}
return presult.ndeleted;
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index d14588e92ae..cdcb475e501 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -240,10 +240,8 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
* You must pass a buffer containing the correct map page to this function.
* Call visibilitymap_pin first to pin the right one. This function doesn't do
* any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
*/
-uint8
+void
visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
uint8 flags)
@@ -320,7 +318,6 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
}
LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
}
/*
@@ -343,7 +340,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
*
* rlocator is used only for debugging messages.
*/
-uint8
+void
visibilitymap_set_vmbits(BlockNumber heapBlk,
Buffer vmBuf, uint8 flags,
const RelFileLocator rlocator)
@@ -386,8 +383,6 @@ visibilitymap_set_vmbits(BlockNumber heapBlk,
map[mapByte] |= (flags << mapOffset);
MarkBufferDirty(vmBuf);
}
-
- return status;
}
/*
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index c6fa37be968..787c19e5fef 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -32,15 +32,15 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator);
+extern void visibilitymap_set(Relation rel,
+ BlockNumber heapBlk, Buffer heapBuf,
+ XLogRecPtr recptr,
+ Buffer vmBuf,
+ TransactionId cutoff_xid,
+ uint8 flags);
+extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
--
2.43.0
v31-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patchtext/x-patch; charset=US-ASCII; name=v31-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patchDownload
From d196bdeefae2b14ca3b7abf22b6d6cffca116cd4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 13:36:39 -0500
Subject: [PATCH v31 03/16] Refactor lazy_scan_prune() VM clear logic into
helper
Encapsulating the VM corruption checks in a helper makes the whole
function clearer. There is no functional change other than moving the
code into a helper.
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/vacuumlazy.c | 132 +++++++++++++++++----------
1 file changed, 85 insertions(+), 47 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d47ed7814c8..c5fc5b71f94 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -422,6 +422,11 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
+static bool identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer,
@@ -1928,6 +1933,83 @@ cmpOffsetNumbers(const void *a, const void *b)
return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
}
+/*
+ * Helper to correct any corruption detected on a heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+ *
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ *
+ * Returns true if it cleared corruption and false otherwise.
+ */
+static bool
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits)
+{
+ Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
+
+ Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (!PageIsAllVisible(heap_page) &&
+ ((vmbits & VISIBILITYMAP_VALID_BITS) != 0))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
+
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -2070,54 +2152,10 @@ lazy_scan_prune(LVRelState *vacrel,
old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (!PageIsAllVisible(page) &&
- (old_vmbits & VISIBILITYMAP_VALID_BITS) != 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- /* VM bits are now clear */
+ if (identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
+ presult.lpdead_items, vmbuffer,
+ old_vmbits))
old_vmbits = 0;
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- /* VM bits are now clear */
- old_vmbits = 0;
- }
if (!presult.all_visible)
return presult.ndeleted;
--
2.43.0
v31-0004-Set-the-VM-in-heap_page_prune_and_freeze.patchtext/x-patch; charset=US-ASCII; name=v31-0004-Set-the-VM-in-heap_page_prune_and_freeze.patchDownload
From 566794eed6786868a1147e6a0436d74c0603ccdf Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v31 04/16] Set the VM in heap_page_prune_and_freeze()
This has no independent benefit; it is meant for ease of review. As of
this commit, a separate WAL record is still emitted for setting the VM
after pruning and freezing. But the change is easier to review when
moving the logic into pruneheap.c is kept separate from setting the VM
in the same WAL record.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/pruneheap.c | 315 +++++++++++++++++++++++----
src/backend/access/heap/vacuumlazy.c | 150 +------------
src/include/access/heapam.h | 20 ++
3 files changed, 299 insertions(+), 186 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 07aa08cfe14..1c1446058a7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
#include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool attempt_freeze;
+ /* whether or not to attempt updating the VM */
+ bool attempt_update_vm;
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -140,16 +142,17 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon when setting the VM or when
+ * freezing all the tuples on the page. It is only valid when all the live
+ * tuples on the page are all-visible.
*
* NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
* That's convenient for heap_page_prune_and_freeze() to use them to
- * decide whether to freeze the page or not. The all_visible and
- * all_frozen values returned to the caller are adjusted to include
- * LP_DEAD items after we determine whether to opportunistically freeze.
+ * decide whether to opportunistically freeze the page or not. The
+ * all_visible and all_frozen values ultimately used to set the VM are
+ * adjusted to include LP_DEAD items after we determine whether or not to
+ * opportunistically freeze.
*/
bool all_visible;
bool all_frozen;
@@ -191,6 +194,17 @@ static void page_verify_redirects(Page page);
static bool heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
PruneState *prstate);
+static bool identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page, int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits);
+static bool heap_page_will_set_vm(PruneState *prstate,
+ Relation relation,
+ BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+ Buffer vmbuffer,
+ int nlpdead_items,
+ uint8 *old_vmbits,
+ uint8 *new_vmbits);
/*
@@ -280,6 +294,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
PruneFreezeParams params = {
.relation = relation,
.buffer = buffer,
+ .vmbuffer = InvalidBuffer,
.reason = PRUNE_ON_ACCESS,
.options = 0,
.vistest = vistest,
@@ -341,6 +356,8 @@ prune_freeze_setup(PruneFreezeParams *params,
/* cutoffs must be provided if we will attempt freezing */
Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate->attempt_update_vm =
+ (params->options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate->cutoffs = params->cutoffs;
/*
@@ -396,51 +413,54 @@ prune_freeze_setup(PruneFreezeParams *params,
prstate->frz_conflict_horizon = InvalidTransactionId;
/*
- * Vacuum may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Track whether the page could be marked all-visible and/or all-frozen.
+ * This information is used for opportunistic freezing and for updating
+ * the visibility map (VM) if requested by the caller.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * Currently, only VACUUM performs freezing, but other callers may in the
+ * future. Visibility bookkeeping is required not just for setting the VM
+ * bits, but also for opportunistic freezing: we only consider freezing if
+ * the page would become all-frozen, or if it would be all-frozen except
+ * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+ * we will not set the VM bit even if the page is found to be all-visible.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not immediately
- * clear all_visible and all_frozen when we see LP_DEAD items. We fix
- * that after scanning the line pointers. We must correct all_visible and
- * all_frozen before we return them to the caller, so that the caller
- * doesn't set the VM bits incorrectly.
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is passed without HEAP_PAGE_PRUNE_FREEZE,
+ * prstate.all_frozen must be initialized to false, since we will not call
+ * heap_prepare_freeze_tuple() for each tuple.
+ *
+ * Dead tuples that will be removed by the end of vacuum should not
+ * prevent opportunistic freezing. Therefore, we do not clear all_visible
+ * and all_frozen when we encounter LP_DEAD items. Instead, we correct
+ * them after deciding whether to freeze, but before updating the VM, to
+ * avoid setting the VM bits incorrectly.
+ *
+ * If neither freezing nor VM updates are requested, we skip the extra
+ * bookkeeping. In this case, initializing all_visible to false allows
+ * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
*/
if (prstate->attempt_freeze)
{
prstate->all_visible = true;
prstate->all_frozen = true;
}
+ else if (prstate->attempt_update_vm)
+ {
+ prstate->all_visible = true;
+ prstate->all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate->all_visible = false;
prstate->all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is unmaintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon
+ * when updating the VM and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -775,10 +795,148 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Helper to correct any corruption detected on a heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+ *
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ *
+ * Returns true if it cleared corruption and false otherwise.
+ */
+static bool
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits)
+{
+ Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
+
+ Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (!PageIsAllVisible(heap_page) &&
+ ((vmbits & VISIBILITYMAP_VALID_BITS) != 0))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * However, if it finds that the page-level visibility hint or VM is
+ * corrupted, it will fix them by clearing the VM bits and visibility hint.
+ * This does not need to be done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with returning the
+ * current value of the VM bits in *old_vmbits and the desired new value of
+ * the VM bits in *new_vmbits.
+ *
+ * If the VM should not be set, it returns false. If we won't consider
+ * updating the VM, *old_vmbits will be 0, regardless of the current value of
+ * the VM bits.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate,
+ Relation relation,
+ BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+ Buffer vmbuffer,
+ int nlpdead_items,
+ uint8 *old_vmbits,
+ uint8 *new_vmbits)
+{
+ *old_vmbits = 0;
+ *new_vmbits = 0;
+
+ if (!prstate->attempt_update_vm)
+ return false;
+
+ *old_vmbits = visibilitymap_get_status(relation, heap_blk,
+ &vmbuffer);
+
+ /* We do this even if not all-visible */
+ if (identify_and_fix_vm_corruption(relation, heap_buffer, heap_blk, heap_page,
+ nlpdead_items, vmbuffer,
+ *old_vmbits))
+ *old_vmbits = 0;
+
+ if (!prstate->all_visible)
+ return false;
+
+ *new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+ if (prstate->all_frozen)
+ *new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+ if (*new_vmbits == *old_vmbits)
+ {
+ *new_vmbits = 0;
+ return false;
+ }
+
+ return true;
+}
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -793,12 +951,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* tuples if it's required in order to advance relfrozenxid / relminmxid, or
* if it's considered advantageous for overall system performance to do so
* now. The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing. When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opportunistically freeze, to indicate if the VM bits can be set. They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
@@ -823,13 +982,18 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
MultiXactId *new_relmin_mxid)
{
Buffer buffer = params->buffer;
+ Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
+ BlockNumber blockno = BufferGetBlockNumber(buffer);
PruneState prstate;
bool do_freeze;
bool do_prune;
bool do_hint_prune;
+ bool do_set_vm;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ uint8 new_vmbits;
+ uint8 old_vmbits;
/* Initialize prstate */
prune_freeze_setup(params,
@@ -1011,6 +1175,65 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
}
}
+
+ /* Now update the visibility map and PD_ALL_VISIBLE hint */
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ do_set_vm = heap_page_will_set_vm(&prstate,
+ params->relation,
+ blockno,
+ buffer,
+ page,
+ vmbuffer,
+ prstate.lpdead_items,
+ &old_vmbits,
+ &new_vmbits);
+
+ /*
+ * new_vmbits should be 0 regardless of whether or not the page is
+ * all-visible if we do not intend to set the VM.
+ */
+ Assert(do_set_vm || new_vmbits == 0);
+
+ /* Set the visibility map and page visibility hint, if relevant */
+ if (do_set_vm)
+ {
+ Assert(prstate.all_visible);
+
+ /*
+ * It should never be the case that the visibility map page is set
+ * while the page-level bit is clear, but the reverse is allowed (if
+ * checksums are not enabled). Regardless, set both bits so that we
+ * get back in sync.
+ *
+ * The heap buffer must be marked dirty before adding it to the WAL
+ * chain when setting the VM. We don't worry about unnecessarily
+ * dirtying the heap buffer if PD_ALL_VISIBLE is already set, though.
+ * It is extremely rare to have a clean heap buffer with
+ * PD_ALL_VISIBLE already set and the VM bits clear, so there is no
+ * point in optimizing it.
+ */
+ PageSetAllVisible(page);
+ MarkBufferDirty(buffer);
+
+ /*
+ * If the page is being set all-frozen, we pass InvalidTransactionId
+ * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ Assert(!prstate.all_frozen ||
+ !TransactionIdIsValid(presult->vm_conflict_horizon));
+
+ visibilitymap_set(params->relation, blockno, buffer,
+ InvalidXLogRecPtr,
+ vmbuffer, presult->vm_conflict_horizon,
+ new_vmbits);
+ }
+
+ /* Save the vmbits for caller */
+ presult->old_vmbits = old_vmbits;
+ presult->new_vmbits = new_vmbits;
}
@@ -1485,6 +1708,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
{
TransactionId xmin;
+ Assert(prstate->attempt_update_vm);
+
if (!HeapTupleHeaderXminCommitted(htup))
{
prstate->all_visible = false;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c5fc5b71f94..8b489349312 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -422,11 +422,7 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
-static bool identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
- BlockNumber heap_blk, Page heap_page,
- int nlpdead_items,
- Buffer vmbuffer,
- uint8 vmbits);
+
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer,
@@ -1933,83 +1929,6 @@ cmpOffsetNumbers(const void *a, const void *b)
return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
}
-/*
- * Helper to correct any corruption detected on a heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- *
- * Returns true if it cleared corruption and false otherwise.
- */
-static bool
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
- BlockNumber heap_blk, Page heap_page,
- int nlpdead_items,
- Buffer vmbuffer,
- uint8 vmbits)
-{
- Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
-
- Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (!PageIsAllVisible(heap_page) &&
- ((vmbits & VISIBILITYMAP_VALID_BITS) != 0))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- RelationGetRelationName(rel), heap_blk)));
-
- visibilitymap_clear(rel, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- RelationGetRelationName(rel), heap_blk)));
-
- PageClearAllVisible(heap_page);
- MarkBufferDirty(heap_buffer);
- visibilitymap_clear(rel, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- return false;
-}
-
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -2041,13 +1960,12 @@ lazy_scan_prune(LVRelState *vacrel,
PruneFreezeParams params = {
.relation = rel,
.buffer = buf,
+ .vmbuffer = vmbuffer,
.reason = PRUNE_VACUUM_SCAN,
- .options = HEAP_PAGE_PRUNE_FREEZE,
+ .options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM,
.vistest = vacrel->vistest,
.cutoffs = &vacrel->cutoffs,
};
- uint8 old_vmbits = 0;
- uint8 new_vmbits = 0;
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -2147,75 +2065,25 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
- Assert(!presult.all_frozen || presult.all_visible);
-
- old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
- if (identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
- presult.lpdead_items, vmbuffer,
- old_vmbits))
- old_vmbits = 0;
-
- if (!presult.all_visible)
- return presult.ndeleted;
-
- /* Set the visibility map and page visibility hint */
- new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
- if (presult.all_frozen)
- new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
- /* Nothing to do */
- if (old_vmbits == new_vmbits)
- return presult.ndeleted;
-
- Assert(presult.all_visible);
-
- /*
- * It should never be the case that the visibility map page is set while
- * the page-level bit is clear, but the reverse is allowed (if checksums
- * are not enabled). Regardless, set both bits so that we get back in
- * sync.
- *
- * The heap buffer must be marked dirty before adding it to the WAL chain
- * when setting the VM. We don't worry about unnecessarily dirtying the
- * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
- * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
- * the VM bits clear, so there is no point in optimizing it.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
-
- /*
- * If the page is being set all-frozen, we pass InvalidTransactionId as
- * the cutoff_xid, since a snapshot conflict horizon sufficient to make
- * everything safe for REDO was logged when the page's tuples were frozen.
- */
- Assert(!presult.all_frozen ||
- !TransactionIdIsValid(presult.vm_conflict_horizon));
-
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- new_vmbits);
/*
* If the page wasn't already set all-visible and/or all-frozen in the VM,
* count it as newly set for logging.
*/
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
vacrel->vm_new_frozen_pages++;
*vm_page_frozen = true;
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f7e4ae3843c..0913759219c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,12 @@ typedef struct PruneFreezeParams
Relation relation; /* relation containing buffer to be pruned */
Buffer buffer; /* buffer to be pruned */
+ /*
+ * If we will consider updating the visibility map, vmbuffer should
+ * contain the correct block of the visibility map and be pinned.
+ */
+ Buffer vmbuffer;
+
/*
* The reason pruning was performed. It is used to set the WAL record
* opcode which is used for debugging and analysis purposes.
@@ -252,6 +259,9 @@ typedef struct PruneFreezeParams
*
* HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
* will return 'all_visible', 'all_frozen' flags to the caller.
+ *
+ * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
+ * in the VM.
*/
int options;
@@ -299,6 +309,16 @@ typedef struct PruneFreezeResult
bool all_frozen;
TransactionId vm_conflict_horizon;
+ /*
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
+ *
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
+ */
+ uint8 new_vmbits;
+ uint8 old_vmbits;
+
/*
* Whether or not the page makes rel truncation unsafe. This is set to
* 'true', even if the page contains LP_DEAD items. VACUUM will remove
--
2.43.0
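
To summarize the caller-side effect of the patch above: vacuum's first pass now hands the pinned VM buffer and the new HEAP_PAGE_PRUNE_UPDATE_VM flag to heap_page_prune_and_freeze() and does its page accounting from the old_vmbits/new_vmbits it gets back. What follows is only a condensed sketch stitched together from the lazy_scan_prune() hunks above; the exact argument list of the heap_page_prune_and_freeze() call and the vacrel field names are assumed from the surrounding vacuumlazy.c context rather than shown in this diff.

    PruneFreezeParams params = {
        .relation = vacrel->rel,
        .buffer = buf,
        .vmbuffer = vmbuffer,   /* correct VM block, already pinned */
        .reason = PRUNE_VACUUM_SCAN,
        .options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM,
        .vistest = vacrel->vistest,
        .cutoffs = &vacrel->cutoffs,
    };

    /* Prune, freeze, and (now) update the VM in one call (arguments assumed) */
    heap_page_prune_and_freeze(&params, &presult, &vacrel->offnum,
                               &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);

    /* Page counting is driven by the VM bits reported back by pruning */
    if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
        (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
    {
        vacrel->vm_new_visible_pages++;
        if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
            vacrel->vm_new_visible_frozen_pages++;
    }
    else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
             (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
        vacrel->vm_new_frozen_pages++;
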
Attachment: v31-0005-Move-VM-assert-into-prune-freeze-code.patch (text/x-patch)
From 9f5072500e2a3bc2f2a8490f1ca11bf60a81515a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 15:57:34 -0500
Subject: [PATCH v31 05/16] Move VM assert into prune/freeze code
This is a step toward setting the VM in the same WAL record as pruning
and freezing. It moves the all-visible assertion check on the heap page
into the prune/freeze code, before the VM is set. This allows us to
remove some fields of PruneFreezeResult.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/pruneheap.c | 86 ++++++++++++++++++++++------
src/backend/access/heap/vacuumlazy.c | 68 +---------------------
src/include/access/heapam.h | 25 +++-----
3 files changed, 77 insertions(+), 102 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 1c1446058a7..7af6aea2d0e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -932,6 +932,31 @@ heap_page_will_set_vm(PruneState *prstate,
return true;
}
+#ifdef USE_ASSERT_CHECKING
+
+/*
+ * Wrapper for heap_page_would_be_all_visible() which can be used for callers
+ * that expect no LP_DEAD on the page. Currently assert-only, but there is no
+ * reason not to use it outside of asserts.
+ */
+static bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
+{
+
+ return heap_page_would_be_all_visible(rel, buf,
+ OldestXmin,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid,
+ logging_offnum);
+}
+#endif
+
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
@@ -985,6 +1010,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
+ TransactionId vm_conflict_horizon = InvalidTransactionId;
PruneState prstate;
bool do_freeze;
bool do_prune;
@@ -1142,23 +1168,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
- /*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
-
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
@@ -1176,6 +1187,46 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
}
}
+ /*
+ * If updating the visibility map, the conflict horizon for that record
+ * must be the newest xmin on the page. However, if the page is
+ * completely frozen, there can be no conflict and the vm_conflict_horizon
+ * should remain InvalidTransactionId. This includes the case that we
+ * just froze all the tuples; the prune-freeze record included the
+ * conflict XID already so we don't need to again.
+ */
+ if (prstate.all_frozen)
+ vm_conflict_horizon = InvalidTransactionId;
+ else
+ vm_conflict_horizon = prstate.visibility_cutoff_xid;
+
+ /*
+ * During its second pass over the heap, VACUUM calls
+ * heap_page_would_be_all_visible() to determine whether a page is
+ * all-visible and all-frozen. The logic here is similar. After completing
+ * pruning and freezing, use an assertion to verify that our results
+ * remain consistent with heap_page_would_be_all_visible().
+ */
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
+ {
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
+
+ Assert(presult->lpdead_items == 0);
+
+ Assert(heap_page_is_all_visible(params->relation, buffer,
+ prstate.cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc));
+
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == vm_conflict_horizon);
+ }
+#endif
+
/* Now update the visibility map and PD_ALL_VISIBLE hint */
Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
@@ -1222,12 +1273,11 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* make everything safe for REDO was logged when the page's tuples
* were frozen.
*/
- Assert(!prstate.all_frozen ||
- !TransactionIdIsValid(presult->vm_conflict_horizon));
+ Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
visibilitymap_set(params->relation, blockno, buffer,
InvalidXLogRecPtr,
- vmbuffer, presult->vm_conflict_horizon,
+ vmbuffer, vm_conflict_horizon,
new_vmbits);
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8b489349312..f56a02a3d46 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -457,20 +457,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
-#endif
-static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- OffsetNumber *deadoffsets,
- int ndeadoffsets,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2006,32 +1992,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- Assert(heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum));
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -3489,29 +3449,6 @@ dead_items_cleanup(LVRelState *vacrel)
vacrel->pvs = NULL;
}
-#ifdef USE_ASSERT_CHECKING
-
-/*
- * Wrapper for heap_page_would_be_all_visible() which can be used for callers
- * that expect no LP_DEAD on the page. Currently assert-only, but there is no
- * reason not to use it outside of asserts.
- */
-static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum)
-{
-
- return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
- NULL, 0,
- all_frozen,
- visibility_cutoff_xid,
- logging_offnum);
-}
-#endif
/*
* Check whether the heap page in buf is all-visible except for the dead
@@ -3535,15 +3472,12 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
* - *logging_offnum: OffsetNumber of current tuple being processed;
* used by vacuum's error callback system.
*
- * Callers looking to verify that the page is already all-visible can call
- * heap_page_is_all_visible().
- *
* This logic is closely related to heap_prune_record_unchanged_lp_normal().
* If you modify this function, ensure consistency with that code. An
* assertion cross-checks that both remain in agreement. Do not introduce new
* side-effects.
*/
-static bool
+bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
TransactionId OldestXmin,
OffsetNumber *deadoffsets,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0913759219c..88e79c58a10 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -257,8 +257,7 @@ typedef struct PruneFreezeParams
* HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
* LP_UNUSED during pruning.
*
- * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
- * will return 'all_visible', 'all_frozen' flags to the caller.
+ * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
*
* HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
* in the VM.
@@ -294,21 +293,6 @@ typedef struct PruneFreezeResult
int live_tuples;
int recently_dead_tuples;
- /*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
- *
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
- *
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
- */
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
-
/*
* old_vmbits are the state of the all-visible and all-frozen bits in the
* visibility map before updating it during phase I of vacuuming.
@@ -453,6 +437,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
/* in heap/vacuumlazy.c */
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
--
2.43.0
Attachment: v31-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (text/x-patch)
From eb94a7df040b6250d3ea3e0d1a79f24a3dc4fd6a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 16:16:22 -0500
Subject: [PATCH v31 06/16] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
prune/freeze
Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.
Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.
This change applies only to vacuum phase I, not to pruning performed
during normal page access.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/pruneheap.c | 275 ++++++++++++++++------------
1 file changed, 157 insertions(+), 118 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7af6aea2d0e..49d3ebb0063 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -205,6 +205,11 @@ static bool heap_page_will_set_vm(PruneState *prstate,
int nlpdead_items,
uint8 *old_vmbits,
uint8 *new_vmbits);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ uint8 old_vmbits, uint8 new_vmbits,
+ TransactionId latest_xid_removed,
+ TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid);
/*
@@ -795,6 +800,68 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ uint8 old_vmbits, uint8 new_vmbits,
+ TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid)
+{
+ TransactionId conflict_xid;
+
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning or
+ * freezing any tuples and are setting an already all-visible page
+ * all-frozen in the VM. In this case, all of the tuples on the page must
+ * already be visible to all MVCC snapshots on the standby.
+ */
+ if (!do_prune &&
+ !do_freeze &&
+ do_set_vm &&
+ (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0 &&
+ (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ return InvalidTransactionId;
+
+ /*
+ * The snapshotConflictHorizon for the whole record should be the most
+ * conservative of all the horizons calculated for any of the possible
+ * modifications. If this record will prune tuples, any transactions on
+ * the standby older than the youngest xmax of the most recently removed
+ * tuple this record will prune will conflict. If this record will freeze
+ * tuples, any transactions on the standby with xids older than the
+ * youngest tuple this record will freeze will conflict.
+ */
+ conflict_xid = InvalidTransactionId;
+
+ /*
+ * If we are updating the VM, the conflict horizon is almost always the
+ * visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization, we can
+ * use the visibility_cutoff_xid as the conflict horizon if the page will
+ * be all-frozen. This is true even if there are LP_DEAD line pointers
+ * because we ignored those when maintaining the visibility_cutoff_xid.
+ * This will have been calculated earlier as the frz_conflict_horizon when
+ * we determined we would freeze.
+ */
+ if (do_set_vm)
+ conflict_xid = visibility_cutoff_xid;
+ else if (do_freeze)
+ conflict_xid = frz_conflict_horizon;
+
+ /*
+ * If we are removing tuples with a younger xmax than our so far
+ * calculated conflict_xid, we must use this as our horizon.
+ */
+ if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+ conflict_xid = latest_xid_removed;
+
+ return conflict_xid;
+}
+
/*
* Helper to correct any corruption detected on a heap page and its
* corresponding visibility map page after pruning but before setting the
@@ -1010,7 +1077,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
- TransactionId vm_conflict_horizon = InvalidTransactionId;
PruneState prstate;
bool do_freeze;
bool do_prune;
@@ -1018,6 +1084,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool do_set_vm;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ TransactionId conflict_xid;
uint8 new_vmbits;
uint8 old_vmbits;
@@ -1081,6 +1148,37 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.all_visible = prstate.all_frozen = false;
Assert(!prstate.all_frozen || prstate.all_visible);
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ /*
+ * Decide whether to set the VM bits based on information from the VM and
+ * the all_visible/all_frozen flags.
+ */
+ do_set_vm = heap_page_will_set_vm(&prstate,
+ params->relation,
+ blockno,
+ buffer,
+ page,
+ vmbuffer,
+ prstate.lpdead_items,
+ &old_vmbits,
+ &new_vmbits);
+
+ /*
+ * new_vmbits should be 0 regardless of whether or not the page is
+ * all-visible if we do not intend to set the VM.
+ */
+ Assert(do_set_vm || new_vmbits == 0);
+
+ conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+ old_vmbits, new_vmbits,
+ prstate.latest_xid_removed,
+ prstate.frz_conflict_horizon,
+ prstate.visibility_cutoff_xid);
+
+ /* Lock vmbuffer before entering a critical section */
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -1102,14 +1200,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
/*
* If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * hint. If we are going to freeze or prune the page or set
+ * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+ *
+ * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+ * for the VM to be set and PD_ALL_VISIBLE to be clear.
*/
- if (!do_freeze && !do_prune)
+ if (!do_freeze && !do_prune && !do_set_vm)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -1123,6 +1224,26 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
+ /* Set the visibility map and page visibility hint */
+ if (do_set_vm)
+ {
+ /*
+ * While it is valid for PD_ALL_VISIBLE to be set when the
+ * corresponding VM bit is clear, we strongly prefer to keep them
+ * in sync.
+ *
+ * The heap buffer must be marked dirty before adding it to the
+ * WAL chain when setting the VM. We don't worry about
+ * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
+ * already set, though. It is extremely rare to have a clean heap
+ * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
+ * so there is no point in optimizing it.
+ */
+ PageSetAllVisible(page);
+ visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
+ params->relation->rd_locator);
+ }
+
MarkBufferDirty(buffer);
/*
@@ -1130,29 +1251,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
if (RelationNeedsWAL(params->relation))
{
- /*
- * The snapshotConflictHorizon for the whole record should be the
- * most conservative of all the horizons calculated for any of the
- * possible modifications. If this record will prune tuples, any
- * transactions on the standby older than the youngest xmax of the
- * most recently removed tuple this record will prune will
- * conflict. If this record will freeze tuples, any transactions
- * on the standby with xids older than the youngest tuple this
- * record will freeze will conflict.
- */
- TransactionId conflict_xid;
-
- if (TransactionIdFollows(prstate.frz_conflict_horizon,
- prstate.latest_xid_removed))
- conflict_xid = prstate.frz_conflict_horizon;
- else
- conflict_xid = prstate.latest_xid_removed;
-
log_heap_prune_and_freeze(params->relation, buffer,
- InvalidBuffer, /* vmbuffer */
- 0, /* vmflags */
+ do_set_vm ? vmbuffer : InvalidBuffer,
+ do_set_vm ? new_vmbits : 0,
conflict_xid,
- true, params->reason,
+ true, /* cleanup lock */
+ params->reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
@@ -1162,43 +1266,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
END_CRIT_SECTION();
- /* Copy information back for caller */
- presult->ndeleted = prstate.ndeleted;
- presult->nnewlpdead = prstate.ndead;
- presult->nfrozen = prstate.nfrozen;
- presult->live_tuples = prstate.live_tuples;
- presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->hastup = prstate.hastup;
-
- presult->lpdead_items = prstate.lpdead_items;
- /* the presult->deadoffsets array was already filled in */
-
- if (prstate.attempt_freeze)
- {
- if (presult->nfrozen > 0)
- {
- *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
- *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
- }
- else
- {
- *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
- *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
- }
- }
-
- /*
- * If updating the visibility map, the conflict horizon for that record
- * must be the newest xmin on the page. However, if the page is
- * completely frozen, there can be no conflict and the vm_conflict_horizon
- * should remain InvalidTransactionId. This includes the case that we
- * just froze all the tuples; the prune-freeze record included the
- * conflict XID already so we don't need to again.
- */
- if (prstate.all_frozen)
- vm_conflict_horizon = InvalidTransactionId;
- else
- vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
/*
* During its second pass over the heap, VACUUM calls
@@ -1213,7 +1282,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(presult->lpdead_items == 0);
+ Assert(prstate.lpdead_items == 0);
+ Assert(prstate.cutoffs);
Assert(heap_page_is_all_visible(params->relation, buffer,
prstate.cutoffs->OldestXmin,
@@ -1223,67 +1293,36 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Assert(prstate.all_frozen == debug_all_frozen);
Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == vm_conflict_horizon);
+ debug_cutoff == prstate.visibility_cutoff_xid);
}
#endif
- /* Now update the visibility map and PD_ALL_VISIBLE hint */
- Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
-
- do_set_vm = heap_page_will_set_vm(&prstate,
- params->relation,
- blockno,
- buffer,
- page,
- vmbuffer,
- prstate.lpdead_items,
- &old_vmbits,
- &new_vmbits);
+ /* Copy information back for caller */
+ presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+ presult->hastup = prstate.hastup;
+ presult->new_vmbits = new_vmbits;
+ presult->old_vmbits = old_vmbits;
- /*
- * new_vmbits should be 0 regardless of whether or not the page is
- * all-visible if we do not intend to set the VM.
- */
- Assert(do_set_vm || new_vmbits == 0);
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
- /* Set the visibility map and page visibility hint, if relevant */
- if (do_set_vm)
+ if (prstate.attempt_freeze)
{
- Assert(prstate.all_visible);
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * The heap buffer must be marked dirty before adding it to the WAL
- * chain when setting the VM. We don't worry about unnecessarily
- * dirtying the heap buffer if PD_ALL_VISIBLE is already set, though.
- * It is extremely rare to have a clean heap buffer with
- * PD_ALL_VISIBLE already set and the VM bits clear, so there is no
- * point in optimizing it.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buffer);
-
- /*
- * If the page is being set all-frozen, we pass InvalidTransactionId
- * as the cutoff_xid, since a snapshot conflict horizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
-
- visibilitymap_set(params->relation, blockno, buffer,
- InvalidXLogRecPtr,
- vmbuffer, vm_conflict_horizon,
- new_vmbits);
+ if (presult->nfrozen > 0)
+ {
+ *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
+ }
+ else
+ {
+ *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
+ }
}
-
- /* Save the vmbits for caller */
- presult->old_vmbits = old_vmbits;
- presult->new_vmbits = new_vmbits;
}
--
2.43.0
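
Pulling the scattered hunks of the patch above together, the order of operations inside heap_page_prune_and_freeze() after folding the VM update into the prune/freeze critical section is roughly as follows. This is a condensed sketch reconstructed from the diff (the hint-bit-only path, the assertion cross-check, and most result bookkeeping are omitted); identifiers are the ones used in the patch, and the trailing unused-offsets arguments to log_heap_prune_and_freeze() are assumed to match the existing call.

    do_set_vm = heap_page_will_set_vm(&prstate, params->relation, blockno,
                                      buffer, page, vmbuffer,
                                      prstate.lpdead_items,
                                      &old_vmbits, &new_vmbits);

    conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
                                    old_vmbits, new_vmbits,
                                    prstate.latest_xid_removed,
                                    prstate.frz_conflict_horizon,
                                    prstate.visibility_cutoff_xid);

    /* The VM page must be locked before entering the critical section */
    if (do_set_vm)
        LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);

    START_CRIT_SECTION();

    if (do_prune || do_freeze || do_set_vm)
    {
        /* ... apply pruning and freezing to the heap page ... */

        if (do_set_vm)
        {
            /* Keep PD_ALL_VISIBLE and the VM bits in sync */
            PageSetAllVisible(page);
            visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
                                     params->relation->rd_locator);
        }

        MarkBufferDirty(buffer);

        /* One WAL record now covers pruning, freezing, and the VM update */
        if (RelationNeedsWAL(params->relation))
            log_heap_prune_and_freeze(params->relation, buffer,
                                      do_set_vm ? vmbuffer : InvalidBuffer,
                                      do_set_vm ? new_vmbits : 0,
                                      conflict_xid,
                                      true,    /* cleanup lock */
                                      params->reason,
                                      prstate.frozen, prstate.nfrozen,
                                      prstate.redirected, prstate.nredirected,
                                      prstate.nowdead, prstate.ndead,
                                      prstate.nowunused, prstate.nunused);
    }

    END_CRIT_SECTION();

    if (do_set_vm)
        LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);

    /* The VM bits before and after the update are reported back to the caller */
    presult->old_vmbits = old_vmbits;
    presult->new_vmbits = new_vmbits;
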
Attachment: v31-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (text/x-patch)
From b30b92789f9b62e60348bd1441f03031e1bf7309 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v31 07/16] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum
As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible and all-frozen in an
XLOG_HEAP2_PRUNE_VACUUM_SCAN record.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
1 file changed, 29 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f56a02a3d46..d22d2a86ed0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1867,9 +1867,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ /* Lock vmbuffer before entering critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ /* Mark buffer dirty before writing any WAL records */
MarkBufferDirty(buf);
/*
@@ -1886,13 +1889,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set_vmbits(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
+
+ /*
+ * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+ * setting the VM.
+ */
+ if (RelationNeedsWAL(vacrel->rel))
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ InvalidTransactionId, /* conflict xid */
+ false, /* cleanup lock */
+ PRUNE_VACUUM_SCAN, /* reason */
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+
END_CRIT_SECTION();
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
/* Count the newly all-frozen pages for logging */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
--
2.43.0
Attachment: v31-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (text/x-patch)
From fb26088478e331440a2747031ba259e2adc9808e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v31 08/16] Remove XLOG_HEAP2_VISIBLE entirely
Now that no remaining users emit XLOG_HEAP2_VISIBLE records, the record
type can be removed entirely.
This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.
This changes the visibility map API, so any external users/consumers of
the VM-only WAL record will need to change.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/common/bufmask.c | 4 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 155 ++---------------------
src/backend/access/heap/pruneheap.c | 4 +-
src/backend/access/heap/vacuumlazy.c | 16 +--
src/backend/access/heap/visibilitymap.c | 109 +---------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +-
src/include/access/heapam_xlog.h | 20 ---
src/include/access/visibilitymap.h | 13 +-
src/include/access/visibilitymapdefs.h | 9 --
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 38 insertions(+), 370 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
+ * for more details.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 6daf4a87dec..fb7a7548aa0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2539,11 +2539,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
else if (all_frozen_set)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- relation->rd_locator);
+ visibilitymap_set(BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ relation->rd_locator);
}
/*
@@ -8813,50 +8813,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
-
/*
* Perform XLogInsert for a heap-update operation. Caller must already
* have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1823feff298..47d2479415e 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -236,7 +236,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+ visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -249,142 +249,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -762,8 +626,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
*
* During recovery, however, no concurrent writers exist. Therefore,
* updating the VM without holding the heap page lock is safe enough. This
- * same approach is taken when replaying xl_heap_visible records (see
- * heap_xlog_visible()).
+ * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+ * heap_xlog_prune_and_freeze()).
*/
if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -775,11 +639,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- rlocator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -1360,9 +1224,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 49d3ebb0063..b099483051a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1240,8 +1240,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* so there is no point in optimizing it.
*/
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
- params->relation->rd_locator);
+ visibilitymap_set(blockno, vmbuffer, new_vmbits,
+ params->relation->rd_locator);
}
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d22d2a86ed0..93f0f39c5f0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1889,11 +1889,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
/*
* Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2771,9 +2771,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* set PD_ALL_VISIBLE.
*/
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer, vmflags,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer, vmflags,
+ vacrel->rel->rd_locator);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index cdcb475e501..d30fee3a488 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set bit(s) in a previously pinned page and log
- * visibilitymap_set_vmbits - set bit(s) in a pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,106 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- */
-void
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
- flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || !XLogRecPtrIsValid(recptr));
- Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- Assert(!BufferIsValid(heapBuf) ||
- BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (!XLogRecPtrIsValid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-}
-
/*
* Set VM (visibility map) flags in the VM block in vmBuf.
*
@@ -341,9 +240,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* rlocator is used only for debugging messages.
*/
void
-visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ca26d1f0ed1..08461fdf593 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 5e15cb1825e..c0cac7ea1c3 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index fc45d72c79b..3655358ed6b 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 16c2b2e3c9c..b27fcdfb345 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -443,20 +442,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -500,11 +485,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 787c19e5fef..a6580ea6188 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern void visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator);
+extern void visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 04845d5e680..6505628120c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4331,7 +4331,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
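For anyone skimming the diff above, here is a rough standalone sketch (not
PostgreSQL code; the array size and the TOY_* constants are invented) of the
byte/offset arithmetic the surviving visibilitymap_set() still performs: two
bits per heap page, ORed into the right byte, with a cheap early exit when
the bits are already set:

#include <stdint.h>
#include <stdio.h>

#define TOY_ALL_VISIBLE 0x01
#define TOY_ALL_FROZEN  0x02
#define TOY_VALID_BITS  0x03

#define BITS_PER_HEAPBLOCK  2
#define HEAPBLOCKS_PER_BYTE (8 / BITS_PER_HEAPBLOCK)

static uint8_t toy_vm[1024];    /* stand-in for one VM page's contents */

static void
toy_vm_set(uint32_t heapBlk, uint8_t flags)
{
    uint32_t mapByte = heapBlk / HEAPBLOCKS_PER_BYTE;
    uint8_t  mapOffset = (uint8_t) ((heapBlk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK);
    uint8_t  status = (uint8_t) ((toy_vm[mapByte] >> mapOffset) & TOY_VALID_BITS);

    /* same shape as the real code: only dirty/log when the bits change */
    if (status == flags)
        return;

    toy_vm[mapByte] |= (uint8_t) (flags << mapOffset);
}

int
main(void)
{
    toy_vm_set(5, TOY_ALL_VISIBLE | TOY_ALL_FROZEN);
    printf("VM byte 1 = 0x%02X\n", toy_vm[1]);  /* heap block 5 -> byte 1, offset 2 */
    return 0;
}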
Attachment: v31-0009-Simplify-heap_page_would_be_all_visible-visibili.patch (text/x-patch)
From 667b2e7c19c70694912223bc35d8f286a439dacd Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 17 Dec 2025 13:57:16 -0500
Subject: [PATCH v31 09/16] Simplify heap_page_would_be_all_visible visibility
check
heap_page_would_be_all_visible() doesn't care about the distinction
between HEAPTUPLE_RECENTLY_DEAD and HEAPTUPLE_DEAD tuples -- any tuple
that is not HEAPTUPLE_LIVE means the page is not all-visible and causes
us to return false.
Therefore, we don't need to call HeapTupleSatisfiesVacuum(), which
includes an extra step to distinguish between dead and recently dead
tuples using OldestXmin. Replace it with the more minimal
HeapTupleSatisfiesVacuumHorizon().
This has the added benefit of making it easier to replace uses of
OldestXmin in heap_page_would_be_all_visible() in the future.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/CALdSSPjvhGXihT_9f-GJabYU%3D_PjrFDUxYaURuTbfLyQM6TErg%40mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 93f0f39c5f0..e827ca21c68 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3537,6 +3537,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
{
ItemId itemid;
HeapTupleData tuple;
+ TransactionId dead_after;
/*
* Set the offset number so that we can display it along with any
@@ -3576,12 +3577,14 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
/* Visibility checks may do IO or allocate memory */
Assert(CritSectionCount == 0);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+ switch (HeapTupleSatisfiesVacuumHorizon(&tuple, buf, &dead_after))
{
case HEAPTUPLE_LIVE:
{
TransactionId xmin;
+ Assert(!TransactionIdIsValid(dead_after));
+
/* Check comments in lazy_scan_prune. */
if (!HeapTupleHeaderXminCommitted(tuple.t_data))
{
@@ -3614,8 +3617,10 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
}
break;
- case HEAPTUPLE_DEAD:
case HEAPTUPLE_RECENTLY_DEAD:
+ Assert(TransactionIdIsValid(dead_after));
+ /* FALLTHROUGH */
+ case HEAPTUPLE_DEAD:
case HEAPTUPLE_INSERT_IN_PROGRESS:
case HEAPTUPLE_DELETE_IN_PROGRESS:
{
--
2.43.0
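To make the argument in the commit message concrete, here is a minimal sketch
(toy enum and names, not the real HTSV_Result handling) of why this caller
never needed OldestXmin: any non-LIVE result makes the page not all-visible,
so the DEAD vs RECENTLY_DEAD distinction, which is the only thing OldestXmin
buys, is irrelevant here:

#include <stdbool.h>
#include <stdio.h>

typedef enum
{
    TUPLE_LIVE,
    TUPLE_RECENTLY_DEAD,        /* dead, but maybe still visible to someone */
    TUPLE_DEAD
} ToyVisibility;

/* returns false as soon as any tuple rules out all-visibility */
static bool
page_all_visible(const ToyVisibility *tuples, int ntuples)
{
    for (int i = 0; i < ntuples; i++)
    {
        if (tuples[i] != TUPLE_LIVE)
            return false;       /* no need to know *how* dead it is */
    }
    return true;
}

int
main(void)
{
    ToyVisibility page[] = {TUPLE_LIVE, TUPLE_RECENTLY_DEAD, TUPLE_LIVE};

    printf("all visible? %s\n", page_all_visible(page, 3) ? "yes" : "no");
    return 0;
}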
Attachment: v31-0010-Remove-table_scan_analyze_next_tuple-unneeded-pa.patch (text/x-patch)
From 365b928d060f7248e209a4e26ff914da41178730 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 22 Dec 2025 10:46:45 -0500
Subject: [PATCH v31 10/16] Remove table_scan_analyze_next_tuple unneeded
parameter OldestXmin
heapam_scan_analyze_next_tuple() doesn't distinguish between dead and
recently dead tuples when counting them, so it doesn't need OldestXmin.
Looking at other table AMs implementing table_scan_analyze_next_tuple(),
it appears most do not use OldestXmin either.
Suggested-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/CALdSSPjvhGXihT_9f-GJabYU%3D_PjrFDUxYaURuTbfLyQM6TErg%40mail.gmail.com
---
src/backend/access/heap/heapam_handler.c | 13 +++++++++----
src/backend/commands/analyze.c | 6 +-----
src/include/access/tableam.h | 5 ++---
3 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index dd4fe6bf62f..8707d1aab4a 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1026,7 +1026,7 @@ heapam_scan_analyze_next_block(TableScanDesc scan, ReadStream *stream)
}
static bool
-heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
+heapam_scan_analyze_next_tuple(TableScanDesc scan,
double *liverows, double *deadrows,
TupleTableSlot *slot)
{
@@ -1047,6 +1047,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
ItemId itemid;
HeapTuple targtuple = &hslot->base.tupdata;
bool sample_it = false;
+ TransactionId dead_after;
itemid = PageGetItemId(targpage, hscan->rs_cindex);
@@ -1069,16 +1070,20 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
targtuple->t_data = (HeapTupleHeader) PageGetItem(targpage, itemid);
targtuple->t_len = ItemIdGetLength(itemid);
- switch (HeapTupleSatisfiesVacuum(targtuple, OldestXmin,
- hscan->rs_cbuf))
+ switch (HeapTupleSatisfiesVacuumHorizon(targtuple,
+ hscan->rs_cbuf,
+ &dead_after))
{
case HEAPTUPLE_LIVE:
sample_it = true;
*liverows += 1;
break;
- case HEAPTUPLE_DEAD:
case HEAPTUPLE_RECENTLY_DEAD:
+ Assert(TransactionIdIsValid(dead_after));
+ /* FALLTHROUGH */
+
+ case HEAPTUPLE_DEAD:
/* Count dead and recently-dead rows */
*deadrows += 1;
break;
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 5e2a7a8234e..184bc3dd3b2 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1213,7 +1213,6 @@ acquire_sample_rows(Relation onerel, int elevel,
double rowstoskip = -1; /* -1 means not set yet */
uint32 randseed; /* Seed for block sampler(s) */
BlockNumber totalblocks;
- TransactionId OldestXmin;
BlockSamplerData bs;
ReservoirStateData rstate;
TupleTableSlot *slot;
@@ -1226,9 +1225,6 @@ acquire_sample_rows(Relation onerel, int elevel,
totalblocks = RelationGetNumberOfBlocks(onerel);
- /* Need a cutoff xmin for HeapTupleSatisfiesVacuum */
- OldestXmin = GetOldestNonRemovableTransactionId(onerel);
-
/* Prepare for sampling block numbers */
randseed = pg_prng_uint32(&pg_global_prng_state);
nblocks = BlockSampler_Init(&bs, totalblocks, targrows, randseed);
@@ -1261,7 +1257,7 @@ acquire_sample_rows(Relation onerel, int elevel,
{
vacuum_delay_point(true);
- while (table_scan_analyze_next_tuple(scan, OldestXmin, &liverows, &deadrows, slot))
+ while (table_scan_analyze_next_tuple(scan, &liverows, &deadrows, slot))
{
/*
* The first targrows sample rows are simply copied into the
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 2fa790b6bf5..ee9b32c4620 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -683,7 +683,6 @@ typedef struct TableAmRoutine
* callback).
*/
bool (*scan_analyze_next_tuple) (TableScanDesc scan,
- TransactionId OldestXmin,
double *liverows,
double *deadrows,
TupleTableSlot *slot);
@@ -1714,11 +1713,11 @@ table_scan_analyze_next_block(TableScanDesc scan, ReadStream *stream)
* tuples.
*/
static inline bool
-table_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
+table_scan_analyze_next_tuple(TableScanDesc scan,
double *liverows, double *deadrows,
TupleTableSlot *slot)
{
- return scan->rs_rd->rd_tableam->scan_analyze_next_tuple(scan, OldestXmin,
+ return scan->rs_rd->rd_tableam->scan_analyze_next_tuple(scan,
liverows, deadrows,
slot);
}
--
2.43.0
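Same idea, sketched standalone (the names below are invented, not the
analyze.c code): dead and recently-dead rows land in the same deadrows
counter, so the sampling loop has no use for an OldestXmin cutoff:

#include <stdio.h>

typedef enum { ROW_LIVE, ROW_RECENTLY_DEAD, ROW_DEAD } RowState;

static void
count_rows(const RowState *rows, int n, double *liverows, double *deadrows)
{
    for (int i = 0; i < n; i++)
    {
        if (rows[i] == ROW_LIVE)
            *liverows += 1;
        else
            *deadrows += 1;     /* DEAD and RECENTLY_DEAD counted identically */
    }
}

int
main(void)
{
    RowState sample[] = {ROW_LIVE, ROW_DEAD, ROW_RECENTLY_DEAD, ROW_LIVE};
    double  live = 0, dead = 0;

    count_rows(sample, 4, &live, &dead);
    printf("live=%.0f dead=%.0f\n", live, dead);
    return 0;
}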
Attachment: v31-0011-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (text/x-patch)
From dc7f28b35a07c637d0bad46194816773217098b1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v31 11/16] Use GlobalVisState in vacuum to determine page
level visibility
During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.
Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.
This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to be considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.
OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare cases where GlobalVisState moves
backwards.
Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.
This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/heapam_visibility.c | 22 +++++++++
src/backend/access/heap/pruneheap.c | 53 ++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 38 ++++++++++-----
src/include/access/heapam.h | 4 +-
4 files changed, 76 insertions(+), 41 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index bf899c2d2c6..7d9bd28d8f0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1053,6 +1053,28 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for “transaction still running” are identical to
+ * those used for removal XIDs.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid)
+{
+ return !GlobalVisTestIsRemovableXid(state, xid);
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b099483051a..c507231d2a4 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -461,11 +461,12 @@ prune_freeze_setup(PruneFreezeParams *params,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon
- * when updating the VM and/or freezing all the tuples on the page.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState. This field is only kept up-to-date if the page is
+ * all-visible. As soon as a tuple is encountered that is not visible to
+ * all, this field is unmaintained. As long as it is maintained, it can be
+ * used to calculate the snapshot conflict horizon when updating the VM
+ * and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -1008,14 +1009,14 @@ heap_page_will_set_vm(PruneState *prstate,
*/
static bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
+ vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -1102,6 +1103,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prune_freeze_plan(RelationGetRelid(params->relation),
buffer, &prstate, off_loc);
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them may be considered running by any snapshot, the page cannot
+ * be all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ GlobalVisTestXidMaybeRunning(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* If checksums are enabled, calling heap_prune_satisfies_vacuum() while
* checking tuple visibility information in prune_freeze_plan() may have
@@ -1283,10 +1294,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool debug_all_frozen;
Assert(prstate.lpdead_items == 0);
- Assert(prstate.cutoffs);
Assert(heap_page_is_all_visible(params->relation, buffer,
- prstate.cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc));
@@ -1807,28 +1817,15 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
}
/*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed? A FrozenTransactionId
- * is seen as committed to everyone. Otherwise, we check if
- * there is a snapshot that considers this xid to still be
- * running, and if so, we don't consider the page all-visible.
+ * The inserter definitely committed. But we don't know if it
+ * is old enough that everyone sees it as committed. Later,
+ * after processing all the tuples on the page, we'll check if
+ * there is any snapshot that still considers the newest xid
+ * on the page to be running. If so, we don't consider the
+ * page all-visible.
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' and 'all_frozen' when freezing
- * is requested. We could use GlobalVisTestIsRemovableXid
- * instead, if a non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = false;
- prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e827ca21c68..7463d46891b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2725,7 +2725,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* done outside the critical section.
*/
if (heap_page_would_be_all_visible(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3486,7 +3486,7 @@ dead_items_cleanup(LVRelState *vacrel)
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
*
* Output parameters:
*
@@ -3502,7 +3502,7 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -3585,7 +3585,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
Assert(!TransactionIdIsValid(dead_after));
- /* Check comments in lazy_scan_prune. */
+ /* Check heap_prune_record_unchanged_lp_normal comments */
if (!HeapTupleHeaderXminCommitted(tuple.t_data))
{
all_visible = false;
@@ -3594,16 +3594,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
}
/*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed?
+ * The inserter definitely committed. But we don't know if
+ * it is old enough that everyone sees it as committed.
+ * Don't check that now.
+ *
+ * If we scan all tuples without finding one that prevents
+ * the page from being all-visible, we then check whether
+ * any snapshot still considers the newest XID on the page
+ * to be running. In that case, the page is not considered
+ * all-visible.
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin, OldestXmin))
- {
- all_visible = false;
- *all_frozen = false;
- break;
- }
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3634,6 +3635,19 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
}
} /* scan along page */
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * among them may still be considered running by any snapshot, the page
+ * cannot be all-visible.
+ */
+ if (all_visible &&
+ TransactionIdIsNormal(*visibility_cutoff_xid) &&
+ GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+ {
+ all_visible = false;
+ *all_frozen = false;
+ }
+
/* Clear the offset information once we have processed the given page. */
*logging_offnum = InvalidOffsetNumber;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 88e79c58a10..5657b1df46b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -438,7 +438,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -452,6 +452,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+
+extern bool GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
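A rough standalone sketch of the "check once per page" strategy the commit
message describes; the horizon test below is a plain integer comparison
standing in for GlobalVisTestXidMaybeRunning(), and all the types and names
are invented:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t Xid;

static inline bool
xid_maybe_running(Xid horizon, Xid xid)
{
    /* toy stand-in: anything at or beyond the horizon may still be running */
    return xid >= horizon;
}

static bool
toy_page_all_visible(const Xid *xmins, int ntuples, Xid horizon,
                     Xid *visibility_cutoff_xid)
{
    Xid newest = 0;

    for (int i = 0; i < ntuples; i++)
    {
        /* per-tuple work: just remember the newest xmin on the page */
        if (xmins[i] > newest)
            newest = xmins[i];
    }

    *visibility_cutoff_xid = newest;

    /* one horizon test for the whole page, not one per tuple */
    return !xid_maybe_running(horizon, newest);
}

int
main(void)
{
    Xid xmins[] = {100, 250, 180};
    Xid cutoff;

    printf("all visible? %s (cutoff %u)\n",
           toy_page_all_visible(xmins, 3, 300, &cutoff) ? "yes" : "no",
           (unsigned) cutoff);
    printf("all visible? %s (cutoff %u)\n",
           toy_page_all_visible(xmins, 3, 200, &cutoff) ? "yes" : "no",
           (unsigned) cutoff);
    return 0;
}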
Attachment: v31-0012-Unset-all_visible-sooner-if-not-freezing.patch (text/x-patch)
From 835c3a565d2148fc6a0d79c37de70b7c586edbff Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v31 12/16] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.
However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/pruneheap.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c507231d2a4..8e59e7692c1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1682,8 +1682,13 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible and all_frozen until later
* during pruning. Removable dead tuples shouldn't preclude freezing the
- * page.
+ * page. If we won't attempt freezing, just unset all-visible now, though.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1943,8 +1948,14 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible and all_frozen until later, at the
* end of heap_page_prune_and_freeze(). This will allow us to attempt to
* freeze the page after pruning. As long as we unset it before updating
- * the visibility map, this will be correct.
+ * the visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all_visible and all_frozen now.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
--
2.43.0
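A tiny sketch of the ordering change, with invented field names: when no
freezing will be attempted, the all-visible/all-frozen tracking is cleared as
soon as a dead item is recorded; otherwise it stays set so the freeze
decision can still happen, and is cleared later, before any VM update:

#include <stdbool.h>
#include <stdio.h>

typedef struct
{
    bool attempt_freeze;
    bool all_visible;
    bool all_frozen;
} ToyPruneState;

static void
record_dead_item(ToyPruneState *ps)
{
    if (!ps->attempt_freeze)
    {
        /* no freezing planned: no reason to keep the flags alive */
        ps->all_visible = false;
        ps->all_frozen = false;
    }

    /*
     * else: leave the flags set for now; they are cleared after the freeze
     * decision, before the visibility map is touched.
     */
}

int
main(void)
{
    ToyPruneState ps = {.attempt_freeze = false,
                        .all_visible = true,
                        .all_frozen = true};

    record_dead_item(&ps);
    printf("all_visible=%d all_frozen=%d\n", ps.all_visible, ps.all_frozen);
    return 0;
}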
Attachment: v31-0013-Track-which-relations-are-modified-by-a-query.patch (text/x-patch)
From da8d2f225d0fc42bdecad87f55a7e86518c068cc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v31 13/16] Track which relations are modified by a query
Save the relids in a bitmap in the estate. A later commit will pass this
information down to scan nodes to control whether or not the scan allows
setting the visibility map while on-access pruning. We don't want to set
the visibility map if the query is just going to modify the page
immediately after.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/executor/execMain.c | 4 ++++
src/backend/executor/execUtils.c | 2 ++
src/include/nodes/execnodes.h | 6 ++++++
3 files changed, 12 insertions(+)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 797d8b1ca1c..9df7df17e96 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation may be modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 3968429f991..13b42b5e6d1 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query through a
+ * UPDATE/DELETE/INSERT/MERGE or targeted by a SELECT FOR UPDATE.
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
--
2.43.0
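A toy sketch of the bookkeeping (a plain uint64 bitset stands in for the
Bitmapset, and the RT indexes are made up): result relations and row-marked
relations get their RT index recorded, and scan setup later only needs a
membership test:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t RelidSet;      /* supports RT indexes 1..63 in this toy */

static void
set_add(RelidSet *set, int rti)
{
    *set |= ((uint64_t) 1) << rti;
}

static bool
set_member(RelidSet set, int rti)
{
    return (set >> rti) & 1;
}

int
main(void)
{
    RelidSet modified = 0;

    set_add(&modified, 1);      /* e.g. the UPDATE's result relation */
    set_add(&modified, 3);      /* e.g. a SELECT ... FOR UPDATE target */

    for (int rti = 1; rti <= 4; rti++)
        printf("rti %d modified? %s\n", rti,
               set_member(modified, rti) ? "yes" : "no");
    return 0;
}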
Attachment: v31-0014-Pass-down-information-on-table-modification-to-s.patch (text/x-patch)
From 30c13a767fb0d9604595cda0fe3fce238a54df5e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:12:18 -0500
Subject: [PATCH v31 14/16] Pass down information on table modification to scan
node
Pass down information to sequential scan, index scan, and bitmap table
scan nodes on whether or not the query modifies the relation being
scanned. A later commit will use this information to update the VM
during on-access pruning only if the relation is not modified by the
query.
---
contrib/pgrowlocks/pgrowlocks.c | 2 +-
src/backend/access/brin/brin.c | 3 ++-
src/backend/access/gin/gininsert.c | 3 ++-
src/backend/access/heap/heapam_handler.c | 7 +++---
src/backend/access/index/genam.c | 4 ++--
src/backend/access/index/indexam.c | 6 +++---
src/backend/access/nbtree/nbtsort.c | 2 +-
src/backend/access/table/tableam.c | 7 +++---
src/backend/commands/constraint.c | 2 +-
src/backend/commands/copyto.c | 2 +-
src/backend/commands/tablecmds.c | 8 +++----
src/backend/commands/typecmds.c | 4 ++--
src/backend/executor/execIndexing.c | 2 +-
src/backend/executor/execReplication.c | 8 +++----
src/backend/executor/nodeBitmapHeapscan.c | 9 +++++++-
src/backend/executor/nodeIndexonlyscan.c | 2 +-
src/backend/executor/nodeIndexscan.c | 11 ++++++++--
src/backend/executor/nodeSeqscan.c | 26 ++++++++++++++++++++---
src/backend/partitioning/partbounds.c | 2 +-
src/backend/utils/adt/selfuncs.c | 2 +-
src/include/access/genam.h | 2 +-
src/include/access/heapam.h | 6 ++++++
src/include/access/tableam.h | 19 ++++++++++-------
23 files changed, 93 insertions(+), 46 deletions(-)
diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
RelationGetRelationName(rel));
/* Scan the relation */
- scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
hscan = (HeapScanDesc) scan;
attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 45d306037a4..5c4bf5f0c6e 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2842,7 +2842,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
indexInfo->ii_Concurrent = brinshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromBrinShared(brinshared));
+ ParallelTableScanFromBrinShared(brinshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index fc6af7c751b..b2457b96dcc 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2058,7 +2058,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
indexInfo->ii_Concurrent = ginshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromGinBuildShared(ginshared));
+ ParallelTableScanFromGinBuildShared(ginshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 8707d1aab4a..fc251e11f8a 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,12 +79,13 @@ heapam_slot_callbacks(Relation relation)
*/
static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
{
IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
return &hscan->xs_base;
}
@@ -753,7 +754,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
tableScan = NULL;
heapScan = NULL;
- indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+ indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
index_rescan(indexScan, NULL, 0, NULL, 0);
}
else
@@ -762,7 +763,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
- tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+ tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
heapScan = (HeapScanDesc) tableScan;
indexScan = NULL;
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index b7f10a1aed0..15f9cc11582 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -447,7 +447,7 @@ systable_beginscan(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, irel,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
@@ -708,7 +708,7 @@ systable_beginscan_ordered(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, indexRelation,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 0492d92d23b..b5523cf2ab1 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys)
+ int nkeys, int norderbys, uint32 flags)
{
IndexScanDesc scan;
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+ scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
return scan;
}
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+ scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
return scan;
}
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index d7695dc1108..7bdbc7e5fa7 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1925,7 +1925,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
indexInfo = BuildIndexInfo(btspool->index);
indexInfo->ii_Concurrent = btshared->isconcurrent;
scan = table_beginscan_parallel(btspool->heap,
- ParallelTableScanFromBTShared(btshared));
+ ParallelTableScanFromBTShared(btshared), 0);
reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
true, progress, _bt_build_callback,
&buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 73ebc01a08f..a00bdfdf822 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
+
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
bool found;
slot = table_slot_create(rel, NULL);
- scan = table_index_fetch_begin(rel);
+ scan = table_index_fetch_begin(rel, 0);
found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
all_dead);
table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index 3497a8221f2..97c8278e36d 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
*/
tmptid = checktid;
{
- IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+ IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
bool call_again = false;
if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index dae91630ac3..1957bb0f1a2 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1156,7 +1156,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
AttrMap *map = NULL;
TupleTableSlot *root_slot = NULL;
- scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
slot = table_slot_create(rel, NULL);
/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6b1a00ed477..130a670d266 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6377,7 +6377,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
* checking all the constraints.
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(oldrel, snapshot, 0, NULL);
+ scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -13766,7 +13766,7 @@ validateForeignKeyConstraint(char *conname,
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
slot = table_slot_create(rel, NULL);
- scan = table_beginscan(rel, snapshot, 0, NULL);
+ scan = table_beginscan(rel, snapshot, 0, NULL, 0);
perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
"validateForeignKeyConstraint",
@@ -22623,7 +22623,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
/* Scan through the rows. */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+ scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -23087,7 +23087,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
/* Scan through the rows. */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(splitRel, snapshot, 0, NULL);
+ scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index be6ffd6ddb0..2921f68c1c3 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3157,7 +3157,7 @@ validateDomainNotNullConstraint(Oid domainoid)
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
@@ -3238,7 +3238,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 0b3a31f1703..74262a34819 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -816,7 +816,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
retry:
conflict = false;
found_self = false;
- index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+ index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 860f79f9cc1..6e49ea5c5d8 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -204,7 +204,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
/* Start an index scan. */
- scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
retry:
found = false;
@@ -382,7 +382,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
/* Start a heap scan. */
InitDirtySnapshot(snap);
- scan = table_beginscan(rel, &snap, 0, NULL);
+ scan = table_beginscan(rel, &snap, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
retry:
@@ -601,7 +601,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+ scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
table_rescan(scan, NULL);
@@ -665,7 +665,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
index_rescan(scan, skey, skey_attoff, NULL, 0);
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..0d854db51a1 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ flags);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 6bea42f128f..2c87ba5f767 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
estate->es_snapshot,
&node->ioss_Instrument,
node->ioss_NumScanKeys,
- node->ioss_NumOrderByKeys);
+ node->ioss_NumOrderByKeys, 0);
node->ioss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 72b135e5dcf..92674441c6d 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys,
+ flags);
node->iss_ScanDesc = scandesc;
@@ -207,7 +214,7 @@ IndexNextWithReorder(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys, 0);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..4d0cbb9dee4 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
scandesc = table_beginscan(node->ss.ss_currentRelation,
estate->es_snapshot,
- 0, NULL);
+ 0, NULL, flags);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
{
EState *estate = node->ss.ps.state;
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
table_parallelscan_initialize(node->ss.ss_currentRelation,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+ flags);
}
/* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation,
+ pscan,
+ flags);
}
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 16b0adc172c..91acf1ee2d7 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
econtext = GetPerTupleExprContext(estate);
snapshot = RegisterSnapshot(GetLatestSnapshot());
tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
- scan = table_beginscan(part_rel, snapshot, 0, NULL);
+ scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index c760b19db55..ec0def0d1e2 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7100,7 +7100,7 @@ get_actual_variable_endpoint(Relation heapRel,
index_scan = index_beginscan(heapRel, indexRel,
&SnapshotNonVacuumable, NULL,
- 1, 0);
+ 1, 0, 0);
/* Set it up for index-only scan */
index_scan->xs_want_itup = true;
index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 9200a22bd9f..d29d9e905fc 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -177,7 +177,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys);
+ int nkeys, int norderbys, uint32 flags);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 5657b1df46b..ba62a4d4cba 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -119,6 +119,12 @@ typedef struct IndexFetchHeapData
Buffer xs_cbuf; /* current heap buffer in scan, if any */
/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index ee9b32c4620..15fad66ed87 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* set if the query doesn't modify the rel */
+ SO_HINT_REL_READ_ONLY = 1 << 10,
} ScanOptions;
/*
@@ -420,7 +422,7 @@ typedef struct TableAmRoutine
*
* Tuples for an index scan can then be fetched via index_fetch_tuple.
*/
- struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+ struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
/*
* Reset index fetch. Typically this will release cross index fetch
@@ -873,9 +875,9 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
*/
static inline TableScanDesc
table_beginscan(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_SEQSCAN |
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
@@ -918,9 +920,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
@@ -1127,7 +1129,8 @@ extern void table_parallelscan_initialize(Relation rel,
* Caller must hold a suitable lock on the relation.
*/
extern TableScanDesc table_beginscan_parallel(Relation relation,
- ParallelTableScanDesc pscan);
+ ParallelTableScanDesc pscan,
+ uint32 flags);
/*
* Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1163,9 +1166,9 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
* Tuples for an index scan can then be fetched via table_index_fetch_tuple().
*/
static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
{
- return rel->rd_tableam->index_fetch_begin(rel);
+ return rel->rd_tableam->index_fetch_begin(rel, flags);
}
/*
--
2.43.0
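Sketch of the flag plumbing, with invented bit values rather than the real
ScanOptions layout: the executor turns "my scanrelid is not in
es_modified_relids" into a read-only hint, and the beginscan helpers now OR
their defaults into the caller's flags so the hint survives:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TOY_SO_TYPE_SEQSCAN        (1u << 0)
#define TOY_SO_ALLOW_PAGEMODE      (1u << 1)
#define TOY_SO_HINT_REL_READ_ONLY  (1u << 2)   /* made-up bit position */

static uint32_t
toy_beginscan_flags(uint32_t caller_flags)
{
    /* defaults are ORed in; the caller's hint survives */
    return caller_flags | TOY_SO_TYPE_SEQSCAN | TOY_SO_ALLOW_PAGEMODE;
}

int
main(void)
{
    bool        scanrelid_is_modified = false; /* from the modified-relid check */
    uint32_t    flags = 0;

    if (!scanrelid_is_modified)
        flags = TOY_SO_HINT_REL_READ_ONLY;

    flags = toy_beginscan_flags(flags);
    printf("read-only hint set? %s\n",
           (flags & TOY_SO_HINT_REL_READ_ONLY) ? "yes" : "no");
    return 0;
}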
Attachment: v31-0015-Allow-on-access-pruning-to-set-pages-all-visible.patch (text/x-patch)
From 8ff462278736c7fa1de096f43e805a92c68a5b07 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:24:08 -0500
Subject: [PATCH v31 15/16] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam.c | 15 ++++++-
src/backend/access/heap/heapam_handler.c | 15 ++++++-
src/backend/access/heap/pruneheap.c | 40 ++++++++++++++++++-
src/include/access/heapam.h | 24 +++++++++--
.../t/035_standby_logical_decoding.pl | 3 +-
5 files changed, 89 insertions(+), 8 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fb7a7548aa0..d9dc79f4a96 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -570,6 +570,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -584,7 +585,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1261,6 +1264,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1299,6 +1303,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1331,6 +1341,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index fc251e11f8a..6946da8c9d7 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
return &hscan->xs_base;
@@ -100,6 +101,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -139,7 +146,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ hscan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2477,6 +2485,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2523,7 +2532,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8e59e7692c1..f414f02964d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -202,6 +202,8 @@ static bool heap_page_will_set_vm(PruneState *prstate,
Relation relation,
BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
Buffer vmbuffer,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
int nlpdead_items,
uint8 *old_vmbits,
uint8 *new_vmbits);
@@ -223,9 +225,13 @@ static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -306,6 +312,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
.cutoffs = NULL,
};
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ params.options |= HEAP_PAGE_PRUNE_UPDATE_VM;
+ params.vmbuffer = *vmbuffer;
+ }
+
heap_page_prune_and_freeze(¶ms, &presult, &dummy_off_loc,
NULL, NULL);
@@ -951,6 +964,9 @@ identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
* corrupted, it will fix them by clearing the VM bits and visibility hint.
* This does not need to be done in a critical section.
*
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
* Returns true if one or both VM bits should be set, along with returning the
* current value of the VM bits in *old_vmbits and the desired new value of
* the VM bits in *new_vmbits.
@@ -964,6 +980,8 @@ heap_page_will_set_vm(PruneState *prstate,
Relation relation,
BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
Buffer vmbuffer,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
int nlpdead_items,
uint8 *old_vmbits,
uint8 *new_vmbits)
@@ -974,6 +992,24 @@ heap_page_will_set_vm(PruneState *prstate,
if (!prstate->attempt_update_vm)
return false;
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(heap_buffer) || XLogCheckBufferNeedsBackup(heap_buffer)))
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ return false;
+ }
+
*old_vmbits = visibilitymap_get_status(relation, heap_blk,
&vmbuffer);
@@ -1171,6 +1207,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
buffer,
page,
vmbuffer,
+ params->reason,
+ do_prune, do_freeze,
prstate.lpdead_items,
&old_vmbits,
&new_vmbits);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ba62a4d4cba..b0e7c71463c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read in the current heap page's
+ * corresponding VM block to this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read in the current heap page's corresponding VM block to
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
/*
* Some optimizations can only be performed if the query does not modify
@@ -419,7 +436,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
TM_IndexDeleteOp *delstate);
/* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index ebe2fae1789..bdd9f0a62cd 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
--
2.43.0
v31-0016-Set-pd_prune_xid-on-insert.patch (text/x-patch)
From d7af575ebac98821654d7cc57091d1273f2b1d86 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v31 16/16] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.
This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.
Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affecting the
reported number of hits in the index-killtuples isolation test. It is
unclear if this is a bug in the way hits are tracked, a faulty test
expectation, or if simply updating the test's expected output is
sufficient remediation.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/heapam.c | 25 +++++++++++++------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++-
.../modules/index/expected/killtuples.out | 6 ++---
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d9dc79f4a96..ccebc1f244b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2119,6 +2119,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2178,15 +2179,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2196,7 +2201,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2560,8 +2564,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 47d2479415e..ab2db931aac 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -447,6 +447,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -596,9 +602,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
--
2.43.0
On Dec 23, 2025, at 01:57, Melanie Plageman <melanieplageman@gmail.com> wrote:
On Mon, Dec 22, 2025 at 2:20 AM Chao Li <li.evan.chao@gmail.com> wrote:
A few more comments on v29:
Thanks for the continued review! I've attached v30.
1 - 0002 - Looks like since 0002, visibilitymap_set()’s return value is no longer used, so do we need to update the function and change return type to void? I remember in some patches, to address Coverity alerts, people had to do “(void) function_with_a_return_value()”.
I was torn about whether or not to change the return value. Coverity
doesn't always warn about unused return values. Usually it warns if it
perceives the return value as needed for error checking or if it
thinks not using the return value is incorrect. It may still warn in
this case, but it's not obvious to me which way it would go.
I have changed the function signature as you suggested in v30.
My hesitation is that visibilitymap_set() is in a header file and
could be used by extensions/forks, etc. Adding more information by
changing a return value from void to non-void doesn't have any
negative effect on those potential callers. But taking away a return
value is more likely to affect them in a potentially negative way.
However, I'm significantly changing the signature in this release, so
everybody that used it will have to change their code completely
anyway. Also, I just added a return value for visibilitymap_set() in
the previous release (18). Historically, it returned void. So, I've
gone with your suggestion.
From a previous patch, I learned from Peter Eisentraut that “We don't care about ABI changes in major releases.”, see:
https://postgr.es/m/70913dbd-dadf-4560-9f81-c0df72bf6578@eisentraut.org
- * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
- * will return 'all_visible', 'all_frozen' flags to the caller.
+ * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples
Nit: a trailing period is needed at the end of the comment line.
I've changed it. One interesting thing is that our "policy" for
periods in comments is that we don't put periods at the end of
one-line comments and we do put them at the end of multi-line comment
sentences. This is a one-line comment inside a comment block, so I
wasn't sure what to do. If you noticed it, and it bothered you, it's
easy enough to change, though.
If this were a one-line comment, I would not have cared about the trailing period.
The problem is that this is a paragraph of a block comment, and the paragraphs above and below all have trailing periods. So, for consistency, I raised this comment.
```
* HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 * LP_UNUSED during pruning. <=== Has a trailing period
 *
- * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
- * will return 'all_visible', 'all_frozen' flags to the caller.
+ * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples <=== No trailing period
 *
 * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
 * in the VM. <=== Has a trailing period
```
9 - 0006
```
@@ -3537,6 +3537,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 {
 	ItemId		itemid;
 	HeapTupleData tuple;
+	TransactionId dead_after = InvalidTransactionId;
```
This initialization seems not to be needed, as HeapTupleSatisfiesVacuumHorizon() will always set a value to it.
I think this is a comment for a later patch in the set (you originally
said it was from 0006), but I've changed dead_after to not be
initialized like this.
My bad. This comment was actually for 0009. In v31, I see you have removed the initialization of dead_after.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Mon, Dec 22, 2025 at 7:01 PM Chao Li <li.evan.chao@gmail.com> wrote:
On Dec 23, 2025, at 01:57, Melanie Plageman <melanieplageman@gmail.com> wrote:
My hesitation is that visibilitymap_set() is in a header file and
could be used by extensions/forks, etc. Adding more information by
changing a return value from void to non-void doesn't have any
negative effect on those potential callers. But taking away a return
value is more likely to affect them in a potentially negative way.
However, I'm significantly changing the signature in this release, so
everybody that used it will have to change their code completely
anyway. Also, I just added a return value for visibilitymap_set() in
the previous release (18). Historically, it returned void. So, I've
gone with your suggestion.
From a previous patch, I learned from Peter Eisentraut that “We don't care about ABI changes in major releases.”, see:
Right, it is totally okay to change function APIs in a major release.
My point was not that it wasn't allowed but that if people are getting
useful information returned from that function, or if we think we
might want that information again in the future, we should think twice
before changing it. But, in this case, I think we don't need to worry
about it.
- Melanie
On Tue, 23 Dec 2025 at 06:18, Melanie Plageman
<melanieplageman@gmail.com> wrote:
Right, it is totally okay to change function APIs in a major release.
My point was not that it wasn't allowed but that if people are getting
useful information returned from that function, or if we think we
might want that information again in the future, we should think twice
before changing it. But, in this case, I think we don't need to worry
about it.
- Melanie
At first glance this change looks sane, and I do not find any reason why
table_scan_analyze_next_tuple needs knowledge about OldestXmin. But I am
not aware of the discussions around this function's design; maybe
OldestXmin is here for a good reason.
After thinking about it for a week or so, I would actually suggest
moving forward with v31 (Remove OldestXmin from TAM).
I think there is a low probability of getting complaints about that.
Also, we're breaking ABI, so we will know about any important use-case
not long after 19-beta1.
Another user of extensible TAM, Cloudberry [0], does not use
OldestXmin also. I also did not find any user of
scan_analyze_next_tuple other than postgresql itself with
http://codesearch.decbian.net/ . Using code-search on github I found
[1]. Looks like this does not need OldestXmin either.
[0]: https://github.com/apache/cloudberry
[1]: https://github.com/hydradatabase/columnar/blob/main/columnar/src/backend/columnar/columnar_tableam.c#L2080C68-L2080C78
--
Best regards,
Kirill Reshke
On Sat, Jan 3, 2026 at 4:36 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
After thinking about it for a week or so, I would actually suggest
moving forward with v31 (Remove OldestXmin from TAM).
I think there is a low probability of getting complaints about that.
Also, we're breaking ABI, so we will know about any important use-case
not long after 19-beta1.
Another user of extensible TAM, Cloudberry [0], does not use
OldestXmin also. I also did not find any user of
scan_analyze_next_tuple other than postgresql itself with
http://codesearch.decbian.net/ . Using code-search on github I found
[1]. Looks like this does not need OldestXmin either.
Thanks for doing this research.
I've attached v32, which is rebased over a change in master and also
resolves the one open issue in the patch set: why the heap-accesses
count in the killtuples isolation test changed after setting pd_prune_xid.
That test creates a table, inserts tuples, accesses one page, deletes
all the data, accesses a single page again (until the table is
vacuumed, the pages will still be there and have to be scanned even
though the data is deleted). The first time we set the VM on-access,
we have to extend the VM. That VM access is an extend and not a hit.
Once we set pd_prune_xid on the page, the extend happens during the
first access (before the delete), so when we access the VM after the
delete step, that is counted as a hit and we end up with more hits in
the stats.
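For anyone who wants to poke at the on-access behavior outside the
isolation test, here is a rough sketch (not part of the patch set). It
assumes a build with the full series applied, including the last patch
that sets pd_prune_xid on insert; the table name is arbitrary, and
whether the read-only scan actually sets the VM still depends on the
on-access heuristics (for example, we skip it if it would newly dirty a
clean heap page or force an FPI), so treat the output as illustrative
only:
-- requires contrib/pg_visibility
create extension if not exists pg_visibility;
create table vm_on_access_demo(a int) with (autovacuum_enabled = false);
insert into vm_on_access_demo select generate_series(1, 1000);
-- a plain INSERT does not set the VM, so nothing is all-visible yet
select pg_visibility_map_summary('vm_on_access_demo');
-- once the inserting transaction is visible to everyone, a read-only
-- scan may prune the pages on access and set the VM
select count(*) from vm_on_access_demo;
select pg_visibility_map_summary('vm_on_access_demo');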
- Melanie
Attachments:
v32-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patch (text/x-patch)
From 8e1286c1a6dbfe3309d111aaa21af5a8e6237bb8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 8 Dec 2025 15:49:54 -0500
Subject: [PATCH v32 01/16] Combine visibilitymap_set() cases in
lazy_scan_prune()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
lazy_scan_prune() previously had two separate cases that called
visibilitymap_set() after pruning and freezing. These branches were
nearly identical except that one attempted to avoid dirtying the heap
buffer. However, that situation can never occur — the heap buffer cannot
be clean at that point (and we would hit an assertion if it were).
In lazy_scan_prune(), when we change a previously all-visible page to
all-frozen and the page was recorded as all-visible in the visibility
map by find_next_unskippable_block(), the heap buffer will always be
dirty. Either we have just frozen a tuple and already dirtied the
buffer, or the buffer was modified between find_next_unskippable_block()
and heap_page_prune_and_freeze() and then pruned in
heap_page_prune_and_freeze().
Additionally, XLogRegisterBuffer() asserts that the buffer is dirty, so
attempting to add a clean heap buffer to the WAL chain would assert out
anyway.
Since the “clean heap buffer with already set VM” case is impossible,
the two visibilitymap_set() branches in lazy_scan_prune() can be merged.
Doing so makes the intent clearer and emphasizes that the heap buffer
must always be marked dirty before being added to the WAL chain.
This commit also adds a test case for vacuuming when no heap
modifications are required. Currently this ensures that the heap buffer
is marked dirty before it is added to the WAL chain, but if we later
remove the heap buffer from the VM-set WAL chain or pass it with the
REGBUF_NO_CHANGES flag, this test would guard that behavior.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
Discussion: https://postgr.es/m/flat/CAAKRu_ZWx5gCbeCf7PWCv8p5%3D%3Db7EEws0VD2wksDxpXCvCyHvQ%40mail.gmail.com
---
.../pg_visibility/expected/pg_visibility.out | 44 ++++++++++
contrib/pg_visibility/sql/pg_visibility.sql | 20 +++++
src/backend/access/heap/vacuumlazy.c | 87 ++++---------------
3 files changed, 82 insertions(+), 69 deletions(-)
diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index 09fa5933a35..e10f1706015 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -1,4 +1,5 @@
CREATE EXTENSION pg_visibility;
+CREATE EXTENSION pageinspect;
--
-- recently-dropped table
--
@@ -204,6 +205,49 @@ select pg_truncate_visibility_map('test_partition');
(1 row)
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary
+---------------------------
+ (1,1)
+(1 row)
+
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+ pg_truncate_visibility_map
+----------------------------
+
+(1 row)
+
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary
+---------------------------
+ (0,0)
+(1 row)
+
+-- though the VM is truncated, the heap page-level visibility hint,
+-- PD_ALL_VISIBLE should still be set
+SELECT (flags & x'0004'::int) <> 0
+ FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));
+ ?column?
+----------
+ t
+(1 row)
+
+-- vacuum sets the VM
+vacuum test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary
+---------------------------
+ (1,1)
+(1 row)
+
-- test copy freeze
create table copyfreeze (a int, b char(1500));
-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index 5af06ec5b76..57af8a0c5b6 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -1,4 +1,5 @@
CREATE EXTENSION pg_visibility;
+CREATE EXTENSION pageinspect;
--
-- recently-dropped table
@@ -94,6 +95,25 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
select * from pg_check_frozen('test_partition'); -- hopefully none
select pg_truncate_visibility_map('test_partition');
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+-- though the VM is truncated, the heap page-level visibility hint,
+-- PD_ALL_VISIBLE should still be set
+SELECT (flags & x'0004'::int) <> 0
+ FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));
+-- vacuum sets the VM
+vacuum test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+
-- test copy freeze
create table copyfreeze (a int, b char(1500));
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2086a577199..2da35c85e76 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2122,16 +2122,14 @@ lazy_scan_prune(LVRelState *vacrel,
* of last heap_vac_scan_next_block() call), and from all_visible and
* all_frozen variables
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if ((presult.all_visible && !all_visible_according_to_vm) ||
+ (presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
- }
/*
* It should never be the case that the visibility map page is set
@@ -2139,15 +2137,25 @@ lazy_scan_prune(LVRelState *vacrel,
* checksums are not enabled). Regardless, set both bits so that we
* get back in sync.
*
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
+ * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
+ * unnecessarily dirtying the heap buffer. Nearly the only scenario
+ * where PD_ALL_VISIBLE is set but the VM is not is if the VM was
+ * removed -- and that isn't worth optimizing for. And if we add the
+ * heap buffer to the WAL chain (without passing REGBUF_NO_CHANGES),
+ * it must be marked dirty.
*/
PageSetAllVisible(page);
MarkBufferDirty(buf);
+
+ /*
+ * If the page is being set all-frozen, we pass InvalidTransactionId
+ * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ Assert(!presult.all_frozen ||
+ !TransactionIdIsValid(presult.vm_conflict_horizon));
+
old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
InvalidXLogRecPtr,
vmbuffer, presult.vm_conflict_horizon,
@@ -2219,65 +2227,6 @@ lazy_scan_prune(LVRelState *vacrel,
VISIBILITYMAP_VALID_BITS);
}
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen.
- */
- else if (all_visible_according_to_vm && presult.all_frozen &&
- !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
-
return presult.ndeleted;
}
--
2.43.0
v32-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patch (text/x-patch)
From 4d37243f9fa0dc4e264a28bcee448787fb8d7f65 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 11 Dec 2025 10:48:13 -0500
Subject: [PATCH v32 02/16] Eliminate use of cached VM value in
lazy_scan_prune()
lazy_scan_prune() takes a parameter from lazy_scan_heap() indicating
whether the page was marked all-visible in the VM at the time it was
last checked in find_next_unskippable_block(). This behavior is
historical, dating back to commit 608195a3a365, when we did not pin the
VM page until deciding we must read it. Now that the VM page is already
pinned, there is no meaningful benefit to relying on a cached VM status.
Removing this cached value simplifies the logic in both lazy_scan_heap()
and lazy_scan_prune(). It also clarifies future work that will set the
visibility map on-access: such paths will not have a cached value
available, which would make the logic harder to reason about. And
eliminating it enables us to detect and repair VM corruption on-access.
Along with removing the cached value and unconditionally checking the
visibility status of the heap page, this commit also moves the VM
corruption handling to occur first. This reordering should have no
performance impact, since the checks are inexpensive and performed only
once per page. It does, however, make the control flow easier to
understand. The new restructuring also makes it possible to set the VM
after fixing corruption (if pruning found the page all-visible).
Now that no callers of visibilitymap_set() use its return value, change
its (and visibilitymap_set_vmbits()) return type to void.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
---
src/backend/access/heap/vacuumlazy.c | 182 +++++++++++-------------
src/backend/access/heap/visibilitymap.c | 9 +-
src/include/access/visibilitymap.h | 18 +--
3 files changed, 94 insertions(+), 115 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2da35c85e76..3733a1cbc47 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -248,13 +248,6 @@ typedef enum
*/
#define EAGER_SCAN_REGION_SIZE 4096
-/*
- * heap_vac_scan_next_block() sets these flags to communicate information
- * about the block it read to the caller.
- */
-#define VAC_BLK_WAS_EAGER_SCANNED (1 << 0)
-#define VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM (1 << 1)
-
typedef struct LVRelState
{
/* Target heap relation and its indexes */
@@ -360,7 +353,6 @@ typedef struct LVRelState
/* State maintained by heap_vac_scan_next_block() */
BlockNumber current_block; /* last block returned */
BlockNumber next_unskippable_block; /* next unskippable block */
- bool next_unskippable_allvis; /* its visibility status */
bool next_unskippable_eager_scanned; /* if it was eagerly scanned */
Buffer next_unskippable_vmbuffer; /* buffer containing its VM bit */
@@ -434,7 +426,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
bool sharelock, Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
- Buffer vmbuffer, bool all_visible_according_to_vm,
+ Buffer vmbuffer,
bool *has_lpdead_items, bool *vm_page_frozen);
static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
@@ -1277,7 +1269,6 @@ lazy_scan_heap(LVRelState *vacrel)
/* Initialize for the first heap_vac_scan_next_block() call */
vacrel->current_block = InvalidBlockNumber;
vacrel->next_unskippable_block = InvalidBlockNumber;
- vacrel->next_unskippable_allvis = false;
vacrel->next_unskippable_eager_scanned = false;
vacrel->next_unskippable_vmbuffer = InvalidBuffer;
@@ -1293,13 +1284,13 @@ lazy_scan_heap(LVRelState *vacrel)
MAIN_FORKNUM,
heap_vac_scan_next_block,
vacrel,
- sizeof(uint8));
+ sizeof(bool));
while (true)
{
Buffer buf;
Page page;
- uint8 blk_info = 0;
+ bool was_eager_scanned = false;
int ndeleted = 0;
bool has_lpdead_items;
void *per_buffer_data = NULL;
@@ -1368,13 +1359,13 @@ lazy_scan_heap(LVRelState *vacrel)
if (!BufferIsValid(buf))
break;
- blk_info = *((uint8 *) per_buffer_data);
+ was_eager_scanned = *((bool *) per_buffer_data);
CheckBufferIsPinnedOnce(buf);
page = BufferGetPage(buf);
blkno = BufferGetBlockNumber(buf);
vacrel->scanned_pages++;
- if (blk_info & VAC_BLK_WAS_EAGER_SCANNED)
+ if (was_eager_scanned)
vacrel->eager_scanned_pages++;
/* Report as block scanned, update error traceback information */
@@ -1445,7 +1436,6 @@ lazy_scan_heap(LVRelState *vacrel)
if (got_cleanup_lock)
ndeleted = lazy_scan_prune(vacrel, buf, blkno, page,
vmbuffer,
- blk_info & VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM,
&has_lpdead_items, &vm_page_frozen);
/*
@@ -1462,8 +1452,7 @@ lazy_scan_heap(LVRelState *vacrel)
* exclude pages skipped due to cleanup lock contention from eager
* freeze algorithm caps.
*/
- if (got_cleanup_lock &&
- (blk_info & VAC_BLK_WAS_EAGER_SCANNED))
+ if (got_cleanup_lock && was_eager_scanned)
{
/* Aggressive vacuums do not eager scan. */
Assert(!vacrel->aggressive);
@@ -1630,7 +1619,6 @@ heap_vac_scan_next_block(ReadStream *stream,
{
BlockNumber next_block;
LVRelState *vacrel = callback_private_data;
- uint8 blk_info = 0;
/* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
next_block = vacrel->current_block + 1;
@@ -1693,8 +1681,8 @@ heap_vac_scan_next_block(ReadStream *stream,
* otherwise they would've been unskippable.
*/
vacrel->current_block = next_block;
- blk_info |= VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM;
- *((uint8 *) per_buffer_data) = blk_info;
+ /* Block was not eager scanned */
+ *((bool *) per_buffer_data) = false;
return vacrel->current_block;
}
else
@@ -1706,11 +1694,7 @@ heap_vac_scan_next_block(ReadStream *stream,
Assert(next_block == vacrel->next_unskippable_block);
vacrel->current_block = next_block;
- if (vacrel->next_unskippable_allvis)
- blk_info |= VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM;
- if (vacrel->next_unskippable_eager_scanned)
- blk_info |= VAC_BLK_WAS_EAGER_SCANNED;
- *((uint8 *) per_buffer_data) = blk_info;
+ *((bool *) per_buffer_data) = vacrel->next_unskippable_eager_scanned;
return vacrel->current_block;
}
}
@@ -1735,7 +1719,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
BlockNumber next_unskippable_block = vacrel->next_unskippable_block + 1;
Buffer next_unskippable_vmbuffer = vacrel->next_unskippable_vmbuffer;
bool next_unskippable_eager_scanned = false;
- bool next_unskippable_allvis;
*skipsallvis = false;
@@ -1745,7 +1728,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
next_unskippable_block,
&next_unskippable_vmbuffer);
- next_unskippable_allvis = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
/*
* At the start of each eager scan region, normal vacuums with eager
@@ -1764,7 +1746,7 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
* A block is unskippable if it is not all visible according to the
* visibility map.
*/
- if (!next_unskippable_allvis)
+ if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
{
Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
break;
@@ -1821,7 +1803,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
/* write the local variables back to vacrel */
vacrel->next_unskippable_block = next_unskippable_block;
- vacrel->next_unskippable_allvis = next_unskippable_allvis;
vacrel->next_unskippable_eager_scanned = next_unskippable_eager_scanned;
vacrel->next_unskippable_vmbuffer = next_unskippable_vmbuffer;
}
@@ -1982,9 +1963,7 @@ cmpOffsetNumbers(const void *a, const void *b)
* Caller must hold pin and buffer cleanup lock on the buffer.
*
* vmbuffer is the buffer containing the VM block with visibility information
- * for the heap block, blkno. all_visible_according_to_vm is the saved
- * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * for the heap block, blkno.
*
* *has_lpdead_items is set to true or false depending on whether, upon return
* from this function, any LP_DEAD items are still present on the page.
@@ -2001,7 +1980,6 @@ lazy_scan_prune(LVRelState *vacrel,
BlockNumber blkno,
Page page,
Buffer vmbuffer,
- bool all_visible_according_to_vm,
bool *has_lpdead_items,
bool *vm_page_frozen)
{
@@ -2015,6 +1993,8 @@ lazy_scan_prune(LVRelState *vacrel,
.vistest = vacrel->vistest,
.cutoffs = &vacrel->cutoffs,
};
+ uint8 old_vmbits = 0;
+ uint8 new_vmbits = 0;
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -2117,70 +2097,7 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(!presult.all_visible || !(*has_lpdead_items));
Assert(!presult.all_frozen || presult.all_visible);
- /*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
- */
- if ((presult.all_visible && !all_visible_according_to_vm) ||
- (presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
- {
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (presult.all_frozen)
- flags |= VISIBILITYMAP_ALL_FROZEN;
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
- * unnecessarily dirtying the heap buffer. Nearly the only scenario
- * where PD_ALL_VISIBLE is set but the VM is not is if the VM was
- * removed -- and that isn't worth optimizing for. And if we add the
- * heap buffer to the WAL chain (without passing REGBUF_NO_CHANGES),
- * it must be marked dirty.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
-
- /*
- * If the page is being set all-frozen, we pass InvalidTransactionId
- * as the cutoff_xid, since a snapshot conflict horizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- Assert(!presult.all_frozen ||
- !TransactionIdIsValid(presult.vm_conflict_horizon));
-
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
+ old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
/*
* As of PostgreSQL 9.2, the visibility map bit should never be set if the
@@ -2188,8 +2105,8 @@ lazy_scan_prune(LVRelState *vacrel,
* cleared after heap_vac_scan_next_block() was called, so we must recheck
* with buffer lock before concluding that the VM is corrupt.
*/
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+ if (!PageIsAllVisible(page) &&
+ (old_vmbits & VISIBILITYMAP_VALID_BITS) != 0)
{
ereport(WARNING,
(errcode(ERRCODE_DATA_CORRUPTED),
@@ -2198,6 +2115,8 @@ lazy_scan_prune(LVRelState *vacrel,
visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
VISIBILITYMAP_VALID_BITS);
+ /* VM bits are now clear */
+ old_vmbits = 0;
}
/*
@@ -2225,6 +2144,71 @@ lazy_scan_prune(LVRelState *vacrel,
MarkBufferDirty(buf);
visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
VISIBILITYMAP_VALID_BITS);
+ /* VM bits are now clear */
+ old_vmbits = 0;
+ }
+
+ if (!presult.all_visible)
+ return presult.ndeleted;
+
+ /* Set the visibility map and page visibility hint */
+ new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+ if (presult.all_frozen)
+ new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+ /* Nothing to do */
+ if (old_vmbits == new_vmbits)
+ return presult.ndeleted;
+
+ Assert(presult.all_visible);
+
+ /*
+ * It should never be the case that the visibility map page is set while
+ * the page-level bit is clear, but the reverse is allowed (if checksums
+ * are not enabled). Regardless, set both bits so that we get back in
+ * sync.
+ *
+ * The heap buffer must be marked dirty before adding it to the WAL chain
+ * when setting the VM. We don't worry about unnecessarily dirtying the
+ * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
+ * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
+ * the VM bits clear, so there is no point in optimizing it.
+ */
+ PageSetAllVisible(page);
+ MarkBufferDirty(buf);
+
+ /*
+ * If the page is being set all-frozen, we pass InvalidTransactionId as
+ * the cutoff_xid, since a snapshot conflict horizon sufficient to make
+ * everything safe for REDO was logged when the page's tuples were frozen.
+ */
+ Assert(!presult.all_frozen ||
+ !TransactionIdIsValid(presult.vm_conflict_horizon));
+
+ visibilitymap_set(vacrel->rel, blkno, buf,
+ InvalidXLogRecPtr,
+ vmbuffer, presult.vm_conflict_horizon,
+ new_vmbits);
+
+ /*
+ * If the page wasn't already set all-visible and/or all-frozen in the VM,
+ * count it as newly set for logging.
+ */
+ if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ {
+ vacrel->vm_new_visible_pages++;
+ if (presult.all_frozen)
+ {
+ vacrel->vm_new_visible_frozen_pages++;
+ *vm_page_frozen = true;
+ }
+ }
+ else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ presult.all_frozen)
+ {
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
}
return presult.ndeleted;
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 2382d18f72b..3047bd46def 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -240,10 +240,8 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
* You must pass a buffer containing the correct map page to this function.
* Call visibilitymap_pin first to pin the right one. This function doesn't do
* any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
*/
-uint8
+void
visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
uint8 flags)
@@ -320,7 +318,6 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
}
LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
}
/*
@@ -343,7 +340,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
*
* rlocator is used only for debugging messages.
*/
-uint8
+void
visibilitymap_set_vmbits(BlockNumber heapBlk,
Buffer vmBuf, uint8 flags,
const RelFileLocator rlocator)
@@ -386,8 +383,6 @@ visibilitymap_set_vmbits(BlockNumber heapBlk,
map[mapByte] |= (flags << mapOffset);
MarkBufferDirty(vmBuf);
}
-
- return status;
}
/*
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 47ad489a9a7..a0166c5b410 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -32,15 +32,15 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator);
+extern void visibilitymap_set(Relation rel,
+ BlockNumber heapBlk, Buffer heapBuf,
+ XLogRecPtr recptr,
+ Buffer vmBuf,
+ TransactionId cutoff_xid,
+ uint8 flags);
+extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
--
2.43.0
v32-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patch (text/x-patch)
From 0fc1b4cbb4e67b193eca8347dca1bf8053d2020e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 13:36:39 -0500
Subject: [PATCH v32 03/16] Refactor lazy_scan_prune() VM clear logic into
helper
Encapsulating the VM corruption checks in a helper makes the whole
function clearer. There is no functional change other than moving the
logic into a helper.
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/vacuumlazy.c | 132 +++++++++++++++++----------
1 file changed, 85 insertions(+), 47 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 3733a1cbc47..5857fd1bfb6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -424,6 +424,11 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
+static bool identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer,
@@ -1957,6 +1962,83 @@ cmpOffsetNumbers(const void *a, const void *b)
return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
}
+/*
+ * Helper to correct any corruption detected on a heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+ *
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ *
+ * Returns true if it cleared corruption and false otherwise.
+ */
+static bool
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits)
+{
+ Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
+
+ Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (!PageIsAllVisible(heap_page) &&
+ ((vmbits & VISIBILITYMAP_VALID_BITS) != 0))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
+
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -2099,54 +2181,10 @@ lazy_scan_prune(LVRelState *vacrel,
old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (!PageIsAllVisible(page) &&
- (old_vmbits & VISIBILITYMAP_VALID_BITS) != 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- /* VM bits are now clear */
+ if (identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
+ presult.lpdead_items, vmbuffer,
+ old_vmbits))
old_vmbits = 0;
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- /* VM bits are now clear */
- old_vmbits = 0;
- }
if (!presult.all_visible)
return presult.ndeleted;
--
2.43.0
v32-0004-Set-the-VM-in-heap_page_prune_and_freeze.patch (text/x-patch)
From 5c65e73246b4968ddfa9d3739f53d0d8734b8727 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v32 04/16] Set the VM in heap_page_prune_and_freeze()
This has no independent benefit. It is meant for ease of review. As of
this commit, there is still a separate WAL record emitted for setting
the VM after pruning and freezing. But the change is easier to review if
moving the logic into pruneheap.c is kept separate from the change that
sets the VM in the same WAL record.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/pruneheap.c | 315 +++++++++++++++++++++++----
src/backend/access/heap/vacuumlazy.c | 150 +------------
src/include/access/heapam.h | 20 ++
3 files changed, 299 insertions(+), 186 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index af788b29714..53b7711ab21 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
#include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool attempt_freeze;
+ /* whether or not to attempt updating the VM */
+ bool attempt_update_vm;
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -140,16 +142,17 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon when setting the VM or when
+ * freezing all the tuples on the page. It is only valid when all the live
+ * tuples on the page are all-visible.
*
* NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
* That's convenient for heap_page_prune_and_freeze() to use them to
- * decide whether to freeze the page or not. The all_visible and
- * all_frozen values returned to the caller are adjusted to include
- * LP_DEAD items after we determine whether to opportunistically freeze.
+ * decide whether to opportunistically freeze the page or not. The
+ * all_visible and all_frozen values ultimately used to set the VM are
+ * adjusted to include LP_DEAD items after we determine whether or not to
+ * opportunistically freeze.
*/
bool all_visible;
bool all_frozen;
@@ -191,6 +194,17 @@ static void page_verify_redirects(Page page);
static bool heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
PruneState *prstate);
+static bool identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page, int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits);
+static bool heap_page_will_set_vm(PruneState *prstate,
+ Relation relation,
+ BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+ Buffer vmbuffer,
+ int nlpdead_items,
+ uint8 *old_vmbits,
+ uint8 *new_vmbits);
/*
@@ -280,6 +294,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
PruneFreezeParams params = {
.relation = relation,
.buffer = buffer,
+ .vmbuffer = InvalidBuffer,
.reason = PRUNE_ON_ACCESS,
.options = 0,
.vistest = vistest,
@@ -341,6 +356,8 @@ prune_freeze_setup(PruneFreezeParams *params,
/* cutoffs must be provided if we will attempt freezing */
Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate->attempt_update_vm =
+ (params->options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate->cutoffs = params->cutoffs;
/*
@@ -396,51 +413,54 @@ prune_freeze_setup(PruneFreezeParams *params,
prstate->frz_conflict_horizon = InvalidTransactionId;
/*
- * Vacuum may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Track whether the page could be marked all-visible and/or all-frozen.
+ * This information is used for opportunistic freezing and for updating
+ * the visibility map (VM) if requested by the caller.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * Currently, only VACUUM performs freezing, but other callers may in the
+ * future. Visibility bookkeeping is required not just for setting the VM
+ * bits, but also for opportunistic freezing: we only consider freezing if
+ * the page would become all-frozen, or if it would be all-frozen except
+ * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+ * we will not set the VM bit even if the page is found to be all-visible.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not immediately
- * clear all_visible and all_frozen when we see LP_DEAD items. We fix
- * that after scanning the line pointers. We must correct all_visible and
- * all_frozen before we return them to the caller, so that the caller
- * doesn't set the VM bits incorrectly.
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is passed without HEAP_PAGE_PRUNE_FREEZE,
+ * prstate.all_frozen must be initialized to false, since we will not call
+ * heap_prepare_freeze_tuple() for each tuple.
+ *
+ * Dead tuples that will be removed by the end of vacuum should not
+ * prevent opportunistic freezing. Therefore, we do not clear all_visible
+ * and all_frozen when we encounter LP_DEAD items. Instead, we correct
+ * them after deciding whether to freeze, but before updating the VM, to
+ * avoid setting the VM bits incorrectly.
+ *
+ * If neither freezing nor VM updates are requested, we skip the extra
+ * bookkeeping. In this case, initializing all_visible to false allows
+ * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
*/
if (prstate->attempt_freeze)
{
prstate->all_visible = true;
prstate->all_frozen = true;
}
+ else if (prstate->attempt_update_vm)
+ {
+ prstate->all_visible = true;
+ prstate->all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate->all_visible = false;
prstate->all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is unmaintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon
+ * when updating the VM and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -775,10 +795,148 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Helper to correct any corruption detected on a heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+ *
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ *
+ * Returns true if it cleared corruption and false otherwise.
+ */
+static bool
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits)
+{
+ Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
+
+ Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (!PageIsAllVisible(heap_page) &&
+ ((vmbits & VISIBILITYMAP_VALID_BITS) != 0))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * However, if it finds that the page-level visibility hint or VM is
+ * corrupted, it will fix them by clearing the VM bits and visibility hint.
+ * This does not need to be done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with returning the
+ * current value of the VM bits in *old_vmbits and the desired new value of
+ * the VM bits in *new_vmbits.
+ *
+ * If the VM should not be set, it returns false. If we won't consider
+ * updating the VM, *old_vmbits will be 0, regardless of the current value of
+ * the VM bits.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate,
+ Relation relation,
+ BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+ Buffer vmbuffer,
+ int nlpdead_items,
+ uint8 *old_vmbits,
+ uint8 *new_vmbits)
+{
+ *old_vmbits = 0;
+ *new_vmbits = 0;
+
+ if (!prstate->attempt_update_vm)
+ return false;
+
+ *old_vmbits = visibilitymap_get_status(relation, heap_blk,
+ &vmbuffer);
+
+ /* We do this even if not all-visible */
+ if (identify_and_fix_vm_corruption(relation, heap_buffer, heap_blk, heap_page,
+ nlpdead_items, vmbuffer,
+ *old_vmbits))
+ *old_vmbits = 0;
+
+ if (!prstate->all_visible)
+ return false;
+
+ *new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+ if (prstate->all_frozen)
+ *new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+ if (*new_vmbits == *old_vmbits)
+ {
+ *new_vmbits = 0;
+ return false;
+ }
+
+ return true;
+}
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -793,12 +951,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* tuples if it's required in order to advance relfrozenxid / relminmxid, or
* if it's considered advantageous for overall system performance to do so
* now. The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing. When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opportunistically freeze, to indicate if the VM bits can be set. They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
@@ -823,13 +982,18 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
MultiXactId *new_relmin_mxid)
{
Buffer buffer = params->buffer;
+ Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
+ BlockNumber blockno = BufferGetBlockNumber(buffer);
PruneState prstate;
bool do_freeze;
bool do_prune;
bool do_hint_prune;
+ bool do_set_vm;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ uint8 new_vmbits;
+ uint8 old_vmbits;
/* Initialize prstate */
prune_freeze_setup(params,
@@ -1011,6 +1175,65 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
}
}
+
+ /* Now update the visibility map and PD_ALL_VISIBLE hint */
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ do_set_vm = heap_page_will_set_vm(&prstate,
+ params->relation,
+ blockno,
+ buffer,
+ page,
+ vmbuffer,
+ prstate.lpdead_items,
+ &old_vmbits,
+ &new_vmbits);
+
+ /*
+ * new_vmbits should be 0 regardless of whether or not the page is
+ * all-visible if we do not intend to set the VM.
+ */
+ Assert(do_set_vm || new_vmbits == 0);
+
+ /* Set the visibility map and page visibility hint, if relevant */
+ if (do_set_vm)
+ {
+ Assert(prstate.all_visible);
+
+ /*
+ * It should never be the case that the visibility map page is set
+ * while the page-level bit is clear, but the reverse is allowed (if
+ * checksums are not enabled). Regardless, set both bits so that we
+ * get back in sync.
+ *
+ * The heap buffer must be marked dirty before adding it to the WAL
+ * chain when setting the VM. We don't worry about unnecessarily
+ * dirtying the heap buffer if PD_ALL_VISIBLE is already set, though.
+ * It is extremely rare to have a clean heap buffer with
+ * PD_ALL_VISIBLE already set and the VM bits clear, so there is no
+ * point in optimizing it.
+ */
+ PageSetAllVisible(page);
+ MarkBufferDirty(buffer);
+
+ /*
+ * If the page is being set all-frozen, we pass InvalidTransactionId
+ * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ Assert(!prstate.all_frozen ||
+ !TransactionIdIsValid(presult->vm_conflict_horizon));
+
+ visibilitymap_set(params->relation, blockno, buffer,
+ InvalidXLogRecPtr,
+ vmbuffer, presult->vm_conflict_horizon,
+ new_vmbits);
+ }
+
+ /* Save the vmbits for caller */
+ presult->old_vmbits = old_vmbits;
+ presult->new_vmbits = new_vmbits;
}
@@ -1485,6 +1708,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
{
TransactionId xmin;
+ Assert(prstate->attempt_update_vm);
+
if (!HeapTupleHeaderXminCommitted(htup))
{
prstate->all_visible = false;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5857fd1bfb6..fe816299f4b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -424,11 +424,7 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
-static bool identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
- BlockNumber heap_blk, Page heap_page,
- int nlpdead_items,
- Buffer vmbuffer,
- uint8 vmbits);
+
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer,
@@ -1962,83 +1958,6 @@ cmpOffsetNumbers(const void *a, const void *b)
return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
}
-/*
- * Helper to correct any corruption detected on a heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- *
- * Returns true if it cleared corruption and false otherwise.
- */
-static bool
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
- BlockNumber heap_blk, Page heap_page,
- int nlpdead_items,
- Buffer vmbuffer,
- uint8 vmbits)
-{
- Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
-
- Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (!PageIsAllVisible(heap_page) &&
- ((vmbits & VISIBILITYMAP_VALID_BITS) != 0))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- RelationGetRelationName(rel), heap_blk)));
-
- visibilitymap_clear(rel, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- RelationGetRelationName(rel), heap_blk)));
-
- PageClearAllVisible(heap_page);
- MarkBufferDirty(heap_buffer);
- visibilitymap_clear(rel, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- return false;
-}
-
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -2070,13 +1989,12 @@ lazy_scan_prune(LVRelState *vacrel,
PruneFreezeParams params = {
.relation = rel,
.buffer = buf,
+ .vmbuffer = vmbuffer,
.reason = PRUNE_VACUUM_SCAN,
- .options = HEAP_PAGE_PRUNE_FREEZE,
+ .options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM,
.vistest = vacrel->vistest,
.cutoffs = &vacrel->cutoffs,
};
- uint8 old_vmbits = 0;
- uint8 new_vmbits = 0;
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -2176,75 +2094,25 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
- Assert(!presult.all_frozen || presult.all_visible);
-
- old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
- if (identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
- presult.lpdead_items, vmbuffer,
- old_vmbits))
- old_vmbits = 0;
-
- if (!presult.all_visible)
- return presult.ndeleted;
-
- /* Set the visibility map and page visibility hint */
- new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
- if (presult.all_frozen)
- new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
- /* Nothing to do */
- if (old_vmbits == new_vmbits)
- return presult.ndeleted;
-
- Assert(presult.all_visible);
-
- /*
- * It should never be the case that the visibility map page is set while
- * the page-level bit is clear, but the reverse is allowed (if checksums
- * are not enabled). Regardless, set both bits so that we get back in
- * sync.
- *
- * The heap buffer must be marked dirty before adding it to the WAL chain
- * when setting the VM. We don't worry about unnecessarily dirtying the
- * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
- * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
- * the VM bits clear, so there is no point in optimizing it.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
-
- /*
- * If the page is being set all-frozen, we pass InvalidTransactionId as
- * the cutoff_xid, since a snapshot conflict horizon sufficient to make
- * everything safe for REDO was logged when the page's tuples were frozen.
- */
- Assert(!presult.all_frozen ||
- !TransactionIdIsValid(presult.vm_conflict_horizon));
-
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- new_vmbits);
/*
* If the page wasn't already set all-visible and/or all-frozen in the VM,
* count it as newly set for logging.
*/
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
vacrel->vm_new_frozen_pages++;
*vm_page_frozen = true;
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ce48fac42ba..2c07e197dc8 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,12 @@ typedef struct PruneFreezeParams
Relation relation; /* relation containing buffer to be pruned */
Buffer buffer; /* buffer to be pruned */
+ /*
+ * If we will consider updating the visibility map, vmbuffer should
+ * contain the correct block of the visibility map and be pinned.
+ */
+ Buffer vmbuffer;
+
/*
* The reason pruning was performed. It is used to set the WAL record
* opcode which is used for debugging and analysis purposes.
@@ -252,6 +259,9 @@ typedef struct PruneFreezeParams
*
* HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
* will return 'all_visible', 'all_frozen' flags to the caller.
+ *
+ * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
+ * in the VM.
*/
int options;
@@ -299,6 +309,16 @@ typedef struct PruneFreezeResult
bool all_frozen;
TransactionId vm_conflict_horizon;
+ /*
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
+ *
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
+ */
+ uint8 new_vmbits;
+ uint8 old_vmbits;
+
/*
* Whether or not the page makes rel truncation unsafe. This is set to
* 'true', even if the page contains LP_DEAD items. VACUUM will remove
--
2.43.0
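To make the interface change in v32-0004 concrete, here is a minimal
caller-side sketch (illustrative only, not part of the patch set; the struct
fields and flag names come from the patched heapam.h, while the surrounding
variable names are assumed): a vacuum-style caller pins the VM page covering
the heap block, hands it to heap_page_prune_and_freeze() via PruneFreezeParams,
and requests VM maintenance with HEAP_PAGE_PRUNE_UPDATE_VM.

    /* Sketch: asking heap_page_prune_and_freeze() to also maintain the VM */
    PruneFreezeParams params = {
        .relation = rel,
        .buffer = buf,            /* heap page, cleanup-locked by the caller */
        .vmbuffer = vmbuffer,     /* pinned VM page covering this heap block */
        .reason = PRUNE_VACUUM_SCAN,
        .options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM,
        .vistest = vistest,
        .cutoffs = &cutoffs,
    };
    PruneFreezeResult presult;

    /* the call itself is unchanged, e.g.
     * heap_page_prune_and_freeze(&params, &presult, ...);
     * afterwards presult.old_vmbits / presult.new_vmbits report the VM state
     * before the call and the bits that were set, for the caller's bookkeeping */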
v32-0005-Move-VM-assert-into-prune-freeze-code.patchtext/x-patch; charset=US-ASCII; name=v32-0005-Move-VM-assert-into-prune-freeze-code.patchDownload
From 0162c78c42764cdb0ecf0ad82eb616954d15a94d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 15:57:34 -0500
Subject: [PATCH v32 05/16] Move VM assert into prune/freeze code
This is a step toward setting the VM in the same WAL record as pruning
and freezing. It moves the check of the heap page into prune/freeze code
before setting the VM. This allows us to remove some fields of the
PruneFreezeResult.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/pruneheap.c | 86 ++++++++++++++++++++++------
src/backend/access/heap/vacuumlazy.c | 68 +---------------------
src/include/access/heapam.h | 25 +++-----
3 files changed, 77 insertions(+), 102 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 53b7711ab21..85ac1a54882 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -932,6 +932,31 @@ heap_page_will_set_vm(PruneState *prstate,
return true;
}
+#ifdef USE_ASSERT_CHECKING
+
+/*
+ * Wrapper for heap_page_would_be_all_visible() which can be used for callers
+ * that expect no LP_DEAD on the page. Currently assert-only, but there is no
+ * reason not to use it outside of asserts.
+ */
+static bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
+{
+
+ return heap_page_would_be_all_visible(rel, buf,
+ OldestXmin,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid,
+ logging_offnum);
+}
+#endif
+
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
@@ -985,6 +1010,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
+ TransactionId vm_conflict_horizon = InvalidTransactionId;
PruneState prstate;
bool do_freeze;
bool do_prune;
@@ -1142,23 +1168,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
- /*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
-
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
@@ -1176,6 +1187,46 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
}
}
+ /*
+ * If updating the visibility map, the conflict horizon for that record
+ * must be the newest xmin on the page. However, if the page is
+ * completely frozen, there can be no conflict and the vm_conflict_horizon
+ * should remain InvalidTransactionId. This includes the case that we
+ * just froze all the tuples; the prune-freeze record included the
+ * conflict XID already so we don't need to again.
+ */
+ if (prstate.all_frozen)
+ vm_conflict_horizon = InvalidTransactionId;
+ else
+ vm_conflict_horizon = prstate.visibility_cutoff_xid;
+
+ /*
+ * During its second pass over the heap, VACUUM calls
+ * heap_page_would_be_all_visible() to determine whether a page is
+ * all-visible and all-frozen. The logic here is similar. After completing
+ * pruning and freezing, use an assertion to verify that our results
+ * remain consistent with heap_page_would_be_all_visible().
+ */
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
+ {
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
+
+ Assert(presult->lpdead_items == 0);
+
+ Assert(heap_page_is_all_visible(params->relation, buffer,
+ prstate.cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc));
+
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == vm_conflict_horizon);
+ }
+#endif
+
/* Now update the visibility map and PD_ALL_VISIBLE hint */
Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
@@ -1222,12 +1273,11 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* make everything safe for REDO was logged when the page's tuples
* were frozen.
*/
- Assert(!prstate.all_frozen ||
- !TransactionIdIsValid(presult->vm_conflict_horizon));
+ Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
visibilitymap_set(params->relation, blockno, buffer,
InvalidXLogRecPtr,
- vmbuffer, presult->vm_conflict_horizon,
+ vmbuffer, vm_conflict_horizon,
new_vmbits);
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index fe816299f4b..b7d834969d6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -459,20 +459,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
-#endif
-static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- OffsetNumber *deadoffsets,
- int ndeadoffsets,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2035,32 +2021,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- Assert(heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum));
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -3522,29 +3482,6 @@ dead_items_cleanup(LVRelState *vacrel)
vacrel->pvs = NULL;
}
-#ifdef USE_ASSERT_CHECKING
-
-/*
- * Wrapper for heap_page_would_be_all_visible() which can be used for callers
- * that expect no LP_DEAD on the page. Currently assert-only, but there is no
- * reason not to use it outside of asserts.
- */
-static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum)
-{
-
- return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
- NULL, 0,
- all_frozen,
- visibility_cutoff_xid,
- logging_offnum);
-}
-#endif
/*
* Check whether the heap page in buf is all-visible except for the dead
@@ -3568,15 +3505,12 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
* - *logging_offnum: OffsetNumber of current tuple being processed;
* used by vacuum's error callback system.
*
- * Callers looking to verify that the page is already all-visible can call
- * heap_page_is_all_visible().
- *
* This logic is closely related to heap_prune_record_unchanged_lp_normal().
* If you modify this function, ensure consistency with that code. An
* assertion cross-checks that both remain in agreement. Do not introduce new
* side-effects.
*/
-static bool
+bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
TransactionId OldestXmin,
OffsetNumber *deadoffsets,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2c07e197dc8..e0da1f7cdcc 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -257,8 +257,7 @@ typedef struct PruneFreezeParams
* HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
* LP_UNUSED during pruning.
*
- * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
- * will return 'all_visible', 'all_frozen' flags to the caller.
+ * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
*
* HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
* in the VM.
@@ -294,21 +293,6 @@ typedef struct PruneFreezeResult
int live_tuples;
int recently_dead_tuples;
- /*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
- *
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
- *
- * These are only set if the HEAP_PAGE_PRUNE_FREEZE option is set.
- */
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
-
/*
* old_vmbits are the state of the all-visible and all-frozen bits in the
* visibility map before updating it during phase I of vacuuming.
@@ -453,6 +437,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
/* in heap/vacuumlazy.c */
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
--
2.43.0
v32-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patchtext/x-patch; charset=US-ASCII; name=v32-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patchDownload
From cdf5776fadeae3430c692999b37f8a7ec944bda1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 16:16:22 -0500
Subject: [PATCH v32 06/16] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
prune/freeze
Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.
Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.
This change applies only to vacuum phase I, not to pruning performed
during normal page access.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/pruneheap.c | 275 ++++++++++++++++------------
1 file changed, 157 insertions(+), 118 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 85ac1a54882..b3ea42f1be1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -205,6 +205,11 @@ static bool heap_page_will_set_vm(PruneState *prstate,
int nlpdead_items,
uint8 *old_vmbits,
uint8 *new_vmbits);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ uint8 old_vmbits, uint8 new_vmbits,
+ TransactionId latest_xid_removed,
+ TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid);
/*
@@ -795,6 +800,68 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ uint8 old_vmbits, uint8 new_vmbits,
+ TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid)
+{
+ TransactionId conflict_xid;
+
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning or
+ * freezing any tuples and are setting an already all-visible page
+ * all-frozen in the VM. In this case, all of the tuples on the page must
+ * already be visible to all MVCC snapshots on the standby.
+ */
+ if (!do_prune &&
+ !do_freeze &&
+ do_set_vm &&
+ (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0 &&
+ (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ return InvalidTransactionId;
+
+ /*
+ * The snapshotConflictHorizon for the whole record should be the most
+ * conservative of all the horizons calculated for any of the possible
+ * modifications. If this record will prune tuples, any transactions on
+ * the standby older than the youngest xmax of the most recently removed
+ * tuple this record will prune will conflict. If this record will freeze
+ * tuples, any transactions on the standby with xids older than the
+ * youngest tuple this record will freeze will conflict.
+ */
+ conflict_xid = InvalidTransactionId;
+
+ /*
+ * If we are updating the VM, the conflict horizon is almost always the
+ * visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization, we can
+ * use the visibility_cutoff_xid as the conflict horizon if the page will
+ * be all-frozen. This is true even if there are LP_DEAD line pointers
+ * because we ignored those when maintaining the visibility_cutoff_xid.
+ * This will have been calculated earlier as the frz_conflict_horizon when
+ * we determined we would freeze.
+ */
+ if (do_set_vm)
+ conflict_xid = visibility_cutoff_xid;
+ else if (do_freeze)
+ conflict_xid = frz_conflict_horizon;
+
+ /*
+ * If we are removing tuples with a younger xmax than our so far
+ * calculated conflict_xid, we must use this as our horizon.
+ */
+ if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+ conflict_xid = latest_xid_removed;
+
+ return conflict_xid;
+}
+
/*
* Helper to correct any corruption detected on a heap page and its
* corresponding visibility map page after pruning but before setting the
@@ -1010,7 +1077,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
- TransactionId vm_conflict_horizon = InvalidTransactionId;
PruneState prstate;
bool do_freeze;
bool do_prune;
@@ -1018,6 +1084,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool do_set_vm;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ TransactionId conflict_xid;
uint8 new_vmbits;
uint8 old_vmbits;
@@ -1081,6 +1148,37 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.all_visible = prstate.all_frozen = false;
Assert(!prstate.all_frozen || prstate.all_visible);
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ /*
+ * Decide whether to set the VM bits based on information from the VM and
+ * the all_visible/all_frozen flags.
+ */
+ do_set_vm = heap_page_will_set_vm(&prstate,
+ params->relation,
+ blockno,
+ buffer,
+ page,
+ vmbuffer,
+ prstate.lpdead_items,
+ &old_vmbits,
+ &new_vmbits);
+
+ /*
+ * new_vmbits should be 0 regardless of whether or not the page is
+ * all-visible if we do not intend to set the VM.
+ */
+ Assert(do_set_vm || new_vmbits == 0);
+
+ conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+ old_vmbits, new_vmbits,
+ prstate.latest_xid_removed,
+ prstate.frz_conflict_horizon,
+ prstate.visibility_cutoff_xid);
+
+ /* Lock vmbuffer before entering a critical section */
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -1102,14 +1200,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
/*
* If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * hint. If we are going to freeze or prune the page or set
+ * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+ *
+ * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+ * for the VM to be set and PD_ALL_VISIBLE to be clear.
*/
- if (!do_freeze && !do_prune)
+ if (!do_freeze && !do_prune && !do_set_vm)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -1123,6 +1224,26 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
+ /* Set the visibility map and page visibility hint */
+ if (do_set_vm)
+ {
+ /*
+ * While it is valid for PD_ALL_VISIBLE to be set when the
+ * corresponding VM bit is clear, we strongly prefer to keep them
+ * in sync.
+ *
+ * The heap buffer must be marked dirty before adding it to the
+ * WAL chain when setting the VM. We don't worry about
+ * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
+ * already set, though. It is extremely rare to have a clean heap
+ * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
+ * so there is no point in optimizing it.
+ */
+ PageSetAllVisible(page);
+ visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
+ params->relation->rd_locator);
+ }
+
MarkBufferDirty(buffer);
/*
@@ -1130,29 +1251,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
if (RelationNeedsWAL(params->relation))
{
- /*
- * The snapshotConflictHorizon for the whole record should be the
- * most conservative of all the horizons calculated for any of the
- * possible modifications. If this record will prune tuples, any
- * transactions on the standby older than the youngest xmax of the
- * most recently removed tuple this record will prune will
- * conflict. If this record will freeze tuples, any transactions
- * on the standby with xids older than the youngest tuple this
- * record will freeze will conflict.
- */
- TransactionId conflict_xid;
-
- if (TransactionIdFollows(prstate.frz_conflict_horizon,
- prstate.latest_xid_removed))
- conflict_xid = prstate.frz_conflict_horizon;
- else
- conflict_xid = prstate.latest_xid_removed;
-
log_heap_prune_and_freeze(params->relation, buffer,
- InvalidBuffer, /* vmbuffer */
- 0, /* vmflags */
+ do_set_vm ? vmbuffer : InvalidBuffer,
+ do_set_vm ? new_vmbits : 0,
conflict_xid,
- true, params->reason,
+ true, /* cleanup lock */
+ params->reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
@@ -1162,43 +1266,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
END_CRIT_SECTION();
- /* Copy information back for caller */
- presult->ndeleted = prstate.ndeleted;
- presult->nnewlpdead = prstate.ndead;
- presult->nfrozen = prstate.nfrozen;
- presult->live_tuples = prstate.live_tuples;
- presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->hastup = prstate.hastup;
-
- presult->lpdead_items = prstate.lpdead_items;
- /* the presult->deadoffsets array was already filled in */
-
- if (prstate.attempt_freeze)
- {
- if (presult->nfrozen > 0)
- {
- *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
- *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
- }
- else
- {
- *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
- *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
- }
- }
-
- /*
- * If updating the visibility map, the conflict horizon for that record
- * must be the newest xmin on the page. However, if the page is
- * completely frozen, there can be no conflict and the vm_conflict_horizon
- * should remain InvalidTransactionId. This includes the case that we
- * just froze all the tuples; the prune-freeze record included the
- * conflict XID already so we don't need to again.
- */
- if (prstate.all_frozen)
- vm_conflict_horizon = InvalidTransactionId;
- else
- vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
/*
* During its second pass over the heap, VACUUM calls
@@ -1213,7 +1282,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(presult->lpdead_items == 0);
+ Assert(prstate.lpdead_items == 0);
+ Assert(prstate.cutoffs);
Assert(heap_page_is_all_visible(params->relation, buffer,
prstate.cutoffs->OldestXmin,
@@ -1223,67 +1293,36 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Assert(prstate.all_frozen == debug_all_frozen);
Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == vm_conflict_horizon);
+ debug_cutoff == prstate.visibility_cutoff_xid);
}
#endif
- /* Now update the visibility map and PD_ALL_VISIBLE hint */
- Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
-
- do_set_vm = heap_page_will_set_vm(&prstate,
- params->relation,
- blockno,
- buffer,
- page,
- vmbuffer,
- prstate.lpdead_items,
- &old_vmbits,
- &new_vmbits);
+ /* Copy information back for caller */
+ presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+ presult->hastup = prstate.hastup;
+ presult->new_vmbits = new_vmbits;
+ presult->old_vmbits = old_vmbits;
- /*
- * new_vmbits should be 0 regardless of whether or not the page is
- * all-visible if we do not intend to set the VM.
- */
- Assert(do_set_vm || new_vmbits == 0);
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
- /* Set the visibility map and page visibility hint, if relevant */
- if (do_set_vm)
+ if (prstate.attempt_freeze)
{
- Assert(prstate.all_visible);
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * The heap buffer must be marked dirty before adding it to the WAL
- * chain when setting the VM. We don't worry about unnecessarily
- * dirtying the heap buffer if PD_ALL_VISIBLE is already set, though.
- * It is extremely rare to have a clean heap buffer with
- * PD_ALL_VISIBLE already set and the VM bits clear, so there is no
- * point in optimizing it.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buffer);
-
- /*
- * If the page is being set all-frozen, we pass InvalidTransactionId
- * as the cutoff_xid, since a snapshot conflict horizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
-
- visibilitymap_set(params->relation, blockno, buffer,
- InvalidXLogRecPtr,
- vmbuffer, vm_conflict_horizon,
- new_vmbits);
+ if (presult->nfrozen > 0)
+ {
+ *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
+ }
+ else
+ {
+ *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
+ }
}
-
- /* Save the vmbits for caller */
- presult->old_vmbits = old_vmbits;
- presult->new_vmbits = new_vmbits;
}
--
2.43.0
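For readers tracking the reordering in v32-0006, the net sequence in
heap_page_prune_and_freeze() now looks roughly like the following. This is a
condensed reading aid distilled from the diff above, not additional code;
params->relation, params->buffer, and params->reason are shortened to rel,
buffer, and reason.

    do_set_vm = heap_page_will_set_vm(&prstate, rel, blockno, buffer, page,
                                      vmbuffer, prstate.lpdead_items,
                                      &old_vmbits, &new_vmbits);
    conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
                                    old_vmbits, new_vmbits,
                                    prstate.latest_xid_removed,
                                    prstate.frz_conflict_horizon,
                                    prstate.visibility_cutoff_xid);

    if (do_set_vm)
        LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE); /* before the critical section */

    START_CRIT_SECTION();
    if (do_prune || do_freeze || do_set_vm)
    {
        /* prune and/or freeze the heap page as before, then: */
        if (do_set_vm)
        {
            PageSetAllVisible(page);
            visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits, rel->rd_locator);
        }
        MarkBufferDirty(buffer);
        if (RelationNeedsWAL(rel))
            log_heap_prune_and_freeze(rel, buffer,
                                      do_set_vm ? vmbuffer : InvalidBuffer,
                                      do_set_vm ? new_vmbits : 0,
                                      conflict_xid,
                                      true, /* cleanup lock */
                                      reason,
                                      prstate.frozen, prstate.nfrozen,
                                      prstate.redirected, prstate.nredirected,
                                      prstate.nowdead, prstate.ndead,
                                      prstate.nowunused, prstate.nunused);
    }
    END_CRIT_SECTION();

    if (do_set_vm)
        LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);

The VM buffer is locked before entering the critical section and unlocked
after leaving it, so the heap page and VM page changes are applied under one
critical section and logged in the single XLOG_HEAP2_PRUNE_VACUUM_SCAN record.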
v32-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patchtext/x-patch; charset=US-ASCII; name=v32-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patchDownload
From 8a3d02ccb9165d53e50c391dd4d71cc108c9ef15 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v32 07/16] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum
As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
1 file changed, 29 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b7d834969d6..afa2c3af833 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1896,9 +1896,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ /* Lock vmbuffer before entering critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ /* Mark buffer dirty before writing any WAL records */
MarkBufferDirty(buf);
/*
@@ -1915,13 +1918,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set_vmbits(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
+
+ /*
+ * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+ * setting the VM.
+ */
+ if (RelationNeedsWAL(vacrel->rel))
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ InvalidTransactionId, /* conflict xid */
+ false, /* cleanup lock */
+ PRUNE_VACUUM_SCAN, /* reason */
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+
END_CRIT_SECTION();
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
/* Count the newly all-frozen pages for logging */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
--
2.43.0
v32-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patchtext/x-patch; charset=US-ASCII; name=v32-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patchDownload
From eb516fefdf4b2dde35306d38c5bc07b2f8de0183 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v32 08/16] Remove XLOG_HEAP2_VISIBLE entirely
No remaining users emit XLOG_HEAP2_VISIBLE records, so the record type can be
removed entirely.
This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.
This changes the visibility map API, so any external users/consumers of
the VM-only WAL record will need to change.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/common/bufmask.c | 4 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 155 ++---------------------
src/backend/access/heap/pruneheap.c | 4 +-
src/backend/access/heap/vacuumlazy.c | 16 +--
src/backend/access/heap/visibilitymap.c | 109 +---------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +-
src/include/access/heapam_xlog.h | 20 ---
src/include/access/visibilitymap.h | 13 +-
src/include/access/visibilitymapdefs.h | 9 --
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 38 insertions(+), 370 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index 1a9e7bea5d2..bce767d7b71 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
+ * for more details.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ad9d6338ec2..f219c7a71cf 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2542,11 +2542,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
else if (all_frozen_set)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- relation->rd_locator);
+ visibilitymap_set(BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ relation->rd_locator);
}
/*
@@ -8831,50 +8831,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
-
/*
* Perform XLogInsert for a heap-update operation. Caller must already
* have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index f765345e9e4..9a29fda3601 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -236,7 +236,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+ visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -249,142 +249,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -762,8 +626,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
*
* During recovery, however, no concurrent writers exist. Therefore,
* updating the VM without holding the heap page lock is safe enough. This
- * same approach is taken when replaying xl_heap_visible records (see
- * heap_xlog_visible()).
+ * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+ * heap_xlog_prune_and_freeze()).
*/
if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -775,11 +639,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- rlocator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -1360,9 +1224,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b3ea42f1be1..cac09dff31f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1240,8 +1240,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* so there is no point in optimizing it.
*/
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
- params->relation->rd_locator);
+ visibilitymap_set(blockno, vmbuffer, new_vmbits,
+ params->relation->rd_locator);
}
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index afa2c3af833..4d7e1636526 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1918,11 +1918,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
/*
* Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2800,9 +2800,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* set PD_ALL_VISIBLE.
*/
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer, vmflags,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer, vmflags,
+ vacrel->rel->rd_locator);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 3047bd46def..fe76bc37dce 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set bit(s) in a previously pinned page and log
- * visibilitymap_set_vmbits - set bit(s) in a pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,106 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- */
-void
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
- flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || !XLogRecPtrIsValid(recptr));
- Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- Assert(!BufferIsValid(heapBuf) ||
- BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (!XLogRecPtrIsValid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-}
-
/*
* Set VM (visibility map) flags in the VM block in vmBuf.
*
@@ -341,9 +240,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* rlocator is used only for debugging messages.
*/
void
-visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 02ae91653c1..75ae6f9d375 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index e25dd6bc366..f7ddb56fc30 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -449,7 +449,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index afffab77106..f8681dcc9c7 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ce3566ba949..5eed567a8e5 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -443,20 +442,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -500,11 +485,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index a0166c5b410..001afb037f3 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern void visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator);
+extern void visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 89153b3cd9a..e5794c8559e 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b9e671fcda8..308cfff999e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4331,7 +4331,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
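For readers skimming the diff above, the end state at the call sites is roughly the following (a condensed sketch, not lifted verbatim from any one hunk; the real call sites are in the heapam.c, pruneheap.c, and vacuumlazy.c changes): the operation that renders the page all-visible sets PD_ALL_VISIBLE and the VM bits inside its own critical section, and the VM change rides along in that operation's own WAL record instead of a separate XLOG_HEAP2_VISIBLE record.

static void
set_vm_in_callers_crit_section(Relation relation, Buffer buffer,
                               Buffer vmbuffer, Page page)
{
    START_CRIT_SECTION();

    /* ... modify the heap page (prune/freeze or multi-insert) ... */

    PageSetAllVisible(page);
    visibilitymap_set(BufferGetBlockNumber(buffer), vmbuffer,
                      VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN,
                      relation->rd_locator);
    MarkBufferDirty(buffer);

    /*
     * The VM flags and vmbuffer are registered with the operation's own
     * WAL record (xl_heap_prune or xl_heap_multi_insert), so
     * log_heap_visible() and XLOG_HEAP2_VISIBLE are no longer needed.
     */

    END_CRIT_SECTION();
}

(The function name above is made up for illustration.)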
Attachment: v32-0009-Simplify-heap_page_would_be_all_visible-visibili.patch (text/x-patch)
From 30d5812809ebb3b66a4b33b0ebc4aa6b1268acd8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 17 Dec 2025 13:57:16 -0500
Subject: [PATCH v32 09/16] Simplify heap_page_would_be_all_visible visibility
check
heap_page_would_be_all_visible() doesn't care about the distinction
between HEAPTUPLE_RECENTLY_DEAD and HEAPTUPLE_DEAD tuples -- any tuple
that is not HEAPTUPLE_LIVE means the page is not all-visible and causes
us to return false.
Therefore, we don't need to call HeapTupleSatisfiesVacuum(), which
includes an extra step to distinguish between dead and recently dead
tuples using OldestXmin. Replace it with the more minimal
HeapTupleSatisfiesVacuumHorizon().
This has the added benefit of making it easier to replace uses of
OldestXmin in heap_page_would_be_all_visible() in the future.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/CALdSSPjvhGXihT_9f-GJabYU%3D_PjrFDUxYaURuTbfLyQM6TErg%40mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4d7e1636526..4b2a26f7336 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3570,6 +3570,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
{
ItemId itemid;
HeapTupleData tuple;
+ TransactionId dead_after;
/*
* Set the offset number so that we can display it along with any
@@ -3609,12 +3610,14 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
/* Visibility checks may do IO or allocate memory */
Assert(CritSectionCount == 0);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+ switch (HeapTupleSatisfiesVacuumHorizon(&tuple, buf, &dead_after))
{
case HEAPTUPLE_LIVE:
{
TransactionId xmin;
+ Assert(!TransactionIdIsValid(dead_after));
+
/* Check comments in lazy_scan_prune. */
if (!HeapTupleHeaderXminCommitted(tuple.t_data))
{
@@ -3647,8 +3650,10 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
}
break;
- case HEAPTUPLE_DEAD:
case HEAPTUPLE_RECENTLY_DEAD:
+ Assert(TransactionIdIsValid(dead_after));
+ /* FALLTHROUGH */
+ case HEAPTUPLE_DEAD:
case HEAPTUPLE_INSERT_IN_PROGRESS:
case HEAPTUPLE_DELETE_IN_PROGRESS:
{
--
2.43.0
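A bit of context on why this is a safe swap: HeapTupleSatisfiesVacuum() is essentially HeapTupleSatisfiesVacuumHorizon() plus a single OldestXmin comparison whose only job is to promote HEAPTUPLE_RECENTLY_DEAD to HEAPTUPLE_DEAD. A simplified sketch of that relationship (not the verbatim heapam_visibility.c code):

static HTSV_Result
satisfies_vacuum_sketch(HeapTuple htup, TransactionId OldestXmin, Buffer buffer)
{
    TransactionId dead_after = InvalidTransactionId;
    HTSV_Result res;

    res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);

    /*
     * The only use of OldestXmin: decide whether a recently-dead tuple is
     * already removable. A caller that treats DEAD and RECENTLY_DEAD the
     * same, as heap_page_would_be_all_visible() does, doesn't need it.
     */
    if (res == HEAPTUPLE_RECENTLY_DEAD &&
        TransactionIdPrecedes(dead_after, OldestXmin))
        res = HEAPTUPLE_DEAD;

    return res;
}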
Attachment: v32-0010-Remove-table_scan_analyze_next_tuple-unneeded-pa.patch (text/x-patch)
From 4cf660283b1ed92befb5fd8a7f0c9a2819619038 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 22 Dec 2025 10:46:45 -0500
Subject: [PATCH v32 10/16] Remove table_scan_analyze_next_tuple unneeded
parameter OldestXmin
heapam_scan_analyze_next_tuple() doesn't distinguish between dead and
recently dead tuples when counting them, so it doesn't need OldestXmin.
Looking at other table AMs implementing table_scan_analyze_next_tuple(),
it appears most do not use OldestXmin either.
Suggested-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/CALdSSPjvhGXihT_9f-GJabYU%3D_PjrFDUxYaURuTbfLyQM6TErg%40mail.gmail.com
---
src/backend/access/heap/heapam_handler.c | 13 +++++++++----
src/backend/commands/analyze.c | 6 +-----
src/include/access/tableam.h | 5 ++---
3 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 09a456e9966..df2440e82a7 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1026,7 +1026,7 @@ heapam_scan_analyze_next_block(TableScanDesc scan, ReadStream *stream)
}
static bool
-heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
+heapam_scan_analyze_next_tuple(TableScanDesc scan,
double *liverows, double *deadrows,
TupleTableSlot *slot)
{
@@ -1047,6 +1047,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
ItemId itemid;
HeapTuple targtuple = &hslot->base.tupdata;
bool sample_it = false;
+ TransactionId dead_after;
itemid = PageGetItemId(targpage, hscan->rs_cindex);
@@ -1069,16 +1070,20 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
targtuple->t_data = (HeapTupleHeader) PageGetItem(targpage, itemid);
targtuple->t_len = ItemIdGetLength(itemid);
- switch (HeapTupleSatisfiesVacuum(targtuple, OldestXmin,
- hscan->rs_cbuf))
+ switch (HeapTupleSatisfiesVacuumHorizon(targtuple,
+ hscan->rs_cbuf,
+ &dead_after))
{
case HEAPTUPLE_LIVE:
sample_it = true;
*liverows += 1;
break;
- case HEAPTUPLE_DEAD:
case HEAPTUPLE_RECENTLY_DEAD:
+ Assert(TransactionIdIsValid(dead_after));
+ /* FALLTHROUGH */
+
+ case HEAPTUPLE_DEAD:
/* Count dead and recently-dead rows */
*deadrows += 1;
break;
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index a483424152c..53adac9139b 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1213,7 +1213,6 @@ acquire_sample_rows(Relation onerel, int elevel,
double rowstoskip = -1; /* -1 means not set yet */
uint32 randseed; /* Seed for block sampler(s) */
BlockNumber totalblocks;
- TransactionId OldestXmin;
BlockSamplerData bs;
ReservoirStateData rstate;
TupleTableSlot *slot;
@@ -1226,9 +1225,6 @@ acquire_sample_rows(Relation onerel, int elevel,
totalblocks = RelationGetNumberOfBlocks(onerel);
- /* Need a cutoff xmin for HeapTupleSatisfiesVacuum */
- OldestXmin = GetOldestNonRemovableTransactionId(onerel);
-
/* Prepare for sampling block numbers */
randseed = pg_prng_uint32(&pg_global_prng_state);
nblocks = BlockSampler_Init(&bs, totalblocks, targrows, randseed);
@@ -1261,7 +1257,7 @@ acquire_sample_rows(Relation onerel, int elevel,
{
vacuum_delay_point(true);
- while (table_scan_analyze_next_tuple(scan, OldestXmin, &liverows, &deadrows, slot))
+ while (table_scan_analyze_next_tuple(scan, &liverows, &deadrows, slot))
{
/*
* The first targrows sample rows are simply copied into the
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e2ec5289d4d..c9fa9f259cd 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -683,7 +683,6 @@ typedef struct TableAmRoutine
* callback).
*/
bool (*scan_analyze_next_tuple) (TableScanDesc scan,
- TransactionId OldestXmin,
double *liverows,
double *deadrows,
TupleTableSlot *slot);
@@ -1714,11 +1713,11 @@ table_scan_analyze_next_block(TableScanDesc scan, ReadStream *stream)
* tuples.
*/
static inline bool
-table_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
+table_scan_analyze_next_tuple(TableScanDesc scan,
double *liverows, double *deadrows,
TupleTableSlot *slot)
{
- return scan->rs_rd->rd_tableam->scan_analyze_next_tuple(scan, OldestXmin,
+ return scan->rs_rd->rd_tableam->scan_analyze_next_tuple(scan,
liverows, deadrows,
slot);
}
--
2.43.0
Attachment: v32-0011-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (text/x-patch)
From f7d2a0353a79ca29d900b6bcab0dab946fac5752 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v32 11/16] Use GlobalVisState in vacuum to determine page
level visibility
During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.
Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.
This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to be considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.
OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare occurrences where GlobalVisState moves
backwards.
Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.
This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/heapam_visibility.c | 22 +++++++++
src/backend/access/heap/pruneheap.c | 53 ++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 38 ++++++++++-----
src/include/access/heapam.h | 4 +-
4 files changed, 76 insertions(+), 41 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05e70b7d92a..b4489020609 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1053,6 +1053,28 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for “transaction still running” are identical to
+ * those used for removal XIDs.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid)
+{
+ return !GlobalVisTestIsRemovableXid(state, xid);
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index cac09dff31f..da09c769b4d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -461,11 +461,12 @@ prune_freeze_setup(PruneFreezeParams *params,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon
- * when updating the VM and/or freezing all the tuples on the page.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState. This field is only kept up-to-date if the page is
+ * all-visible. As soon as a tuple is encountered that is not visible to
+ * all, this field is unmaintained. As long as it is maintained, it can be
+ * used to calculate the snapshot conflict horizon when updating the VM
+ * and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -1008,14 +1009,14 @@ heap_page_will_set_vm(PruneState *prstate,
*/
static bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
+ vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -1102,6 +1103,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prune_freeze_plan(RelationGetRelid(params->relation),
buffer, &prstate, off_loc);
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them may be considered running by any snapshot, the page cannot
+ * be all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ GlobalVisTestXidMaybeRunning(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* If checksums are enabled, calling heap_prune_satisfies_vacuum() while
* checking tuple visibility information in prune_freeze_plan() may have
@@ -1283,10 +1294,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool debug_all_frozen;
Assert(prstate.lpdead_items == 0);
- Assert(prstate.cutoffs);
Assert(heap_page_is_all_visible(params->relation, buffer,
- prstate.cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc));
@@ -1807,28 +1817,15 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
}
/*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed? A FrozenTransactionId
- * is seen as committed to everyone. Otherwise, we check if
- * there is a snapshot that considers this xid to still be
- * running, and if so, we don't consider the page all-visible.
+ * The inserter definitely committed. But we don't know if it
+ * is old enough that everyone sees it as committed. Later,
+ * after processing all the tuples on the page, we'll check if
+ * there is any snapshot that still considers the newest xid
+ * on the page to be running. If so, we don't consider the
+ * page all-visible.
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' and 'all_frozen' when freezing
- * is requested. We could use GlobalVisTestIsRemovableXid
- * instead, if a non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = false;
- prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4b2a26f7336..c97ad2a931a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2754,7 +2754,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* done outside the critical section.
*/
if (heap_page_would_be_all_visible(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3519,7 +3519,7 @@ dead_items_cleanup(LVRelState *vacrel)
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
*
* Output parameters:
*
@@ -3535,7 +3535,7 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -3618,7 +3618,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
Assert(!TransactionIdIsValid(dead_after));
- /* Check comments in lazy_scan_prune. */
+ /* Check heap_prune_record_unchanged_lp_normal comments */
if (!HeapTupleHeaderXminCommitted(tuple.t_data))
{
all_visible = false;
@@ -3627,16 +3627,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
}
/*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed?
+ * The inserter definitely committed. But we don't know if
+ * it is old enough that everyone sees it as committed.
+ * Don't check that now.
+ *
+ * If we scan all tuples without finding one that prevents
+ * the page from being all-visible, we then check whether
+ * any snapshot still considers the newest XID on the page
+ * to be running. In that case, the page is not considered
+ * all-visible.
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin, OldestXmin))
- {
- all_visible = false;
- *all_frozen = false;
- break;
- }
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3667,6 +3668,19 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
}
} /* scan along page */
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * among them may still be considered running by any snapshot, the page
+ * cannot be all-visible.
+ */
+ if (all_visible &&
+ TransactionIdIsNormal(*visibility_cutoff_xid) &&
+ GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+ {
+ all_visible = false;
+ *all_frozen = false;
+ }
+
/* Clear the offset information once we have processed the given page. */
*logging_offnum = InvalidOffsetNumber;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e0da1f7cdcc..ac771390a37 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -438,7 +438,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -452,6 +452,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+
+extern bool GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
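To spell out why one horizon test per page is enough: visibility_cutoff_xid is the newest xmin of the live, committed tuples on the page, so if no snapshot can still consider that inserter running, the same holds for every older xmin on the page. A condensed sketch of the check (names match the patch; the helper function is made up for illustration):

static bool
page_all_visible_after_scan(GlobalVisState *vistest,
                            TransactionId visibility_cutoff_xid,
                            bool all_visible_so_far)
{
    /*
     * While scanning the page we only tracked the newest committed xmin;
     * the (more expensive) GlobalVisState test happens once, here.
     */
    if (all_visible_so_far &&
        TransactionIdIsNormal(visibility_cutoff_xid) &&
        GlobalVisTestXidMaybeRunning(vistest, visibility_cutoff_xid))
        return false;

    return all_visible_so_far;
}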
Attachment: v32-0012-Unset-all_visible-sooner-if-not-freezing.patch (text/x-patch)
From a33bc378a3970647602bdfb1a68cf76ed97911d1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v32 12/16] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.
However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/pruneheap.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index da09c769b4d..9f1257529b9 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1682,8 +1682,13 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible and all_frozen until later
* during pruning. Removable dead tuples shouldn't preclude freezing the
- * page.
+ * page. If we won't attempt freezing, just unset all-visible now, though.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1943,8 +1948,14 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible and all_frozen until later, at the
* end of heap_page_prune_and_freeze(). This will allow us to attempt to
* freeze the page after pruning. As long as we unset it before updating
- * the visibility map, this will be correct.
+ * the visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all_visible and all_frozen now.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
--
2.43.0
Attachment: v32-0013-Track-which-relations-are-modified-by-a-query.patch (text/x-patch)
From bd5e3e9cb10858f4d18ea487d8bc98344afd8a5c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v32 13/16] Track which relations are modified by a query
Save the relids in a bitmap in the estate. A later commit will pass this
information down to scan nodes to control whether or not the scan allows
setting the visibility map while on-access pruning. We don't want to set
the visibility map if the query is just going to modify the page
immediately after.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/executor/execMain.c | 4 ++++
src/backend/executor/execUtils.c | 2 ++
src/include/nodes/execnodes.h | 6 ++++++
3 files changed, 12 insertions(+)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ca14cdabdd0..6a0283985c3 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation may be modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index cc3c5de71eb..dcb2ef2275c 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 02265456978..29e2e2da7ea 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query through a
+ * UPDATE/DELETE/INSERT/MERGE or targeted by a SELECT FOR UPDATE.
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
--
2.43.0
Attachment: v32-0014-Pass-down-information-on-table-modification-to-s.patch (text/x-patch)
From 239ba2a8134755d8700b9718010b8f0a7b29fef1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:12:18 -0500
Subject: [PATCH v32 14/16] Pass down information on table modification to scan
node
Pass down information to sequential scan, index scan, and bitmap table
scan nodes on whether or not the query modifies the relation being
scanned. A later commit will use this information to update the VM
during on-access pruning only if the relation is not modified by the
query.
---
contrib/pgrowlocks/pgrowlocks.c | 2 +-
src/backend/access/brin/brin.c | 3 ++-
src/backend/access/gin/gininsert.c | 3 ++-
src/backend/access/heap/heapam_handler.c | 7 +++---
src/backend/access/index/genam.c | 4 ++--
src/backend/access/index/indexam.c | 6 +++---
src/backend/access/nbtree/nbtsort.c | 2 +-
src/backend/access/table/tableam.c | 7 +++---
src/backend/commands/constraint.c | 2 +-
src/backend/commands/copyto.c | 2 +-
src/backend/commands/tablecmds.c | 8 +++----
src/backend/commands/typecmds.c | 4 ++--
src/backend/executor/execIndexing.c | 2 +-
src/backend/executor/execReplication.c | 8 +++----
src/backend/executor/nodeBitmapHeapscan.c | 9 +++++++-
src/backend/executor/nodeIndexonlyscan.c | 2 +-
src/backend/executor/nodeIndexscan.c | 11 ++++++++--
src/backend/executor/nodeSeqscan.c | 26 ++++++++++++++++++++---
src/backend/partitioning/partbounds.c | 2 +-
src/backend/utils/adt/selfuncs.c | 2 +-
src/include/access/genam.h | 2 +-
src/include/access/heapam.h | 6 ++++++
src/include/access/tableam.h | 19 ++++++++++-------
23 files changed, 93 insertions(+), 46 deletions(-)
diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
RelationGetRelationName(rel));
/* Scan the relation */
- scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
hscan = (HeapScanDesc) scan;
attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 6887e421442..4d9684b1b19 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2843,7 +2843,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
indexInfo->ii_Concurrent = brinshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromBrinShared(brinshared));
+ ParallelTableScanFromBrinShared(brinshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index c08ea927ac5..b502d4088d7 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2058,7 +2058,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
indexInfo->ii_Concurrent = ginshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromGinBuildShared(ginshared));
+ ParallelTableScanFromGinBuildShared(ginshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index df2440e82a7..e88db52fd7e 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,12 +79,13 @@ heapam_slot_callbacks(Relation relation)
*/
static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
{
IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
return &hscan->xs_base;
}
@@ -753,7 +754,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
tableScan = NULL;
heapScan = NULL;
- indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+ indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
index_rescan(indexScan, NULL, 0, NULL, 0);
}
else
@@ -762,7 +763,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
- tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+ tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
heapScan = (HeapScanDesc) tableScan;
indexScan = NULL;
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index a29be6f467b..5ac7d22e49f 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -447,7 +447,7 @@ systable_beginscan(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, irel,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
@@ -708,7 +708,7 @@ systable_beginscan_ordered(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, indexRelation,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 4ed0508c605..4df56087841 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys)
+ int nkeys, int norderbys, uint32 flags)
{
IndexScanDesc scan;
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+ scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
return scan;
}
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+ scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
return scan;
}
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 90ab4e91b56..8ae54217f36 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1925,7 +1925,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
indexInfo = BuildIndexInfo(btspool->index);
indexInfo->ii_Concurrent = btshared->isconcurrent;
scan = table_beginscan_parallel(btspool->heap,
- ParallelTableScanFromBTShared(btshared));
+ ParallelTableScanFromBTShared(btshared), 0);
reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
true, progress, _bt_build_callback,
&buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 87491796523..2ff29b6e40b 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
+
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
bool found;
slot = table_slot_create(rel, NULL);
- scan = table_index_fetch_begin(rel);
+ scan = table_index_fetch_begin(rel, 0);
found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
all_dead);
table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..37cfbd63938 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
*/
tmptid = checktid;
{
- IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+ IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
bool call_again = false;
if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 4ab4a3893d5..4261baf4a41 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1156,7 +1156,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
AttrMap *map = NULL;
TupleTableSlot *root_slot = NULL;
- scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
slot = table_slot_create(rel, NULL);
/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f976c0e5c7e..eb35dbbc853 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6378,7 +6378,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
* checking all the constraints.
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(oldrel, snapshot, 0, NULL);
+ scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -13768,7 +13768,7 @@ validateForeignKeyConstraint(char *conname,
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
slot = table_slot_create(rel, NULL);
- scan = table_beginscan(rel, snapshot, 0, NULL);
+ scan = table_beginscan(rel, snapshot, 0, NULL, 0);
perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
"validateForeignKeyConstraint",
@@ -22626,7 +22626,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
/* Scan through the rows. */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+ scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -23090,7 +23090,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
/* Scan through the rows. */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(splitRel, snapshot, 0, NULL);
+ scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index e5fa0578889..8c114fa56fa 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3157,7 +3157,7 @@ validateDomainNotNullConstraint(Oid domainoid)
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
@@ -3238,7 +3238,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 6ae0f959592..6d3e9d2f311 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -816,7 +816,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
retry:
conflict = false;
found_self = false;
- index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+ index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 173d2fe548d..db1b322c665 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -204,7 +204,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
/* Start an index scan. */
- scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
retry:
found = false;
@@ -382,7 +382,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
/* Start a heap scan. */
InitDirtySnapshot(snap);
- scan = table_beginscan(rel, &snap, 0, NULL);
+ scan = table_beginscan(rel, &snap, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
retry:
@@ -601,7 +601,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+ scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
table_rescan(scan, NULL);
@@ -665,7 +665,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
index_rescan(scan, skey, skey_attoff, NULL, 0);
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 2c68327cb29..62dff010d10 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ flags);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index c2d09374517..cf4d9a4f832 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
estate->es_snapshot,
&node->ioss_Instrument,
node->ioss_NumScanKeys,
- node->ioss_NumOrderByKeys);
+ node->ioss_NumOrderByKeys, 0);
node->ioss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 84823f0b615..00e86c5bfdf 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys,
+ flags);
node->iss_ScanDesc = scandesc;
@@ -207,7 +214,7 @@ IndexNextWithReorder(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys, 0);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index b8119face43..7718376bc2f 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
scandesc = table_beginscan(node->ss.ss_currentRelation,
estate->es_snapshot,
- 0, NULL);
+ 0, NULL, flags);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
{
EState *estate = node->ss.ps.state;
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
table_parallelscan_initialize(node->ss.ss_currentRelation,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+ flags);
}
/* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation,
+ pscan,
+ flags);
}
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..b7c4e6d1071 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
econtext = GetPerTupleExprContext(estate);
snapshot = RegisterSnapshot(GetLatestSnapshot());
tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
- scan = table_beginscan(part_rel, snapshot, 0, NULL);
+ scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 29fec655593..ac181853225 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7181,7 +7181,7 @@ get_actual_variable_endpoint(Relation heapRel,
index_scan = index_beginscan(heapRel, indexRel,
&SnapshotNonVacuumable, NULL,
- 1, 0);
+ 1, 0, 0);
/* Set it up for index-only scan */
index_scan->xs_want_itup = true;
index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index e37834c406d..43b9d8aaaf1 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -177,7 +177,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys);
+ int nkeys, int norderbys, uint32 flags);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ac771390a37..a0e89365c70 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -119,6 +119,12 @@ typedef struct IndexFetchHeapData
Buffer xs_cbuf; /* current heap buffer in scan, if any */
/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index c9fa9f259cd..6066ae156de 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* set if the query doesn't modify the rel */
+ SO_HINT_REL_READ_ONLY = 1 << 10,
} ScanOptions;
/*
@@ -420,7 +422,7 @@ typedef struct TableAmRoutine
*
* Tuples for an index scan can then be fetched via index_fetch_tuple.
*/
- struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+ struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
/*
* Reset index fetch. Typically this will release cross index fetch
@@ -873,9 +875,9 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
*/
static inline TableScanDesc
table_beginscan(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_SEQSCAN |
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
@@ -918,9 +920,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
@@ -1127,7 +1129,8 @@ extern void table_parallelscan_initialize(Relation rel,
* Caller must hold a suitable lock on the relation.
*/
extern TableScanDesc table_beginscan_parallel(Relation relation,
- ParallelTableScanDesc pscan);
+ ParallelTableScanDesc pscan,
+ uint32 flags);
/*
* Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1163,9 +1166,9 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
* Tuples for an index scan can then be fetched via table_index_fetch_tuple().
*/
static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
{
- return rel->rd_tableam->index_fetch_begin(rel);
+ return rel->rd_tableam->index_fetch_begin(rel, flags);
}
/*
--
2.43.0
v32-0015-Allow-on-access-pruning-to-set-pages-all-visible.patch (text/x-patch)
From 4b7ca7c4febc50c6ee019cf01252a19ef1c06797 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:24:08 -0500
Subject: [PATCH v32 15/16] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam.c | 15 ++++++-
src/backend/access/heap/heapam_handler.c | 15 ++++++-
src/backend/access/heap/pruneheap.c | 40 ++++++++++++++++++-
src/include/access/heapam.h | 24 +++++++++--
.../t/035_standby_logical_decoding.pl | 3 +-
5 files changed, 89 insertions(+), 8 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f219c7a71cf..8940297f6f3 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -573,6 +573,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -587,7 +588,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1264,6 +1267,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1302,6 +1306,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1334,6 +1344,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index e88db52fd7e..ab175948c5b 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
return &hscan->xs_base;
@@ -100,6 +101,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -139,7 +146,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ hscan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2477,6 +2485,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2523,7 +2532,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9f1257529b9..04aa56e81b6 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -202,6 +202,8 @@ static bool heap_page_will_set_vm(PruneState *prstate,
Relation relation,
BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
Buffer vmbuffer,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
int nlpdead_items,
uint8 *old_vmbits,
uint8 *new_vmbits);
@@ -223,9 +225,13 @@ static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -306,6 +312,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
.cutoffs = NULL,
};
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ params.options |= HEAP_PAGE_PRUNE_UPDATE_VM;
+ params.vmbuffer = *vmbuffer;
+ }
+
heap_page_prune_and_freeze(¶ms, &presult, &dummy_off_loc,
NULL, NULL);
@@ -951,6 +964,9 @@ identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
* corrupted, it will fix them by clearing the VM bits and visibility hint.
* This does not need to be done in a critical section.
*
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
* Returns true if one or both VM bits should be set, along with returning the
* current value of the VM bits in *old_vmbits and the desired new value of
* the VM bits in *new_vmbits.
@@ -964,6 +980,8 @@ heap_page_will_set_vm(PruneState *prstate,
Relation relation,
BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
Buffer vmbuffer,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
int nlpdead_items,
uint8 *old_vmbits,
uint8 *new_vmbits)
@@ -974,6 +992,24 @@ heap_page_will_set_vm(PruneState *prstate,
if (!prstate->attempt_update_vm)
return false;
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(heap_buffer) || XLogCheckBufferNeedsBackup(heap_buffer)))
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ return false;
+ }
+
*old_vmbits = visibilitymap_get_status(relation, heap_blk,
&vmbuffer);
@@ -1171,6 +1207,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
buffer,
page,
vmbuffer,
+ params->reason,
+ do_prune, do_freeze,
prstate.lpdead_items,
&old_vmbits,
&new_vmbits);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a0e89365c70..7e68928f3e9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read in the current heap page's
+ * corresponding VM block to this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read in the current heap page's corresponding VM block to
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
/*
* Some optimizations can only be performed if the query does not modify
@@ -419,7 +436,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
TM_IndexDeleteOp *delstate);
/* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index d264a698ff6..a5536ba4ff6 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
--
2.43.0
v32-0016-Set-pd_prune_xid-on-insert.patch (text/x-patch)
From b55c561d064517234447c5d1a20b868482762b06 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v32 16/16] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.
This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.
The index killtuples test had to be updated to reflect a larger number
of buffer hits for some accesses. Since pd_prune_xid is now set by the
fill/insert step, on-access pruning can happen during the first access
step (before the DELETE); this is when the VM is extended. After the
DELETE, the next access hits the VM block instead of extending it, so
an additional buffer hit is counted for the table.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/heapam.c | 25 +++++++++++++------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++-
.../modules/index/expected/killtuples.out | 6 ++---
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 8940297f6f3..18413d5878f 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2122,6 +2122,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2181,15 +2182,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2199,7 +2204,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2563,8 +2567,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 9a29fda3601..49cc83a6479 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -447,6 +447,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -596,9 +602,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
--
2.43.0
On 6 Jan 2026, at 00:24, Melanie Plageman <melanieplageman@gmail.com> wrote:
> <v32-0014-Pass-down-information-on-table-modification-to-s.patch>
I've made an attempt to review some patches in this patch set. It's huge and mostly polished.
In the step "Pass down information on table modification to scan node" you pass the SO_HINT_REL_READ_ONLY flag in IndexNext() and BitmapTableScanSetup(), but not in IndexNextWithReorder() or IndexOnlyNext(). Is there a reason why index scans with ordering cannot use on-access VM setting?
Also, the comment about visibilitymap_set() says "Callers that log VM changes separately should use visibilitymap_set()" as if visibilitymap_set() were some other function.
Best regards, Andrey Borodin.
On Tue, Jan 6, 2026 at 4:40 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> <v32-0014-Pass-down-information-on-table-modification-to-s.patch>
> I've made an attempt to review some patches in this patch set. It's huge and mostly polished.
I've added your review attribution to the patches you specifically
mention here (and in previous emails you sent). Let me know if there
are other patches you reviewed that you did not mention.
> In the step "Pass down information on table modification to scan node" you pass the SO_HINT_REL_READ_ONLY flag in IndexNext() and BitmapTableScanSetup(), but not in IndexNextWithReorder() or IndexOnlyNext(). Is there a reason why index scans with ordering cannot use on-access VM setting?
Great point. I simply hadn't tested those cases and didn't think to
add them. I've added them in the attached v33.
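For reference, here is a minimal sketch of how that change looks when
applied to IndexOnlyNext(), mirroring the IndexNext() hunk quoted
earlier (the exact v33 hunk may differ; the ioss_* fields are the ones
already used by the existing executor code):

    /* sketch only: same read-only hint pattern as in IndexNext() */
    uint32      flags = 0;

    if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
                       estate->es_modified_relids))
        flags = SO_HINT_REL_READ_ONLY;

    scandesc = index_beginscan(node->ss.ss_currentRelation,
                               node->ioss_RelationDesc,
                               estate->es_snapshot,
                               &node->ioss_Instrument,
                               node->ioss_NumScanKeys,
                               node->ioss_NumOrderByKeys,
                               flags);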
While looking at other callers of index_beginscan(), I was wondering
if systable_beginscan() and systable_beginscan_ordered() should ever
pass SO_HINT_REL_READ_ONLY. I guess we would need to pass down, from
above the index_beginscan(), whether the operation is read-only -- I'm
not sure we always know in the caller of systable_beginscan() whether
the operation will modify the catalog. That seems like it could be a
separate project, though, so maybe it is better to say this feature is
just for regular tables.
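Just to sketch the shape of what that would require (purely
hypothetical, not something v33 does): systable_beginscan() would need
some way to learn from its caller that the lookup is read-only and
forward that to the index_beginscan() call it already makes, e.g.:

    /*
     * Hypothetical only: the rel_read_only flag does not exist; nothing
     * in the patch set threads such information into genam.c.
     */
    sysscan->iscan = index_beginscan(heapRelation, irel,
                                     snapshot, NULL, nkeys, 0,
                                     rel_read_only ? SO_HINT_REL_READ_ONLY : 0);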
As for the other cases: we don't have the relation's range table index
in check_exclusion_or_unique_constraint(), so I don't think we can do
it there.
And I think the other index scan cases, like those in the replication
code or get_actual_variable_endpoint(), are too small to be worth it,
don't have the needed info, or don't do on-access pruning (because of
the snapshot type they use).
> Also, the comment about visibilitymap_set() says "Callers that log VM changes separately should use visibilitymap_set()" as if visibilitymap_set() were some other function.
Ah, yes, I forgot to remove that when I removed the old
visibilitymap_set() and made visibilitymap_set_vmbits() into
visibilitymap_set(). Done in v33.
- Melanie
Attachments:
v33-0004-Set-the-VM-in-heap_page_prune_and_freeze.patch (text/x-patch)
From 5c65e73246b4968ddfa9d3739f53d0d8734b8727 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v33 04/16] Set the VM in heap_page_prune_and_freeze()
This has no independent benefit. It is meant for ease of review. As of
this commit, there is still a separate WAL record emitted for setting
the VM after pruning and freezing. But the change is easier to review
if moving the logic into pruneheap.c is kept separate from folding the
VM update into that same WAL record.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/pruneheap.c | 315 +++++++++++++++++++++++----
src/backend/access/heap/vacuumlazy.c | 150 +------------
src/include/access/heapam.h | 20 ++
3 files changed, 299 insertions(+), 186 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index af788b29714..53b7711ab21 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
#include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool attempt_freeze;
+ /* whether or not to attempt updating the VM */
+ bool attempt_update_vm;
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -140,16 +142,17 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon when setting the VM or when
+ * freezing all the tuples on the page. It is only valid when all the live
+ * tuples on the page are all-visible.
*
* NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
* That's convenient for heap_page_prune_and_freeze() to use them to
- * decide whether to freeze the page or not. The all_visible and
- * all_frozen values returned to the caller are adjusted to include
- * LP_DEAD items after we determine whether to opportunistically freeze.
+ * decide whether to opportunistically freeze the page or not. The
+ * all_visible and all_frozen values ultimately used to set the VM are
+ * adjusted to include LP_DEAD items after we determine whether or not to
+ * opportunistically freeze.
*/
bool all_visible;
bool all_frozen;
@@ -191,6 +194,17 @@ static void page_verify_redirects(Page page);
static bool heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
PruneState *prstate);
+static bool identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page, int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits);
+static bool heap_page_will_set_vm(PruneState *prstate,
+ Relation relation,
+ BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+ Buffer vmbuffer,
+ int nlpdead_items,
+ uint8 *old_vmbits,
+ uint8 *new_vmbits);
/*
@@ -280,6 +294,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
PruneFreezeParams params = {
.relation = relation,
.buffer = buffer,
+ .vmbuffer = InvalidBuffer,
.reason = PRUNE_ON_ACCESS,
.options = 0,
.vistest = vistest,
@@ -341,6 +356,8 @@ prune_freeze_setup(PruneFreezeParams *params,
/* cutoffs must be provided if we will attempt freezing */
Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate->attempt_update_vm =
+ (params->options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate->cutoffs = params->cutoffs;
/*
@@ -396,51 +413,54 @@ prune_freeze_setup(PruneFreezeParams *params,
prstate->frz_conflict_horizon = InvalidTransactionId;
/*
- * Vacuum may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Track whether the page could be marked all-visible and/or all-frozen.
+ * This information is used for opportunistic freezing and for updating
+ * the visibility map (VM) if requested by the caller.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * Currently, only VACUUM performs freezing, but other callers may in the
+ * future. Visibility bookkeeping is required not just for setting the VM
+ * bits, but also for opportunistic freezing: we only consider freezing if
+ * the page would become all-frozen, or if it would be all-frozen except
+ * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+ * we will not set the VM bit even if the page is found to be all-visible.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not immediately
- * clear all_visible and all_frozen when we see LP_DEAD items. We fix
- * that after scanning the line pointers. We must correct all_visible and
- * all_frozen before we return them to the caller, so that the caller
- * doesn't set the VM bits incorrectly.
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is passed without HEAP_PAGE_PRUNE_FREEZE,
+ * prstate.all_frozen must be initialized to false, since we will not call
+ * heap_prepare_freeze_tuple() for each tuple.
+ *
+ * Dead tuples that will be removed by the end of vacuum should not
+ * prevent opportunistic freezing. Therefore, we do not clear all_visible
+ * and all_frozen when we encounter LP_DEAD items. Instead, we correct
+ * them after deciding whether to freeze, but before updating the VM, to
+ * avoid setting the VM bits incorrectly.
+ *
+ * If neither freezing nor VM updates are requested, we skip the extra
+ * bookkeeping. In this case, initializing all_visible to false allows
+ * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
*/
if (prstate->attempt_freeze)
{
prstate->all_visible = true;
prstate->all_frozen = true;
}
+ else if (prstate->attempt_update_vm)
+ {
+ prstate->all_visible = true;
+ prstate->all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate->all_visible = false;
prstate->all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is unmaintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon
+ * when updating the VM and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -775,10 +795,148 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Helper to correct any corruption detected on a heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+ *
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ *
+ * Returns true if it cleared corruption and false otherwise.
+ */
+static bool
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits)
+{
+ Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
+
+ Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (!PageIsAllVisible(heap_page) &&
+ ((vmbits & VISIBILITYMAP_VALID_BITS) != 0))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * However, if it finds that the page-level visibility hint or VM is
+ * corrupted, it will fix them by clearing the VM bits and visibility hint.
+ * This does not need to be done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with returning the
+ * current value of the VM bits in *old_vmbits and the desired new value of
+ * the VM bits in *new_vmbits.
+ *
+ * If the VM should not be set, it returns false. If we won't consider
+ * updating the VM, *old_vmbits will be 0, regardless of the current value of
+ * the VM bits.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate,
+ Relation relation,
+ BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+ Buffer vmbuffer,
+ int nlpdead_items,
+ uint8 *old_vmbits,
+ uint8 *new_vmbits)
+{
+ *old_vmbits = 0;
+ *new_vmbits = 0;
+
+ if (!prstate->attempt_update_vm)
+ return false;
+
+ *old_vmbits = visibilitymap_get_status(relation, heap_blk,
+ &vmbuffer);
+
+ /* We do this even if not all-visible */
+ if (identify_and_fix_vm_corruption(relation, heap_buffer, heap_blk, heap_page,
+ nlpdead_items, vmbuffer,
+ *old_vmbits))
+ *old_vmbits = 0;
+
+ if (!prstate->all_visible)
+ return false;
+
+ *new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+ if (prstate->all_frozen)
+ *new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+ if (*new_vmbits == *old_vmbits)
+ {
+ *new_vmbits = 0;
+ return false;
+ }
+
+ return true;
+}
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -793,12 +951,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* tuples if it's required in order to advance relfrozenxid / relminmxid, or
* if it's considered advantageous for overall system performance to do so
* now. The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing. When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opportunistically freeze, to indicate if the VM bits can be set. They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
@@ -823,13 +982,18 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
MultiXactId *new_relmin_mxid)
{
Buffer buffer = params->buffer;
+ Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
+ BlockNumber blockno = BufferGetBlockNumber(buffer);
PruneState prstate;
bool do_freeze;
bool do_prune;
bool do_hint_prune;
+ bool do_set_vm;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ uint8 new_vmbits;
+ uint8 old_vmbits;
/* Initialize prstate */
prune_freeze_setup(params,
@@ -1011,6 +1175,65 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
}
}
+
+ /* Now update the visibility map and PD_ALL_VISIBLE hint */
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ do_set_vm = heap_page_will_set_vm(&prstate,
+ params->relation,
+ blockno,
+ buffer,
+ page,
+ vmbuffer,
+ prstate.lpdead_items,
+ &old_vmbits,
+ &new_vmbits);
+
+ /*
+ * new_vmbits should be 0 regardless of whether or not the page is
+ * all-visible if we do not intend to set the VM.
+ */
+ Assert(do_set_vm || new_vmbits == 0);
+
+ /* Set the visibility map and page visibility hint, if relevant */
+ if (do_set_vm)
+ {
+ Assert(prstate.all_visible);
+
+ /*
+ * It should never be the case that the visibility map page is set
+ * while the page-level bit is clear, but the reverse is allowed (if
+ * checksums are not enabled). Regardless, set both bits so that we
+ * get back in sync.
+ *
+ * The heap buffer must be marked dirty before adding it to the WAL
+ * chain when setting the VM. We don't worry about unnecessarily
+ * dirtying the heap buffer if PD_ALL_VISIBLE is already set, though.
+ * It is extremely rare to have a clean heap buffer with
+ * PD_ALL_VISIBLE already set and the VM bits clear, so there is no
+ * point in optimizing it.
+ */
+ PageSetAllVisible(page);
+ MarkBufferDirty(buffer);
+
+ /*
+ * If the page is being set all-frozen, we pass InvalidTransactionId
+ * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ Assert(!prstate.all_frozen ||
+ !TransactionIdIsValid(presult->vm_conflict_horizon));
+
+ visibilitymap_set(params->relation, blockno, buffer,
+ InvalidXLogRecPtr,
+ vmbuffer, presult->vm_conflict_horizon,
+ new_vmbits);
+ }
+
+ /* Save the vmbits for caller */
+ presult->old_vmbits = old_vmbits;
+ presult->new_vmbits = new_vmbits;
}
@@ -1485,6 +1708,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
{
TransactionId xmin;
+ Assert(prstate->attempt_update_vm);
+
if (!HeapTupleHeaderXminCommitted(htup))
{
prstate->all_visible = false;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5857fd1bfb6..fe816299f4b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -424,11 +424,7 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
-static bool identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
- BlockNumber heap_blk, Page heap_page,
- int nlpdead_items,
- Buffer vmbuffer,
- uint8 vmbits);
+
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer,
@@ -1962,83 +1958,6 @@ cmpOffsetNumbers(const void *a, const void *b)
return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
}
-/*
- * Helper to correct any corruption detected on a heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- *
- * Returns true if it cleared corruption and false otherwise.
- */
-static bool
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
- BlockNumber heap_blk, Page heap_page,
- int nlpdead_items,
- Buffer vmbuffer,
- uint8 vmbits)
-{
- Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
-
- Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (!PageIsAllVisible(heap_page) &&
- ((vmbits & VISIBILITYMAP_VALID_BITS) != 0))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- RelationGetRelationName(rel), heap_blk)));
-
- visibilitymap_clear(rel, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- RelationGetRelationName(rel), heap_blk)));
-
- PageClearAllVisible(heap_page);
- MarkBufferDirty(heap_buffer);
- visibilitymap_clear(rel, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- return false;
-}
-
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -2070,13 +1989,12 @@ lazy_scan_prune(LVRelState *vacrel,
PruneFreezeParams params = {
.relation = rel,
.buffer = buf,
+ .vmbuffer = vmbuffer,
.reason = PRUNE_VACUUM_SCAN,
- .options = HEAP_PAGE_PRUNE_FREEZE,
+ .options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM,
.vistest = vacrel->vistest,
.cutoffs = &vacrel->cutoffs,
};
- uint8 old_vmbits = 0;
- uint8 new_vmbits = 0;
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -2176,75 +2094,25 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
- Assert(!presult.all_frozen || presult.all_visible);
-
- old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
- if (identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
- presult.lpdead_items, vmbuffer,
- old_vmbits))
- old_vmbits = 0;
-
- if (!presult.all_visible)
- return presult.ndeleted;
-
- /* Set the visibility map and page visibility hint */
- new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
- if (presult.all_frozen)
- new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
- /* Nothing to do */
- if (old_vmbits == new_vmbits)
- return presult.ndeleted;
-
- Assert(presult.all_visible);
-
- /*
- * It should never be the case that the visibility map page is set while
- * the page-level bit is clear, but the reverse is allowed (if checksums
- * are not enabled). Regardless, set both bits so that we get back in
- * sync.
- *
- * The heap buffer must be marked dirty before adding it to the WAL chain
- * when setting the VM. We don't worry about unnecessarily dirtying the
- * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
- * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
- * the VM bits clear, so there is no point in optimizing it.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
-
- /*
- * If the page is being set all-frozen, we pass InvalidTransactionId as
- * the cutoff_xid, since a snapshot conflict horizon sufficient to make
- * everything safe for REDO was logged when the page's tuples were frozen.
- */
- Assert(!presult.all_frozen ||
- !TransactionIdIsValid(presult.vm_conflict_horizon));
-
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- new_vmbits);
/*
* If the page wasn't already set all-visible and/or all-frozen in the VM,
* count it as newly set for logging.
*/
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
vacrel->vm_new_frozen_pages++;
*vm_page_frozen = true;
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ce48fac42ba..2c07e197dc8 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,12 @@ typedef struct PruneFreezeParams
Relation relation; /* relation containing buffer to be pruned */
Buffer buffer; /* buffer to be pruned */
+ /*
+ * If we will consider updating the visibility map, vmbuffer should
+ * contain the correct block of the visibility map and be pinned.
+ */
+ Buffer vmbuffer;
+
/*
* The reason pruning was performed. It is used to set the WAL record
* opcode which is used for debugging and analysis purposes.
@@ -252,6 +259,9 @@ typedef struct PruneFreezeParams
*
* HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
* will return 'all_visible', 'all_frozen' flags to the caller.
+ *
+ * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
+ * in the VM.
*/
int options;
@@ -299,6 +309,16 @@ typedef struct PruneFreezeResult
bool all_frozen;
TransactionId vm_conflict_horizon;
+ /*
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
+ *
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
+ */
+ uint8 new_vmbits;
+ uint8 old_vmbits;
+
/*
* Whether or not the page makes rel truncation unsafe. This is set to
* 'true', even if the page contains LP_DEAD items. VACUUM will remove
--
2.43.0
v33-0005-Move-VM-assert-into-prune-freeze-code.patch (text/x-patch)
From 0162c78c42764cdb0ecf0ad82eb616954d15a94d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 15:57:34 -0500
Subject: [PATCH v33 05/16] Move VM assert into prune/freeze code
This is a step toward setting the VM in the same WAL record as pruning
and freezing. It moves the check of the heap page into prune/freeze code
before setting the VM. This allows us to remove some fields of the
PruneFreezeResult.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/pruneheap.c | 86 ++++++++++++++++++++++------
src/backend/access/heap/vacuumlazy.c | 68 +---------------------
src/include/access/heapam.h | 25 +++-----
3 files changed, 77 insertions(+), 102 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 53b7711ab21..85ac1a54882 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -932,6 +932,31 @@ heap_page_will_set_vm(PruneState *prstate,
return true;
}
+#ifdef USE_ASSERT_CHECKING
+
+/*
+ * Wrapper for heap_page_would_be_all_visible() which can be used for callers
+ * that expect no LP_DEAD on the page. Currently assert-only, but there is no
+ * reason not to use it outside of asserts.
+ */
+static bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
+{
+
+ return heap_page_would_be_all_visible(rel, buf,
+ OldestXmin,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid,
+ logging_offnum);
+}
+#endif
+
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
@@ -985,6 +1010,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
+ TransactionId vm_conflict_horizon = InvalidTransactionId;
PruneState prstate;
bool do_freeze;
bool do_prune;
@@ -1142,23 +1168,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
- /*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
-
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
@@ -1176,6 +1187,46 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
}
}
+ /*
+ * If updating the visibility map, the conflict horizon for that record
+ * must be the newest xmin on the page. However, if the page is
+ * completely frozen, there can be no conflict and the vm_conflict_horizon
+ * should remain InvalidTransactionId. This includes the case that we
+ * just froze all the tuples; the prune-freeze record included the
+ * conflict XID already so we don't need to again.
+ */
+ if (prstate.all_frozen)
+ vm_conflict_horizon = InvalidTransactionId;
+ else
+ vm_conflict_horizon = prstate.visibility_cutoff_xid;
+
+ /*
+ * During its second pass over the heap, VACUUM calls
+ * heap_page_would_be_all_visible() to determine whether a page is
+ * all-visible and all-frozen. The logic here is similar. After completing
+ * pruning and freezing, use an assertion to verify that our results
+ * remain consistent with heap_page_would_be_all_visible().
+ */
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
+ {
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
+
+ Assert(presult->lpdead_items == 0);
+
+ Assert(heap_page_is_all_visible(params->relation, buffer,
+ prstate.cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc));
+
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == vm_conflict_horizon);
+ }
+#endif
+
/* Now update the visibility map and PD_ALL_VISIBLE hint */
Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
@@ -1222,12 +1273,11 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* make everything safe for REDO was logged when the page's tuples
* were frozen.
*/
- Assert(!prstate.all_frozen ||
- !TransactionIdIsValid(presult->vm_conflict_horizon));
+ Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
visibilitymap_set(params->relation, blockno, buffer,
InvalidXLogRecPtr,
- vmbuffer, presult->vm_conflict_horizon,
+ vmbuffer, vm_conflict_horizon,
new_vmbits);
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index fe816299f4b..b7d834969d6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -459,20 +459,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
-#endif
-static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- OffsetNumber *deadoffsets,
- int ndeadoffsets,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2035,32 +2021,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- Assert(heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum));
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -3522,29 +3482,6 @@ dead_items_cleanup(LVRelState *vacrel)
vacrel->pvs = NULL;
}
-#ifdef USE_ASSERT_CHECKING
-
-/*
- * Wrapper for heap_page_would_be_all_visible() which can be used for callers
- * that expect no LP_DEAD on the page. Currently assert-only, but there is no
- * reason not to use it outside of asserts.
- */
-static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum)
-{
-
- return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
- NULL, 0,
- all_frozen,
- visibility_cutoff_xid,
- logging_offnum);
-}
-#endif
/*
* Check whether the heap page in buf is all-visible except for the dead
@@ -3568,15 +3505,12 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
* - *logging_offnum: OffsetNumber of current tuple being processed;
* used by vacuum's error callback system.
*
- * Callers looking to verify that the page is already all-visible can call
- * heap_page_is_all_visible().
- *
* This logic is closely related to heap_prune_record_unchanged_lp_normal().
* If you modify this function, ensure consistency with that code. An
* assertion cross-checks that both remain in agreement. Do not introduce new
* side-effects.
*/
-static bool
+bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
TransactionId OldestXmin,
OffsetNumber *deadoffsets,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2c07e197dc8..e0da1f7cdcc 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -257,8 +257,7 @@ typedef struct PruneFreezeParams
* HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
* LP_UNUSED during pruning.
*
- * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
- * will return 'all_visible', 'all_frozen' flags to the caller.
+ * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
*
* HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
* in the VM.
@@ -294,21 +293,6 @@ typedef struct PruneFreezeResult
int live_tuples;
int recently_dead_tuples;
- /*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
- *
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
- *
- * These are only set if the HEAP_PAGE_PRUNE_FREEZE option is set.
- */
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
-
/*
* old_vmbits are the state of the all-visible and all-frozen bits in the
* visibility map before updating it during phase I of vacuuming.
@@ -453,6 +437,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
/* in heap/vacuumlazy.c */
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
--
2.43.0
v33-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (text/x-patch; charset=US-ASCII)
From cdf5776fadeae3430c692999b37f8a7ec944bda1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 16:16:22 -0500
Subject: [PATCH v33 06/16] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
prune/freeze
Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.
Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.
This change applies only to vacuum phase I, not to pruning performed
during normal page access.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/pruneheap.c | 275 ++++++++++++++++------------
1 file changed, 157 insertions(+), 118 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 85ac1a54882..b3ea42f1be1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -205,6 +205,11 @@ static bool heap_page_will_set_vm(PruneState *prstate,
int nlpdead_items,
uint8 *old_vmbits,
uint8 *new_vmbits);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ uint8 old_vmbits, uint8 new_vmbits,
+ TransactionId latest_xid_removed,
+ TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid);
/*
@@ -795,6 +800,68 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ uint8 old_vmbits, uint8 new_vmbits,
+ TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid)
+{
+ TransactionId conflict_xid;
+
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning or
+ * freezing any tuples and are setting an already all-visible page
+ * all-frozen in the VM. In this case, all of the tuples on the page must
+ * already be visible to all MVCC snapshots on the standby.
+ */
+ if (!do_prune &&
+ !do_freeze &&
+ do_set_vm &&
+ (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0 &&
+ (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ return InvalidTransactionId;
+
+ /*
+ * The snapshotConflictHorizon for the whole record should be the most
+ * conservative of all the horizons calculated for any of the possible
+ * modifications. If this record will prune tuples, any transactions on
+ * the standby older than the youngest xmax of the most recently removed
+ * tuple this record will prune will conflict. If this record will freeze
+ * tuples, any transactions on the standby with xids older than the
+ * youngest tuple this record will freeze will conflict.
+ */
+ conflict_xid = InvalidTransactionId;
+
+ /*
+ * If we are updating the VM, the conflict horizon is almost always the
+ * visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization, we can
+ * use the visibility_cutoff_xid as the conflict horizon if the page will
+ * be all-frozen. This is true even if there are LP_DEAD line pointers
+ * because we ignored those when maintaining the visibility_cutoff_xid.
+ * This will have been calculated earlier as the frz_conflict_horizon when
+ * we determined we would freeze.
+ */
+ if (do_set_vm)
+ conflict_xid = visibility_cutoff_xid;
+ else if (do_freeze)
+ conflict_xid = frz_conflict_horizon;
+
+ /*
+ * If we are removing tuples with a younger xmax than our so far
+ * calculated conflict_xid, we must use this as our horizon.
+ */
+ if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+ conflict_xid = latest_xid_removed;
+
+ return conflict_xid;
+}
+
/*
* Helper to correct any corruption detected on a heap page and its
* corresponding visibility map page after pruning but before setting the
@@ -1010,7 +1077,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
- TransactionId vm_conflict_horizon = InvalidTransactionId;
PruneState prstate;
bool do_freeze;
bool do_prune;
@@ -1018,6 +1084,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool do_set_vm;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ TransactionId conflict_xid;
uint8 new_vmbits;
uint8 old_vmbits;
@@ -1081,6 +1148,37 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.all_visible = prstate.all_frozen = false;
Assert(!prstate.all_frozen || prstate.all_visible);
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ /*
+ * Decide whether to set the VM bits based on information from the VM and
+ * the all_visible/all_frozen flags.
+ */
+ do_set_vm = heap_page_will_set_vm(&prstate,
+ params->relation,
+ blockno,
+ buffer,
+ page,
+ vmbuffer,
+ prstate.lpdead_items,
+ &old_vmbits,
+ &new_vmbits);
+
+ /*
+ * new_vmbits should be 0 regardless of whether or not the page is
+ * all-visible if we do not intend to set the VM.
+ */
+ Assert(do_set_vm || new_vmbits == 0);
+
+ conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+ old_vmbits, new_vmbits,
+ prstate.latest_xid_removed,
+ prstate.frz_conflict_horizon,
+ prstate.visibility_cutoff_xid);
+
+ /* Lock vmbuffer before entering a critical section */
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -1102,14 +1200,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
/*
* If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * hint. If we are going to freeze or prune the page or set
+ * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+ *
+ * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+ * for the VM to be set and PD_ALL_VISIBLE to be clear.
*/
- if (!do_freeze && !do_prune)
+ if (!do_freeze && !do_prune && !do_set_vm)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -1123,6 +1224,26 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
+ /* Set the visibility map and page visibility hint */
+ if (do_set_vm)
+ {
+ /*
+ * While it is valid for PD_ALL_VISIBLE to be set when the
+ * corresponding VM bit is clear, we strongly prefer to keep them
+ * in sync.
+ *
+ * The heap buffer must be marked dirty before adding it to the
+ * WAL chain when setting the VM. We don't worry about
+ * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
+ * already set, though. It is extremely rare to have a clean heap
+ * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
+ * so there is no point in optimizing it.
+ */
+ PageSetAllVisible(page);
+ visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
+ params->relation->rd_locator);
+ }
+
MarkBufferDirty(buffer);
/*
@@ -1130,29 +1251,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
if (RelationNeedsWAL(params->relation))
{
- /*
- * The snapshotConflictHorizon for the whole record should be the
- * most conservative of all the horizons calculated for any of the
- * possible modifications. If this record will prune tuples, any
- * transactions on the standby older than the youngest xmax of the
- * most recently removed tuple this record will prune will
- * conflict. If this record will freeze tuples, any transactions
- * on the standby with xids older than the youngest tuple this
- * record will freeze will conflict.
- */
- TransactionId conflict_xid;
-
- if (TransactionIdFollows(prstate.frz_conflict_horizon,
- prstate.latest_xid_removed))
- conflict_xid = prstate.frz_conflict_horizon;
- else
- conflict_xid = prstate.latest_xid_removed;
-
log_heap_prune_and_freeze(params->relation, buffer,
- InvalidBuffer, /* vmbuffer */
- 0, /* vmflags */
+ do_set_vm ? vmbuffer : InvalidBuffer,
+ do_set_vm ? new_vmbits : 0,
conflict_xid,
- true, params->reason,
+ true, /* cleanup lock */
+ params->reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
@@ -1162,43 +1266,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
END_CRIT_SECTION();
- /* Copy information back for caller */
- presult->ndeleted = prstate.ndeleted;
- presult->nnewlpdead = prstate.ndead;
- presult->nfrozen = prstate.nfrozen;
- presult->live_tuples = prstate.live_tuples;
- presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->hastup = prstate.hastup;
-
- presult->lpdead_items = prstate.lpdead_items;
- /* the presult->deadoffsets array was already filled in */
-
- if (prstate.attempt_freeze)
- {
- if (presult->nfrozen > 0)
- {
- *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
- *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
- }
- else
- {
- *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
- *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
- }
- }
-
- /*
- * If updating the visibility map, the conflict horizon for that record
- * must be the newest xmin on the page. However, if the page is
- * completely frozen, there can be no conflict and the vm_conflict_horizon
- * should remain InvalidTransactionId. This includes the case that we
- * just froze all the tuples; the prune-freeze record included the
- * conflict XID already so we don't need to again.
- */
- if (prstate.all_frozen)
- vm_conflict_horizon = InvalidTransactionId;
- else
- vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
/*
* During its second pass over the heap, VACUUM calls
@@ -1213,7 +1282,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(presult->lpdead_items == 0);
+ Assert(prstate.lpdead_items == 0);
+ Assert(prstate.cutoffs);
Assert(heap_page_is_all_visible(params->relation, buffer,
prstate.cutoffs->OldestXmin,
@@ -1223,67 +1293,36 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Assert(prstate.all_frozen == debug_all_frozen);
Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == vm_conflict_horizon);
+ debug_cutoff == prstate.visibility_cutoff_xid);
}
#endif
- /* Now update the visibility map and PD_ALL_VISIBLE hint */
- Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
-
- do_set_vm = heap_page_will_set_vm(&prstate,
- params->relation,
- blockno,
- buffer,
- page,
- vmbuffer,
- prstate.lpdead_items,
- &old_vmbits,
- &new_vmbits);
+ /* Copy information back for caller */
+ presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+ presult->hastup = prstate.hastup;
+ presult->new_vmbits = new_vmbits;
+ presult->old_vmbits = old_vmbits;
- /*
- * new_vmbits should be 0 regardless of whether or not the page is
- * all-visible if we do not intend to set the VM.
- */
- Assert(do_set_vm || new_vmbits == 0);
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
- /* Set the visibility map and page visibility hint, if relevant */
- if (do_set_vm)
+ if (prstate.attempt_freeze)
{
- Assert(prstate.all_visible);
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * The heap buffer must be marked dirty before adding it to the WAL
- * chain when setting the VM. We don't worry about unnecessarily
- * dirtying the heap buffer if PD_ALL_VISIBLE is already set, though.
- * It is extremely rare to have a clean heap buffer with
- * PD_ALL_VISIBLE already set and the VM bits clear, so there is no
- * point in optimizing it.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buffer);
-
- /*
- * If the page is being set all-frozen, we pass InvalidTransactionId
- * as the cutoff_xid, since a snapshot conflict horizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
-
- visibilitymap_set(params->relation, blockno, buffer,
- InvalidXLogRecPtr,
- vmbuffer, vm_conflict_horizon,
- new_vmbits);
+ if (presult->nfrozen > 0)
+ {
+ *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
+ }
+ else
+ {
+ *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
+ }
}
-
- /* Save the vmbits for caller */
- presult->old_vmbits = old_vmbits;
- presult->new_vmbits = new_vmbits;
}
--
2.43.0
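
A condensed pseudocode summary of the heap_page_prune_and_freeze()
ordering after this patch, pulled from the hunks above (bookkeeping,
error paths, and most argument lists are abbreviated with "..."):

    do_set_vm = heap_page_will_set_vm(&prstate, params->relation, blockno,
                                      buffer, page, vmbuffer,
                                      prstate.lpdead_items,
                                      &old_vmbits, &new_vmbits);

    conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
                                    old_vmbits, new_vmbits,
                                    prstate.latest_xid_removed,
                                    prstate.frz_conflict_horizon,
                                    prstate.visibility_cutoff_xid);

    /* lock the VM buffer before entering the critical section */
    if (do_set_vm)
        LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);

    START_CRIT_SECTION();

    if (do_prune)
        heap_page_prune_execute(buffer, ...);   /* apply planned item changes */
    if (do_freeze)
        heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
    if (do_set_vm)
    {
        PageSetAllVisible(page);
        visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
                                 params->relation->rd_locator);
    }

    MarkBufferDirty(buffer);

    /* one record now covers pruning, freezing, and the VM update */
    if (RelationNeedsWAL(params->relation))
        log_heap_prune_and_freeze(params->relation, buffer,
                                  do_set_vm ? vmbuffer : InvalidBuffer,
                                  do_set_vm ? new_vmbits : 0,
                                  conflict_xid, ...);

    END_CRIT_SECTION();

    if (do_set_vm)
        LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);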
v33-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (text/x-patch; charset=US-ASCII)
From 8a3d02ccb9165d53e50c391dd4d71cc108c9ef15 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v33 07/16] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum
As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
1 file changed, 29 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b7d834969d6..afa2c3af833 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1896,9 +1896,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ /* Lock vmbuffer before entering critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ /* Mark buffer dirty before writing any WAL records */
MarkBufferDirty(buf);
/*
@@ -1915,13 +1918,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set_vmbits(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
+
+ /*
+ * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+ * setting the VM.
+ */
+ if (RelationNeedsWAL(vacrel->rel))
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ InvalidTransactionId, /* conflict xid */
+ false, /* cleanup lock */
+ PRUNE_VACUUM_SCAN, /* reason */
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+
END_CRIT_SECTION();
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
/* Count the newly all-frozen pages for logging */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
--
2.43.0
v33-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (text/x-patch; charset=US-ASCII)
From bdda4434c391863526b8f93a95ab398595a6906b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v33 08/16] Remove XLOG_HEAP2_VISIBLE entirely
No remaining code emits XLOG_HEAP2_VISIBLE records, so the record type can be removed entirely.
This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.
This changes the visibility map API, so any external consumers of the
VM-only WAL record will need to be updated.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/common/bufmask.c | 4 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 155 ++---------------------
src/backend/access/heap/pruneheap.c | 4 +-
src/backend/access/heap/vacuumlazy.c | 16 +--
src/backend/access/heap/visibilitymap.c | 110 +---------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +-
src/include/access/heapam_xlog.h | 20 ---
src/include/access/visibilitymap.h | 13 +-
src/include/access/visibilitymapdefs.h | 9 --
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 38 insertions(+), 371 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index 1a9e7bea5d2..bce767d7b71 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
+ * for more details.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ad9d6338ec2..f219c7a71cf 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2542,11 +2542,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
else if (all_frozen_set)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- relation->rd_locator);
+ visibilitymap_set(BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ relation->rd_locator);
}
/*
@@ -8831,50 +8831,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
-
/*
* Perform XLogInsert for a heap-update operation. Caller must already
* have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index f765345e9e4..9a29fda3601 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -236,7 +236,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+ visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -249,142 +249,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -762,8 +626,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
*
* During recovery, however, no concurrent writers exist. Therefore,
* updating the VM without holding the heap page lock is safe enough. This
- * same approach is taken when replaying xl_heap_visible records (see
- * heap_xlog_visible()).
+ * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+ * heap_xlog_prune_and_freeze()).
*/
if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -775,11 +639,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- rlocator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -1360,9 +1224,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b3ea42f1be1..cac09dff31f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1240,8 +1240,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* so there is no point in optimizing it.
*/
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
- params->relation->rd_locator);
+ visibilitymap_set(blockno, vmbuffer, new_vmbits,
+ params->relation->rd_locator);
}
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index afa2c3af833..4d7e1636526 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1918,11 +1918,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
/*
* Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2800,9 +2800,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* set PD_ALL_VISIBLE.
*/
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer, vmflags,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer, vmflags,
+ vacrel->rel->rd_locator);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 3047bd46def..fc74e39e069 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set bit(s) in a previously pinned page and log
- * visibilitymap_set_vmbits - set bit(s) in a pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,112 +219,11 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- */
-void
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
- flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || !XLogRecPtrIsValid(recptr));
- Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- Assert(!BufferIsValid(heapBuf) ||
- BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (!XLogRecPtrIsValid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-}
-
/*
* Set VM (visibility map) flags in the VM block in vmBuf.
*
* This function is intended for callers that log VM changes together
* with the heap page modifications that rendered the page all-visible.
- * Callers that log VM changes separately should use visibilitymap_set().
*
* vmBuf must be pinned and exclusively locked, and it must cover the VM bits
* corresponding to heapBlk.
@@ -341,9 +239,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* rlocator is used only for debugging messages.
*/
void
-visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 02ae91653c1..75ae6f9d375 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index e25dd6bc366..f7ddb56fc30 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -449,7 +449,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index afffab77106..f8681dcc9c7 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ce3566ba949..5eed567a8e5 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -443,20 +442,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -500,11 +485,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index a0166c5b410..001afb037f3 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern void visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator);
+extern void visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 89153b3cd9a..e5794c8559e 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b9e671fcda8..308cfff999e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4331,7 +4331,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
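
With xl_heap_visible gone, a caller of the simplified visibilitymap_set()
follows the pattern already used in the patches above; roughly (variable
names are illustrative, and the caller's own WAL record is elided):

    /* vmbuffer: pinned VM buffer for blkno; heapbuf: exclusively locked heap buffer */
    LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);

    START_CRIT_SECTION();

    PageSetAllVisible(BufferGetPage(heapbuf));
    MarkBufferDirty(heapbuf);

    visibilitymap_set(blkno, vmbuffer,
                      VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN,
                      rel->rd_locator);

    /* ... emit the caller's own WAL record covering both buffers ... */

    END_CRIT_SECTION();

    LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);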
v33-0009-Simplify-heap_page_would_be_all_visible-visibili.patch (text/x-patch; charset=US-ASCII)
From 6fee46f117980751d5ca5c73a08fe8823de50414 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 17 Dec 2025 13:57:16 -0500
Subject: [PATCH v33 09/16] Simplify heap_page_would_be_all_visible visibility
check
heap_page_would_be_all_visible() doesn't care about the distinction
between HEAPTUPLE_RECENTLY_DEAD and HEAPTUPLE_DEAD tuples -- any tuple
that is not HEAPTUPLE_LIVE means the page is not all-visible and causes
us to return false.
Therefore, we don't need to call HeapTupleSatisfiesVacuum(), which
includes an extra step to distinguish between dead and recently dead
tuples using OldestXmin. Replace it with the more minimal
HeapTupleSatisfiesVacuumHorizon().
This has the added benefit of making it easier to replace uses of
OldestXmin in heap_page_would_be_all_visible() in the future.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/CALdSSPjvhGXihT_9f-GJabYU%3D_PjrFDUxYaURuTbfLyQM6TErg%40mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4d7e1636526..4b2a26f7336 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3570,6 +3570,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
{
ItemId itemid;
HeapTupleData tuple;
+ TransactionId dead_after;
/*
* Set the offset number so that we can display it along with any
@@ -3609,12 +3610,14 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
/* Visibility checks may do IO or allocate memory */
Assert(CritSectionCount == 0);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+ switch (HeapTupleSatisfiesVacuumHorizon(&tuple, buf, &dead_after))
{
case HEAPTUPLE_LIVE:
{
TransactionId xmin;
+ Assert(!TransactionIdIsValid(dead_after));
+
/* Check comments in lazy_scan_prune. */
if (!HeapTupleHeaderXminCommitted(tuple.t_data))
{
@@ -3647,8 +3650,10 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
}
break;
- case HEAPTUPLE_DEAD:
case HEAPTUPLE_RECENTLY_DEAD:
+ Assert(TransactionIdIsValid(dead_after));
+ /* FALLTHROUGH */
+ case HEAPTUPLE_DEAD:
case HEAPTUPLE_INSERT_IN_PROGRESS:
case HEAPTUPLE_DELETE_IN_PROGRESS:
{
--
2.43.0
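
In short, the classification after this change looks like the following
sketch (all_visible is the caller's local flag; the HEAPTUPLE_LIVE branch
still checks xmin against the visibility cutoff as before):

    TransactionId dead_after;

    switch (HeapTupleSatisfiesVacuumHorizon(&tuple, buf, &dead_after))
    {
        case HEAPTUPLE_LIVE:
            /* may still be all-visible; xmin is checked separately */
            break;

        case HEAPTUPLE_RECENTLY_DEAD:
            Assert(TransactionIdIsValid(dead_after));
            /* FALLTHROUGH */
        case HEAPTUPLE_DEAD:
        case HEAPTUPLE_INSERT_IN_PROGRESS:
        case HEAPTUPLE_DELETE_IN_PROGRESS:
            /* any non-live tuple means the page is not all-visible */
            all_visible = false;
            break;
    }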
v33-0010-Remove-table_scan_analyze_next_tuple-unneeded-pa.patch (text/x-patch; charset=US-ASCII)
From 45fce23bcffa39701fc25ccd67ad455edfb99a0f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 22 Dec 2025 10:46:45 -0500
Subject: [PATCH v33 10/16] Remove table_scan_analyze_next_tuple unneeded
parameter OldestXmin
heapam_scan_analyze_next_tuple() doesn't distinguish between dead and
recently dead tuples when counting them, so it doesn't need OldestXmin.
Looking at other table AMs implementing table_scan_analyze_next_tuple(),
it appears most do not use OldestXmin either.
Suggested-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/CALdSSPjvhGXihT_9f-GJabYU%3D_PjrFDUxYaURuTbfLyQM6TErg%40mail.gmail.com
---
src/backend/access/heap/heapam_handler.c | 13 +++++++++----
src/backend/commands/analyze.c | 6 +-----
src/include/access/tableam.h | 5 ++---
3 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 09a456e9966..df2440e82a7 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1026,7 +1026,7 @@ heapam_scan_analyze_next_block(TableScanDesc scan, ReadStream *stream)
}
static bool
-heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
+heapam_scan_analyze_next_tuple(TableScanDesc scan,
double *liverows, double *deadrows,
TupleTableSlot *slot)
{
@@ -1047,6 +1047,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
ItemId itemid;
HeapTuple targtuple = &hslot->base.tupdata;
bool sample_it = false;
+ TransactionId dead_after;
itemid = PageGetItemId(targpage, hscan->rs_cindex);
@@ -1069,16 +1070,20 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
targtuple->t_data = (HeapTupleHeader) PageGetItem(targpage, itemid);
targtuple->t_len = ItemIdGetLength(itemid);
- switch (HeapTupleSatisfiesVacuum(targtuple, OldestXmin,
- hscan->rs_cbuf))
+ switch (HeapTupleSatisfiesVacuumHorizon(targtuple,
+ hscan->rs_cbuf,
+ &dead_after))
{
case HEAPTUPLE_LIVE:
sample_it = true;
*liverows += 1;
break;
- case HEAPTUPLE_DEAD:
case HEAPTUPLE_RECENTLY_DEAD:
+ Assert(TransactionIdIsValid(dead_after));
+ /* FALLTHROUGH */
+
+ case HEAPTUPLE_DEAD:
/* Count dead and recently-dead rows */
*deadrows += 1;
break;
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index a483424152c..53adac9139b 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1213,7 +1213,6 @@ acquire_sample_rows(Relation onerel, int elevel,
double rowstoskip = -1; /* -1 means not set yet */
uint32 randseed; /* Seed for block sampler(s) */
BlockNumber totalblocks;
- TransactionId OldestXmin;
BlockSamplerData bs;
ReservoirStateData rstate;
TupleTableSlot *slot;
@@ -1226,9 +1225,6 @@ acquire_sample_rows(Relation onerel, int elevel,
totalblocks = RelationGetNumberOfBlocks(onerel);
- /* Need a cutoff xmin for HeapTupleSatisfiesVacuum */
- OldestXmin = GetOldestNonRemovableTransactionId(onerel);
-
/* Prepare for sampling block numbers */
randseed = pg_prng_uint32(&pg_global_prng_state);
nblocks = BlockSampler_Init(&bs, totalblocks, targrows, randseed);
@@ -1261,7 +1257,7 @@ acquire_sample_rows(Relation onerel, int elevel,
{
vacuum_delay_point(true);
- while (table_scan_analyze_next_tuple(scan, OldestXmin, &liverows, &deadrows, slot))
+ while (table_scan_analyze_next_tuple(scan, &liverows, &deadrows, slot))
{
/*
* The first targrows sample rows are simply copied into the
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e2ec5289d4d..c9fa9f259cd 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -683,7 +683,6 @@ typedef struct TableAmRoutine
* callback).
*/
bool (*scan_analyze_next_tuple) (TableScanDesc scan,
- TransactionId OldestXmin,
double *liverows,
double *deadrows,
TupleTableSlot *slot);
@@ -1714,11 +1713,11 @@ table_scan_analyze_next_block(TableScanDesc scan, ReadStream *stream)
* tuples.
*/
static inline bool
-table_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
+table_scan_analyze_next_tuple(TableScanDesc scan,
double *liverows, double *deadrows,
TupleTableSlot *slot)
{
- return scan->rs_rd->rd_tableam->scan_analyze_next_tuple(scan, OldestXmin,
+ return scan->rs_rd->rd_tableam->scan_analyze_next_tuple(scan,
liverows, deadrows,
slot);
}
--
2.43.0
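To illustrate the rationale for 0010 above -- analyze sampling never needs to tell dead from recently-dead tuples apart -- here is a rough SQL sketch; the table name is made up and the exact counts depend on concurrent activity:

create table analyze_demo(a int) with (autovacuum_enabled = false);
insert into analyze_demo select generate_series(1, 1000);
-- deleted rows may still be recently dead for older snapshots, but
-- analyze counts them together with dead rows
delete from analyze_demo where a % 2 = 0;
analyze analyze_demo;
-- a single dead-tuple estimate; there is no separate recently-dead counter
select n_live_tup, n_dead_tup
from pg_stat_user_tables
where relname = 'analyze_demo';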
Attachment: v33-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patch (text/x-patch)
From 8e1286c1a6dbfe3309d111aaa21af5a8e6237bb8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 8 Dec 2025 15:49:54 -0500
Subject: [PATCH v33 01/16] Combine visibilitymap_set() cases in
lazy_scan_prune()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
lazy_scan_prune() previously had two separate cases that called
visibilitymap_set() after pruning and freezing. These branches were
nearly identical except that one attempted to avoid dirtying the heap
buffer. However, that situation can never occur — the heap buffer cannot
be clean at that point (and we would hit an assertion if it were).
In lazy_scan_prune(), when we change a previously all-visible page to
all-frozen and the page was recorded as all-visible in the visibility
map by find_next_unskippable_block(), the heap buffer will always be
dirty. Either we have just frozen a tuple and already dirtied the
buffer, or the buffer was modified between find_next_unskippable_block()
and heap_page_prune_and_freeze() and then pruned in
heap_page_prune_and_freeze().
Additionally, XLogRegisterBuffer() asserts that the buffer is dirty, so
attempting to add a clean heap buffer to the WAL chain would assert out
anyway.
Since the “clean heap buffer with already set VM” case is impossible,
the two visibilitymap_set() branches in lazy_scan_prune() can be merged.
Doing so makes the intent clearer and emphasizes that the heap buffer
must always be marked dirty before being added to the WAL chain.
This commit also adds a test case for vacuuming when no heap
modifications are required. For now, the test verifies that the heap
buffer is marked dirty before it is added to the WAL chain; if we later
remove the heap buffer from the VM-set WAL chain or pass it with the
REGBUF_NO_CHANGES flag, the test will guard that behavior as well.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
Discussion: https://postgr.es/m/flat/CAAKRu_ZWx5gCbeCf7PWCv8p5%3D%3Db7EEws0VD2wksDxpXCvCyHvQ%40mail.gmail.com
---
.../pg_visibility/expected/pg_visibility.out | 44 ++++++++++
contrib/pg_visibility/sql/pg_visibility.sql | 20 +++++
src/backend/access/heap/vacuumlazy.c | 87 ++++---------------
3 files changed, 82 insertions(+), 69 deletions(-)
diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index 09fa5933a35..e10f1706015 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -1,4 +1,5 @@
CREATE EXTENSION pg_visibility;
+CREATE EXTENSION pageinspect;
--
-- recently-dropped table
--
@@ -204,6 +205,49 @@ select pg_truncate_visibility_map('test_partition');
(1 row)
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary
+---------------------------
+ (1,1)
+(1 row)
+
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+ pg_truncate_visibility_map
+----------------------------
+
+(1 row)
+
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary
+---------------------------
+ (0,0)
+(1 row)
+
+-- though the VM is truncated, the heap page-level visibility hint,
+-- PD_ALL_VISIBLE should still be set
+SELECT (flags & x'0004'::int) <> 0
+ FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));
+ ?column?
+----------
+ t
+(1 row)
+
+-- vacuum sets the VM
+vacuum test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary
+---------------------------
+ (1,1)
+(1 row)
+
-- test copy freeze
create table copyfreeze (a int, b char(1500));
-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index 5af06ec5b76..57af8a0c5b6 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -1,4 +1,5 @@
CREATE EXTENSION pg_visibility;
+CREATE EXTENSION pageinspect;
--
-- recently-dropped table
@@ -94,6 +95,25 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
select * from pg_check_frozen('test_partition'); -- hopefully none
select pg_truncate_visibility_map('test_partition');
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+-- though the VM is truncated, the heap page-level visibility hint,
+-- PD_ALL_VISIBLE should still be set
+SELECT (flags & x'0004'::int) <> 0
+ FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));
+-- vacuum sets the VM
+vacuum test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+
-- test copy freeze
create table copyfreeze (a int, b char(1500));
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2086a577199..2da35c85e76 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2122,16 +2122,14 @@ lazy_scan_prune(LVRelState *vacrel,
* of last heap_vac_scan_next_block() call), and from all_visible and
* all_frozen variables
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if ((presult.all_visible && !all_visible_according_to_vm) ||
+ (presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
- }
/*
* It should never be the case that the visibility map page is set
@@ -2139,15 +2137,25 @@ lazy_scan_prune(LVRelState *vacrel,
* checksums are not enabled). Regardless, set both bits so that we
* get back in sync.
*
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
+ * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
+ * unnecessarily dirtying the heap buffer. Nearly the only scenario
+ * where PD_ALL_VISIBLE is set but the VM is not is if the VM was
+ * removed -- and that isn't worth optimizing for. And if we add the
+ * heap buffer to the WAL chain (without passing REGBUF_NO_CHANGES),
+ * it must be marked dirty.
*/
PageSetAllVisible(page);
MarkBufferDirty(buf);
+
+ /*
+ * If the page is being set all-frozen, we pass InvalidTransactionId
+ * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ Assert(!presult.all_frozen ||
+ !TransactionIdIsValid(presult.vm_conflict_horizon));
+
old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
InvalidXLogRecPtr,
vmbuffer, presult.vm_conflict_horizon,
@@ -2219,65 +2227,6 @@ lazy_scan_prune(LVRelState *vacrel,
VISIBILITYMAP_VALID_BITS);
}
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen.
- */
- else if (all_visible_according_to_vm && presult.all_frozen &&
- !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
-
return presult.ndeleted;
}
--
2.43.0
Attachment: v33-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patch (text/x-patch)
From 4d37243f9fa0dc4e264a28bcee448787fb8d7f65 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 11 Dec 2025 10:48:13 -0500
Subject: [PATCH v33 02/16] Eliminate use of cached VM value in
lazy_scan_prune()
lazy_scan_prune() takes a parameter from lazy_scan_heap() indicating
whether the page was marked all-visible in the VM at the time it was
last checked in find_next_unskippable_block(). This behavior is
historical, dating back to commit 608195a3a365, when we did not pin the
VM page until deciding we must read it. Now that the VM page is already
pinned, there is no meaningful benefit to relying on a cached VM status.
Removing this cached value simplifies the logic in both lazy_scan_heap()
and lazy_scan_prune(). It also simplifies future work that will set the
visibility map on-access: such paths will have no cached value available,
so relying on one here would make the shared logic harder to reason
about. Eliminating it also lets us detect and repair VM corruption
on-access.
Along with removing the cached value and unconditionally checking the
visibility status of the heap page, this commit also moves the VM
corruption handling to occur first. This reordering should have no
performance impact, since the checks are inexpensive and performed only
once per page. It does, however, make the control flow easier to
understand. The new restructuring also makes it possible to set the VM
after fixing corruption (if pruning found the page all-visible).
Now that no callers of visibilitymap_set() use its return value, change
its return type (and that of visibilitymap_set_vmbits()) to void.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
---
src/backend/access/heap/vacuumlazy.c | 182 +++++++++++-------------
src/backend/access/heap/visibilitymap.c | 9 +-
src/include/access/visibilitymap.h | 18 +--
3 files changed, 94 insertions(+), 115 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2da35c85e76..3733a1cbc47 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -248,13 +248,6 @@ typedef enum
*/
#define EAGER_SCAN_REGION_SIZE 4096
-/*
- * heap_vac_scan_next_block() sets these flags to communicate information
- * about the block it read to the caller.
- */
-#define VAC_BLK_WAS_EAGER_SCANNED (1 << 0)
-#define VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM (1 << 1)
-
typedef struct LVRelState
{
/* Target heap relation and its indexes */
@@ -360,7 +353,6 @@ typedef struct LVRelState
/* State maintained by heap_vac_scan_next_block() */
BlockNumber current_block; /* last block returned */
BlockNumber next_unskippable_block; /* next unskippable block */
- bool next_unskippable_allvis; /* its visibility status */
bool next_unskippable_eager_scanned; /* if it was eagerly scanned */
Buffer next_unskippable_vmbuffer; /* buffer containing its VM bit */
@@ -434,7 +426,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
bool sharelock, Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
- Buffer vmbuffer, bool all_visible_according_to_vm,
+ Buffer vmbuffer,
bool *has_lpdead_items, bool *vm_page_frozen);
static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
@@ -1277,7 +1269,6 @@ lazy_scan_heap(LVRelState *vacrel)
/* Initialize for the first heap_vac_scan_next_block() call */
vacrel->current_block = InvalidBlockNumber;
vacrel->next_unskippable_block = InvalidBlockNumber;
- vacrel->next_unskippable_allvis = false;
vacrel->next_unskippable_eager_scanned = false;
vacrel->next_unskippable_vmbuffer = InvalidBuffer;
@@ -1293,13 +1284,13 @@ lazy_scan_heap(LVRelState *vacrel)
MAIN_FORKNUM,
heap_vac_scan_next_block,
vacrel,
- sizeof(uint8));
+ sizeof(bool));
while (true)
{
Buffer buf;
Page page;
- uint8 blk_info = 0;
+ bool was_eager_scanned = false;
int ndeleted = 0;
bool has_lpdead_items;
void *per_buffer_data = NULL;
@@ -1368,13 +1359,13 @@ lazy_scan_heap(LVRelState *vacrel)
if (!BufferIsValid(buf))
break;
- blk_info = *((uint8 *) per_buffer_data);
+ was_eager_scanned = *((bool *) per_buffer_data);
CheckBufferIsPinnedOnce(buf);
page = BufferGetPage(buf);
blkno = BufferGetBlockNumber(buf);
vacrel->scanned_pages++;
- if (blk_info & VAC_BLK_WAS_EAGER_SCANNED)
+ if (was_eager_scanned)
vacrel->eager_scanned_pages++;
/* Report as block scanned, update error traceback information */
@@ -1445,7 +1436,6 @@ lazy_scan_heap(LVRelState *vacrel)
if (got_cleanup_lock)
ndeleted = lazy_scan_prune(vacrel, buf, blkno, page,
vmbuffer,
- blk_info & VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM,
&has_lpdead_items, &vm_page_frozen);
/*
@@ -1462,8 +1452,7 @@ lazy_scan_heap(LVRelState *vacrel)
* exclude pages skipped due to cleanup lock contention from eager
* freeze algorithm caps.
*/
- if (got_cleanup_lock &&
- (blk_info & VAC_BLK_WAS_EAGER_SCANNED))
+ if (got_cleanup_lock && was_eager_scanned)
{
/* Aggressive vacuums do not eager scan. */
Assert(!vacrel->aggressive);
@@ -1630,7 +1619,6 @@ heap_vac_scan_next_block(ReadStream *stream,
{
BlockNumber next_block;
LVRelState *vacrel = callback_private_data;
- uint8 blk_info = 0;
/* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
next_block = vacrel->current_block + 1;
@@ -1693,8 +1681,8 @@ heap_vac_scan_next_block(ReadStream *stream,
* otherwise they would've been unskippable.
*/
vacrel->current_block = next_block;
- blk_info |= VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM;
- *((uint8 *) per_buffer_data) = blk_info;
+ /* Block was not eager scanned */
+ *((bool *) per_buffer_data) = false;
return vacrel->current_block;
}
else
@@ -1706,11 +1694,7 @@ heap_vac_scan_next_block(ReadStream *stream,
Assert(next_block == vacrel->next_unskippable_block);
vacrel->current_block = next_block;
- if (vacrel->next_unskippable_allvis)
- blk_info |= VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM;
- if (vacrel->next_unskippable_eager_scanned)
- blk_info |= VAC_BLK_WAS_EAGER_SCANNED;
- *((uint8 *) per_buffer_data) = blk_info;
+ *((bool *) per_buffer_data) = vacrel->next_unskippable_eager_scanned;
return vacrel->current_block;
}
}
@@ -1735,7 +1719,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
BlockNumber next_unskippable_block = vacrel->next_unskippable_block + 1;
Buffer next_unskippable_vmbuffer = vacrel->next_unskippable_vmbuffer;
bool next_unskippable_eager_scanned = false;
- bool next_unskippable_allvis;
*skipsallvis = false;
@@ -1745,7 +1728,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
next_unskippable_block,
&next_unskippable_vmbuffer);
- next_unskippable_allvis = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
/*
* At the start of each eager scan region, normal vacuums with eager
@@ -1764,7 +1746,7 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
* A block is unskippable if it is not all visible according to the
* visibility map.
*/
- if (!next_unskippable_allvis)
+ if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
{
Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
break;
@@ -1821,7 +1803,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
/* write the local variables back to vacrel */
vacrel->next_unskippable_block = next_unskippable_block;
- vacrel->next_unskippable_allvis = next_unskippable_allvis;
vacrel->next_unskippable_eager_scanned = next_unskippable_eager_scanned;
vacrel->next_unskippable_vmbuffer = next_unskippable_vmbuffer;
}
@@ -1982,9 +1963,7 @@ cmpOffsetNumbers(const void *a, const void *b)
* Caller must hold pin and buffer cleanup lock on the buffer.
*
* vmbuffer is the buffer containing the VM block with visibility information
- * for the heap block, blkno. all_visible_according_to_vm is the saved
- * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * for the heap block, blkno.
*
* *has_lpdead_items is set to true or false depending on whether, upon return
* from this function, any LP_DEAD items are still present on the page.
@@ -2001,7 +1980,6 @@ lazy_scan_prune(LVRelState *vacrel,
BlockNumber blkno,
Page page,
Buffer vmbuffer,
- bool all_visible_according_to_vm,
bool *has_lpdead_items,
bool *vm_page_frozen)
{
@@ -2015,6 +1993,8 @@ lazy_scan_prune(LVRelState *vacrel,
.vistest = vacrel->vistest,
.cutoffs = &vacrel->cutoffs,
};
+ uint8 old_vmbits = 0;
+ uint8 new_vmbits = 0;
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -2117,70 +2097,7 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(!presult.all_visible || !(*has_lpdead_items));
Assert(!presult.all_frozen || presult.all_visible);
- /*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
- */
- if ((presult.all_visible && !all_visible_according_to_vm) ||
- (presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
- {
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (presult.all_frozen)
- flags |= VISIBILITYMAP_ALL_FROZEN;
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
- * unnecessarily dirtying the heap buffer. Nearly the only scenario
- * where PD_ALL_VISIBLE is set but the VM is not is if the VM was
- * removed -- and that isn't worth optimizing for. And if we add the
- * heap buffer to the WAL chain (without passing REGBUF_NO_CHANGES),
- * it must be marked dirty.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
-
- /*
- * If the page is being set all-frozen, we pass InvalidTransactionId
- * as the cutoff_xid, since a snapshot conflict horizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- Assert(!presult.all_frozen ||
- !TransactionIdIsValid(presult.vm_conflict_horizon));
-
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
+ old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
/*
* As of PostgreSQL 9.2, the visibility map bit should never be set if the
@@ -2188,8 +2105,8 @@ lazy_scan_prune(LVRelState *vacrel,
* cleared after heap_vac_scan_next_block() was called, so we must recheck
* with buffer lock before concluding that the VM is corrupt.
*/
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+ if (!PageIsAllVisible(page) &&
+ (old_vmbits & VISIBILITYMAP_VALID_BITS) != 0)
{
ereport(WARNING,
(errcode(ERRCODE_DATA_CORRUPTED),
@@ -2198,6 +2115,8 @@ lazy_scan_prune(LVRelState *vacrel,
visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
VISIBILITYMAP_VALID_BITS);
+ /* VM bits are now clear */
+ old_vmbits = 0;
}
/*
@@ -2225,6 +2144,71 @@ lazy_scan_prune(LVRelState *vacrel,
MarkBufferDirty(buf);
visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
VISIBILITYMAP_VALID_BITS);
+ /* VM bits are now clear */
+ old_vmbits = 0;
+ }
+
+ if (!presult.all_visible)
+ return presult.ndeleted;
+
+ /* Set the visibility map and page visibility hint */
+ new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+ if (presult.all_frozen)
+ new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+ /* Nothing to do */
+ if (old_vmbits == new_vmbits)
+ return presult.ndeleted;
+
+ Assert(presult.all_visible);
+
+ /*
+ * It should never be the case that the visibility map page is set while
+ * the page-level bit is clear, but the reverse is allowed (if checksums
+ * are not enabled). Regardless, set both bits so that we get back in
+ * sync.
+ *
+ * The heap buffer must be marked dirty before adding it to the WAL chain
+ * when setting the VM. We don't worry about unnecessarily dirtying the
+ * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
+ * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
+ * the VM bits clear, so there is no point in optimizing it.
+ */
+ PageSetAllVisible(page);
+ MarkBufferDirty(buf);
+
+ /*
+ * If the page is being set all-frozen, we pass InvalidTransactionId as
+ * the cutoff_xid, since a snapshot conflict horizon sufficient to make
+ * everything safe for REDO was logged when the page's tuples were frozen.
+ */
+ Assert(!presult.all_frozen ||
+ !TransactionIdIsValid(presult.vm_conflict_horizon));
+
+ visibilitymap_set(vacrel->rel, blkno, buf,
+ InvalidXLogRecPtr,
+ vmbuffer, presult.vm_conflict_horizon,
+ new_vmbits);
+
+ /*
+ * If the page wasn't already set all-visible and/or all-frozen in the VM,
+ * count it as newly set for logging.
+ */
+ if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ {
+ vacrel->vm_new_visible_pages++;
+ if (presult.all_frozen)
+ {
+ vacrel->vm_new_visible_frozen_pages++;
+ *vm_page_frozen = true;
+ }
+ }
+ else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ presult.all_frozen)
+ {
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
}
return presult.ndeleted;
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 2382d18f72b..3047bd46def 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -240,10 +240,8 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
* You must pass a buffer containing the correct map page to this function.
* Call visibilitymap_pin first to pin the right one. This function doesn't do
* any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
*/
-uint8
+void
visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
uint8 flags)
@@ -320,7 +318,6 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
}
LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
}
/*
@@ -343,7 +340,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
*
* rlocator is used only for debugging messages.
*/
-uint8
+void
visibilitymap_set_vmbits(BlockNumber heapBlk,
Buffer vmBuf, uint8 flags,
const RelFileLocator rlocator)
@@ -386,8 +383,6 @@ visibilitymap_set_vmbits(BlockNumber heapBlk,
map[mapByte] |= (flags << mapOffset);
MarkBufferDirty(vmBuf);
}
-
- return status;
}
/*
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 47ad489a9a7..a0166c5b410 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -32,15 +32,15 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator);
+extern void visibilitymap_set(Relation rel,
+ BlockNumber heapBlk, Buffer heapBuf,
+ XLogRecPtr recptr,
+ Buffer vmBuf,
+ TransactionId cutoff_xid,
+ uint8 flags);
+extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
--
2.43.0
Attachment: v33-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patch (text/x-patch)
From 0fc1b4cbb4e67b193eca8347dca1bf8053d2020e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Dec 2025 13:36:39 -0500
Subject: [PATCH v33 03/16] Refactor lazy_scan_prune() VM clear logic into
helper
Encapsulating the VM clear logic in a helper makes the whole function
clearer. There is no functional change other than moving the logic into
the helper.
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/vacuumlazy.c | 132 +++++++++++++++++----------
1 file changed, 85 insertions(+), 47 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 3733a1cbc47..5857fd1bfb6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -424,6 +424,11 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
+static bool identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer,
@@ -1957,6 +1962,83 @@ cmpOffsetNumbers(const void *a, const void *b)
return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
}
+/*
+ * Helper to correct any corruption detected on a heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ *
+ * Returns true if it cleared corruption and false otherwise.
+ */
+static bool
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits)
+{
+ Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
+
+ Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (!PageIsAllVisible(heap_page) &&
+ ((vmbits & VISIBILITYMAP_VALID_BITS) != 0))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
+
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -2099,54 +2181,10 @@ lazy_scan_prune(LVRelState *vacrel,
old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (!PageIsAllVisible(page) &&
- (old_vmbits & VISIBILITYMAP_VALID_BITS) != 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- /* VM bits are now clear */
+ if (identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
+ presult.lpdead_items, vmbuffer,
+ old_vmbits))
old_vmbits = 0;
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- /* VM bits are now clear */
- old_vmbits = 0;
- }
if (!presult.all_visible)
return presult.ndeleted;
--
2.43.0
Attachment: v33-0011-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (text/x-patch)
From 834ce896a3cc2d38b9506db702863182c0b3e166 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v33 11/16] Use GlobalVisState in vacuum to determine page
level visibility
During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.
Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.
This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to be considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.
OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare occurrences where GlobalVisState moves
backwards.
Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.
This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/heapam_visibility.c | 22 +++++++++
src/backend/access/heap/pruneheap.c | 53 ++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 38 ++++++++++-----
src/include/access/heapam.h | 4 +-
4 files changed, 76 insertions(+), 41 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05e70b7d92a..b4489020609 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1053,6 +1053,28 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for “transaction still running” are identical to
+ * those used for removal XIDs.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid)
+{
+ return !GlobalVisTestIsRemovableXid(state, xid);
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index cac09dff31f..da09c769b4d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -461,11 +461,12 @@ prune_freeze_setup(PruneFreezeParams *params,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon
- * when updating the VM and/or freezing all the tuples on the page.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState. This field is only kept up-to-date if the page is
+ * all-visible. As soon as a tuple is encountered that is not visible to
+ * all, this field is unmaintained. As long as it is maintained, it can be
+ * used to calculate the snapshot conflict horizon when updating the VM
+ * and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -1008,14 +1009,14 @@ heap_page_will_set_vm(PruneState *prstate,
*/
static bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
+ vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -1102,6 +1103,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prune_freeze_plan(RelationGetRelid(params->relation),
buffer, &prstate, off_loc);
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them may be considered running by any snapshot, the page cannot
+ * be all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ GlobalVisTestXidMaybeRunning(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* If checksums are enabled, calling heap_prune_satisfies_vacuum() while
* checking tuple visibility information in prune_freeze_plan() may have
@@ -1283,10 +1294,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool debug_all_frozen;
Assert(prstate.lpdead_items == 0);
- Assert(prstate.cutoffs);
Assert(heap_page_is_all_visible(params->relation, buffer,
- prstate.cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc));
@@ -1807,28 +1817,15 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
}
/*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed? A FrozenTransactionId
- * is seen as committed to everyone. Otherwise, we check if
- * there is a snapshot that considers this xid to still be
- * running, and if so, we don't consider the page all-visible.
+ * The inserter definitely committed. But we don't know if it
+ * is old enough that everyone sees it as committed. Later,
+ * after processing all the tuples on the page, we'll check if
+ * there is any snapshot that still considers the newest xid
+ * on the page to be running. If so, we don't consider the
+ * page all-visible.
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' and 'all_frozen' when freezing
- * is requested. We could use GlobalVisTestIsRemovableXid
- * instead, if a non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = false;
- prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4b2a26f7336..c97ad2a931a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2754,7 +2754,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* done outside the critical section.
*/
if (heap_page_would_be_all_visible(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3519,7 +3519,7 @@ dead_items_cleanup(LVRelState *vacrel)
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
*
* Output parameters:
*
@@ -3535,7 +3535,7 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -3618,7 +3618,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
Assert(!TransactionIdIsValid(dead_after));
- /* Check comments in lazy_scan_prune. */
+ /* Check heap_prune_record_unchanged_lp_normal comments */
if (!HeapTupleHeaderXminCommitted(tuple.t_data))
{
all_visible = false;
@@ -3627,16 +3627,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
}
/*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed?
+ * The inserter definitely committed. But we don't know if
+ * it is old enough that everyone sees it as committed.
+ * Don't check that now.
+ *
+ * If we scan all tuples without finding one that prevents
+ * the page from being all-visible, we then check whether
+ * any snapshot still considers the newest XID on the page
+ * to be running. In that case, the page is not considered
+ * all-visible.
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin, OldestXmin))
- {
- all_visible = false;
- *all_frozen = false;
- break;
- }
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3667,6 +3668,19 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
}
} /* scan along page */
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * among them may still be considered running by any snapshot, the page
+ * cannot be all-visible.
+ */
+ if (all_visible &&
+ TransactionIdIsNormal(*visibility_cutoff_xid) &&
+ GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+ {
+ all_visible = false;
+ *all_frozen = false;
+ }
+
/* Clear the offset information once we have processed the given page. */
*logging_offnum = InvalidOffsetNumber;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e0da1f7cdcc..ac771390a37 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -438,7 +438,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -452,6 +452,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+
+extern bool GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
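To build intuition for the visibility horizon that both OldestXmin and GlobalVisState represent (0011 above), here is a rough sketch, assuming the pg_visibility extension is installed and using made-up object names, of how an old open snapshot keeps vacuum from setting pages all-visible regardless of which representation is used:

-- session 1: hold an old snapshot open
begin transaction isolation level repeatable read;
select 1;   -- takes the transaction snapshot

-- session 2: rows inserted after that snapshot are not yet visible to all
create table vis_demo(a int) with (autovacuum_enabled = false);
insert into vis_demo select generate_series(1, 1000);
vacuum vis_demo;
select pg_visibility_map_summary('vis_demo');  -- expect (0,0)

-- once session 1 commits, a second vacuum can set the VM
vacuum vis_demo;
select pg_visibility_map_summary('vis_demo');  -- all_visible should now be nonzero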
Attachment: v33-0012-Unset-all_visible-sooner-if-not-freezing.patch (text/x-patch)
From f01a815565075cc30ca43aadc577b51fa90f639e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v33 12/16] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.
However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/pruneheap.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index da09c769b4d..9f1257529b9 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1682,8 +1682,13 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible and all_frozen until later
* during pruning. Removable dead tuples shouldn't preclude freezing the
- * page.
+ * page. If we won't attempt freezing, just unset all-visible now, though.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1943,8 +1948,14 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible and all_frozen until later, at the
* end of heap_page_prune_and_freeze(). This will allow us to attempt to
* freeze the page after pruning. As long as we unset it before updating
- * the visibility map, this will be correct.
+ * the visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all_visible and all_frozen now.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
--
2.43.0
Attachment: v33-0013-Track-which-relations-are-modified-by-a-query.patch (text/x-patch)
From 5b62fa1efe6cec0f92429da72b110927bf42418f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v33 13/16] Track which relations are modified by a query
Save the relids in a bitmap in the estate. A later commit will pass this
information down to scan nodes to control whether the scan may set the
visibility map during on-access pruning. We don't want to set the
visibility map if the query is just going to modify the page immediately
afterward.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/executor/execMain.c | 4 ++++
src/backend/executor/execUtils.c | 2 ++
src/include/nodes/execnodes.h | 6 ++++++
3 files changed, 12 insertions(+)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ca14cdabdd0..6a0283985c3 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation may be modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index cc3c5de71eb..dcb2ef2275c 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 02265456978..29e2e2da7ea 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query through a
+ * UPDATE/DELETE/INSERT/MERGE or targeted by a SELECT FOR UPDATE.
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
--
2.43.0
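As a concrete (made-up) example of what es_modified_relids in 0013 is meant to capture: in the statement below only the UPDATE target is a result relation, so only it should be excluded from any future on-access VM setting, while the joined relation's scan could still set VM bits.

-- hypothetical tables; only "orders" is a result relation here
update orders o
   set status = 'done'
  from customers c
 where c.id = o.customer_id;
-- orders' RT index lands in es_modified_relids via ExecInitResultRelation();
-- customers' does not, so a scan of customers could still set the VM on access.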
Attachment: v33-0014-Pass-down-information-on-table-modification-to-s.patch (text/x-patch)
From 25f4a45c95cfdefe9eb96730270bfdab6a7d245c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:12:18 -0500
Subject: [PATCH v33 14/16] Pass down information on table modification to scan
node
Pass down information to sequential scan, index [only] scan, and bitmap
table scan nodes on whether or not the query modifies the relation being
scanned. A later commit will use this information to update the VM
during on-access pruning only if the relation is not modified by the
query.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
contrib/pgrowlocks/pgrowlocks.c | 2 +-
src/backend/access/brin/brin.c | 3 ++-
src/backend/access/gin/gininsert.c | 3 ++-
src/backend/access/heap/heapam_handler.c | 7 +++---
src/backend/access/index/genam.c | 4 ++--
src/backend/access/index/indexam.c | 6 +++---
src/backend/access/nbtree/nbtsort.c | 2 +-
src/backend/access/table/tableam.c | 7 +++---
src/backend/commands/constraint.c | 2 +-
src/backend/commands/copyto.c | 2 +-
src/backend/commands/tablecmds.c | 8 +++----
src/backend/commands/typecmds.c | 4 ++--
src/backend/executor/execIndexing.c | 2 +-
src/backend/executor/execReplication.c | 8 +++----
src/backend/executor/nodeBitmapHeapscan.c | 9 +++++++-
src/backend/executor/nodeIndexonlyscan.c | 9 +++++++-
src/backend/executor/nodeIndexscan.c | 18 ++++++++++++++--
src/backend/executor/nodeSeqscan.c | 26 ++++++++++++++++++++---
src/backend/partitioning/partbounds.c | 2 +-
src/backend/utils/adt/selfuncs.c | 2 +-
src/include/access/genam.h | 2 +-
src/include/access/heapam.h | 6 ++++++
src/include/access/tableam.h | 19 ++++++++++-------
23 files changed, 107 insertions(+), 46 deletions(-)
diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
RelationGetRelationName(rel));
/* Scan the relation */
- scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
hscan = (HeapScanDesc) scan;
attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 6887e421442..4d9684b1b19 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2843,7 +2843,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
indexInfo->ii_Concurrent = brinshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromBrinShared(brinshared));
+ ParallelTableScanFromBrinShared(brinshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index c08ea927ac5..b502d4088d7 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2058,7 +2058,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
indexInfo->ii_Concurrent = ginshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromGinBuildShared(ginshared));
+ ParallelTableScanFromGinBuildShared(ginshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index df2440e82a7..e88db52fd7e 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,12 +79,13 @@ heapam_slot_callbacks(Relation relation)
*/
static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
{
IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
return &hscan->xs_base;
}
@@ -753,7 +754,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
tableScan = NULL;
heapScan = NULL;
- indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+ indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
index_rescan(indexScan, NULL, 0, NULL, 0);
}
else
@@ -762,7 +763,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
- tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+ tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
heapScan = (HeapScanDesc) tableScan;
indexScan = NULL;
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index a29be6f467b..5ac7d22e49f 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -447,7 +447,7 @@ systable_beginscan(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, irel,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
@@ -708,7 +708,7 @@ systable_beginscan_ordered(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, indexRelation,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 4ed0508c605..4df56087841 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys)
+ int nkeys, int norderbys, uint32 flags)
{
IndexScanDesc scan;
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+ scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
return scan;
}
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+ scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
return scan;
}
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 90ab4e91b56..8ae54217f36 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1925,7 +1925,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
indexInfo = BuildIndexInfo(btspool->index);
indexInfo->ii_Concurrent = btshared->isconcurrent;
scan = table_beginscan_parallel(btspool->heap,
- ParallelTableScanFromBTShared(btshared));
+ ParallelTableScanFromBTShared(btshared), 0);
reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
true, progress, _bt_build_callback,
&buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 87491796523..2ff29b6e40b 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
+
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
bool found;
slot = table_slot_create(rel, NULL);
- scan = table_index_fetch_begin(rel);
+ scan = table_index_fetch_begin(rel, 0);
found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
all_dead);
table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..37cfbd63938 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
*/
tmptid = checktid;
{
- IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+ IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
bool call_again = false;
if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 4ab4a3893d5..4261baf4a41 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1156,7 +1156,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
AttrMap *map = NULL;
TupleTableSlot *root_slot = NULL;
- scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
slot = table_slot_create(rel, NULL);
/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f976c0e5c7e..eb35dbbc853 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6378,7 +6378,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
* checking all the constraints.
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(oldrel, snapshot, 0, NULL);
+ scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -13768,7 +13768,7 @@ validateForeignKeyConstraint(char *conname,
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
slot = table_slot_create(rel, NULL);
- scan = table_beginscan(rel, snapshot, 0, NULL);
+ scan = table_beginscan(rel, snapshot, 0, NULL, 0);
perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
"validateForeignKeyConstraint",
@@ -22626,7 +22626,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
/* Scan through the rows. */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+ scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -23090,7 +23090,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
/* Scan through the rows. */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(splitRel, snapshot, 0, NULL);
+ scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index e5fa0578889..8c114fa56fa 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3157,7 +3157,7 @@ validateDomainNotNullConstraint(Oid domainoid)
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
@@ -3238,7 +3238,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 6ae0f959592..6d3e9d2f311 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -816,7 +816,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
retry:
conflict = false;
found_self = false;
- index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+ index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 173d2fe548d..db1b322c665 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -204,7 +204,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
/* Start an index scan. */
- scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
retry:
found = false;
@@ -382,7 +382,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
/* Start a heap scan. */
InitDirtySnapshot(snap);
- scan = table_beginscan(rel, &snap, 0, NULL);
+ scan = table_beginscan(rel, &snap, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
retry:
@@ -601,7 +601,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+ scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
table_rescan(scan, NULL);
@@ -665,7 +665,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
index_rescan(scan, skey, skey_attoff, NULL, 0);
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 2c68327cb29..62dff010d10 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ flags);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index c2d09374517..2fe724a323f 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -84,6 +84,12 @@ IndexOnlyNext(IndexOnlyScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the index only scan is not parallel, or if we're
* serially executing an index only scan that was planned to be
@@ -94,7 +100,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
estate->es_snapshot,
&node->ioss_Instrument,
node->ioss_NumScanKeys,
- node->ioss_NumOrderByKeys);
+ node->ioss_NumOrderByKeys,
+ flags);
node->ioss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 84823f0b615..0ec660c8fa9 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys,
+ flags);
node->iss_ScanDesc = scandesc;
@@ -198,6 +205,12 @@ IndexNextWithReorder(IndexScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
@@ -207,7 +220,8 @@ IndexNextWithReorder(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys,
+ flags);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index b8119face43..7718376bc2f 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
scandesc = table_beginscan(node->ss.ss_currentRelation,
estate->es_snapshot,
- 0, NULL);
+ 0, NULL, flags);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
{
EState *estate = node->ss.ps.state;
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
table_parallelscan_initialize(node->ss.ss_currentRelation,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+ flags);
}
/* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation,
+ pscan,
+ flags);
}
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..b7c4e6d1071 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
econtext = GetPerTupleExprContext(estate);
snapshot = RegisterSnapshot(GetLatestSnapshot());
tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
- scan = table_beginscan(part_rel, snapshot, 0, NULL);
+ scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 29fec655593..ac181853225 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7181,7 +7181,7 @@ get_actual_variable_endpoint(Relation heapRel,
index_scan = index_beginscan(heapRel, indexRel,
&SnapshotNonVacuumable, NULL,
- 1, 0);
+ 1, 0, 0);
/* Set it up for index-only scan */
index_scan->xs_want_itup = true;
index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index e37834c406d..43b9d8aaaf1 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -177,7 +177,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys);
+ int nkeys, int norderbys, uint32 flags);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ac771390a37..a0e89365c70 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -119,6 +119,12 @@ typedef struct IndexFetchHeapData
Buffer xs_cbuf; /* current heap buffer in scan, if any */
/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index c9fa9f259cd..6066ae156de 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* set if the query doesn't modify the rel */
+ SO_HINT_REL_READ_ONLY = 1 << 10,
} ScanOptions;
/*
@@ -420,7 +422,7 @@ typedef struct TableAmRoutine
*
* Tuples for an index scan can then be fetched via index_fetch_tuple.
*/
- struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+ struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
/*
* Reset index fetch. Typically this will release cross index fetch
@@ -873,9 +875,9 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
*/
static inline TableScanDesc
table_beginscan(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_SEQSCAN |
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
@@ -918,9 +920,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
@@ -1127,7 +1129,8 @@ extern void table_parallelscan_initialize(Relation rel,
* Caller must hold a suitable lock on the relation.
*/
extern TableScanDesc table_beginscan_parallel(Relation relation,
- ParallelTableScanDesc pscan);
+ ParallelTableScanDesc pscan,
+ uint32 flags);
/*
* Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1163,9 +1166,9 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
* Tuples for an index scan can then be fetched via table_index_fetch_tuple().
*/
static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
{
- return rel->rd_tableam->index_fetch_begin(rel);
+ return rel->rd_tableam->index_fetch_begin(rel, flags);
}
/*
--
2.43.0
v33-0015-Allow-on-access-pruning-to-set-pages-all-visible.patch (text/x-patch; charset=US-ASCII)
From 5beb927efb98f05c12dfd84f584c11c48d18bd96 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Dec 2025 15:24:08 -0500
Subject: [PATCH v33 15/16] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam.c | 15 ++++++-
src/backend/access/heap/heapam_handler.c | 15 ++++++-
src/backend/access/heap/pruneheap.c | 40 ++++++++++++++++++-
src/include/access/heapam.h | 24 +++++++++--
.../t/035_standby_logical_decoding.pl | 3 +-
5 files changed, 89 insertions(+), 8 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f219c7a71cf..8940297f6f3 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -573,6 +573,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -587,7 +588,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1264,6 +1267,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1302,6 +1306,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1334,6 +1344,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index e88db52fd7e..ab175948c5b 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
return &hscan->xs_base;
@@ -100,6 +101,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -139,7 +146,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ hscan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2477,6 +2485,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2523,7 +2532,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9f1257529b9..04aa56e81b6 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -202,6 +202,8 @@ static bool heap_page_will_set_vm(PruneState *prstate,
Relation relation,
BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
Buffer vmbuffer,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
int nlpdead_items,
uint8 *old_vmbits,
uint8 *new_vmbits);
@@ -223,9 +225,13 @@ static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -306,6 +312,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
.cutoffs = NULL,
};
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ params.options |= HEAP_PAGE_PRUNE_UPDATE_VM;
+ params.vmbuffer = *vmbuffer;
+ }
+
heap_page_prune_and_freeze(¶ms, &presult, &dummy_off_loc,
NULL, NULL);
@@ -951,6 +964,9 @@ identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
* corrupted, it will fix them by clearing the VM bits and visibility hint.
* This does not need to be done in a critical section.
*
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
* Returns true if one or both VM bits should be set, along with returning the
* current value of the VM bits in *old_vmbits and the desired new value of
* the VM bits in *new_vmbits.
@@ -964,6 +980,8 @@ heap_page_will_set_vm(PruneState *prstate,
Relation relation,
BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
Buffer vmbuffer,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
int nlpdead_items,
uint8 *old_vmbits,
uint8 *new_vmbits)
@@ -974,6 +992,24 @@ heap_page_will_set_vm(PruneState *prstate,
if (!prstate->attempt_update_vm)
return false;
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(heap_buffer) || XLogCheckBufferNeedsBackup(heap_buffer)))
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ return false;
+ }
+
*old_vmbits = visibilitymap_get_status(relation, heap_blk,
&vmbuffer);
@@ -1171,6 +1207,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
buffer,
page,
vmbuffer,
+ params->reason,
+ do_prune, do_freeze,
prstate.lpdead_items,
&old_vmbits,
&new_vmbits);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a0e89365c70..7e68928f3e9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read in the current heap page's
+ * corresponding VM block to this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read in the current heap page's corresponding VM block to
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
/*
* Some optimizations can only be performed if the query does not modify
@@ -419,7 +436,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
TM_IndexDeleteOp *delstate);
/* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index d264a698ff6..a5536ba4ff6 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
--
2.43.0
v33-0016-Set-pd_prune_xid-on-insert.patch (text/x-patch; charset=UTF-8)
From 5e27f30bd970c7546a2ec763533d03ec44c1d69b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v33 16/16] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.
This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.
The index killtuples test had to be updated to reflect a larger number
of hits by some accesses. Since the prune_xid is set by the fill/insert
step, on-access pruning can happen during the first access step (before
the DELETE). This is when the VM is extended. After the DELETE, the next
access hits the VM block instead of extending it. Thus, an additional
buffer hit is counted for the table.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
src/backend/access/heap/heapam.c | 25 +++++++++++++------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++-
.../modules/index/expected/killtuples.out | 6 ++---
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 8940297f6f3..18413d5878f 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2122,6 +2122,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2181,15 +2182,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2199,7 +2204,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2563,8 +2567,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 9a29fda3601..49cc83a6479 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -447,6 +447,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -596,9 +602,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
--
2.43.0
On Tue, 6 Jan 2026 at 22:32, Melanie Plageman <melanieplageman@gmail.com> wrote:
Ah, yes, I forgot to remove that when I removed the old
visibilitymap_set() and made visibilitymap_set_vmbits() into
visibilitymap_set(). Done in v33.
- Melanie
I think 0001-0003 and 0009-0010 are ready.
That test creates a table, inserts tuples, accesses one page, deletes
all the data, accesses a single page again (until the table is
vacuumed, the pages will still be there and have to be scanned even
though the data is deleted). The first time we set the VM on-access,
we have to extend the VM. That VM access is an extend and not a hit.
Once we set pd_prune_xid on the page, the extend happens during the
first access (before the delete), so when we access the VM after the
delete step, that is counted as a hit and we end up with more hits in
the stats.
Good
--
Best regards,
Kirill Reshke
On Jan 7, 2026, at 01:31, Melanie Plageman <melanieplageman@gmail.com> wrote:
On Tue, Jan 6, 2026 at 4:40 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
<v32-0014-Pass-down-information-on-table-modification-to-s.patch>
I've made an attempt to review some patches of this patchset. It's huge and mostly polished.
I've added attribution for your review on the patches you specifically
mention here (and in previous emails you sent). Let me know if there
are other patches you reviewed that you did not mention.
In a step "Pass down information on table modification to scan node" you pass the SO_HINT_REL_READ_ONLY flag in IndexNext() and BitmapTableScanSetup(), but not in IndexNextWithReorder() and IndexOnlyNext(). Is there a reason why index scans with ordering cannot use on-access VM setting?
Great point, I simply hadn't tested those cases and didn't think to
add them. I've added them in the attached v33.
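For reference, each of these scan nodes decides whether to pass the hint in the same way; below is a condensed sketch of the sequential-scan case from the patch upthread (the index and bitmap heap scan paths do the equivalent before their own begin-scan calls):
```c
uint32		flags = 0;

/*
 * Hint that the scan is read-only only if the scanned relation is not in
 * the set of relations this query modifies.
 */
if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
				   estate->es_modified_relids))
	flags = SO_HINT_REL_READ_ONLY;

scandesc = table_beginscan(node->ss.ss_currentRelation,
						   estate->es_snapshot,
						   0, NULL, flags);
```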
While looking at other callers of index_beginscan(), I was wondering
if systable_beginscan() and systable_beginscan_ordered() should ever
pass SO_HINT_REL_READ_ONLY. I guess we would need to pass it if the
operation above the index_beginscan() is read-only -- I'm not sure if
we always know in the caller of systable_beginscan() whether this
operation will modify the catalog. That seems like it could be a
separate project, though, so maybe it is better to say this feature is
just for regular tables.
As for the other cases: We don't have the relation range table index
in check_exclusion_or_unique_constraint(), so I don't think we can do
it there.
And I think that the other index scan cases like in replication code
or get_actual_variable_endpoint() are too small to be worth it, don't
have the needed info, or don't do on-access pruning (because of the
snapshot type they use).
Also, the comment about visibilitymap_set() says "Callers that log VM changes separately should use visibilitymap_set()" as if visibilitymap_set() is some other function.
Ah, yes, I forgot to remove that when I removed the old
visibilitymap_set() and made visibilitymap_set_vmbits() into
visibilitymap_set(). Done in v33.
- Melanie
<v33-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patch><v33-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patch><v33-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patch><v33-0004-Set-the-VM-in-heap_page_prune_and_freeze.patch><v33-0005-Move-VM-assert-into-prune-freeze-code.patch><v33-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch><v33-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch><v33-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patch><v33-0009-Simplify-heap_page_would_be_all_visible-visibili.patch><v33-0010-Remove-table_scan_analyze_next_tuple-unneeded-pa.patch><v33-0011-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch><v33-0012-Unset-all_visible-sooner-if-not-freezing.patch><v33-0013-Track-which-relations-are-modified-by-a-query.patch><v33-0014-Pass-down-information-on-table-modification-to-s.patch><v33-0015-Allow-on-access-pruning-to-set-pages-all-visible.patch><v33-0016-Set-pd_prune_xid-on-insert.patch>
I see the same problem in 0009 and 0010:
0009
```
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3570,6 +3570,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
{
ItemId itemid;
HeapTupleData tuple;
+ TransactionId dead_after;
/*
* Set the offset number so that we can display it along with any
@@ -3609,12 +3610,14 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
/* Visibility checks may do IO or allocate memory */
Assert(CritSectionCount == 0);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+ switch (HeapTupleSatisfiesVacuumHorizon(&tuple, buf, &dead_after))
{
case HEAPTUPLE_LIVE:
{
TransactionId xmin;
+ Assert(!TransactionIdIsValid(dead_after));
+
/* Check comments in lazy_scan_prune. */
if (!HeapTupleHeaderXminCommitted(tuple.t_data))
{
@@ -3647,8 +3650,10 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
}
break;
- case HEAPTUPLE_DEAD:
case HEAPTUPLE_RECENTLY_DEAD:
+ Assert(TransactionIdIsValid(dead_after));
+ /* FALLTHROUGH */
```
0010:
```
static bool
-heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
+heapam_scan_analyze_next_tuple(TableScanDesc scan,
double *liverows, double *deadrows,
TupleTableSlot *slot)
{
@@ -1047,6 +1047,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
ItemId itemid;
HeapTuple targtuple = &hslot->base.tupdata;
bool sample_it = false;
+ TransactionId dead_after;
itemid = PageGetItemId(targpage, hscan->rs_cindex);
@@ -1069,16 +1070,20 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
targtuple->t_data = (HeapTupleHeader) PageGetItem(targpage, itemid);
targtuple->t_len = ItemIdGetLength(itemid);
- switch (HeapTupleSatisfiesVacuum(targtuple, OldestXmin,
- hscan->rs_cbuf))
+ switch (HeapTupleSatisfiesVacuumHorizon(targtuple,
+ hscan->rs_cbuf,
+ &dead_after))
{
case HEAPTUPLE_LIVE:
sample_it = true;
*liverows += 1;
break;
- case HEAPTUPLE_DEAD:
case HEAPTUPLE_RECENTLY_DEAD:
+ Assert(TransactionIdIsValid(dead_after));
+ /* FALLTHROUGH */
```
I believe the reason we add Assert(TransactionIdIsValid(dead_after)) under HEAPTUPLE_RECENTLY_DEAD is to ensure that when HeapTupleSatisfiesVacuumHorizon() returns HEAPTUPLE_RECENTLY_DEAD, dead_after must have been set. So the goal of the assert is to catch bugs in HeapTupleSatisfiesVacuumHorizon().
From this perspective, I now feel dead_after should be initialized to InvalidTransactionId. Otherwise, if HeapTupleSatisfiesVacuumHorizon() has a bug and misses setting dead_after, the assert will most likely not fire, because dead_after would hold a random value that is most likely not 0.
I know this comment conflicts with one of my previous comments, sorry about that. As I read the patch again and again, I understand it better.
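To make that concrete, here is a minimal sketch (not a patch) of the initialization being suggested, using the heapam_scan_analyze_next_tuple() call site from the 0010 hunk above:
```c
/* Proposed: initialize explicitly so a missed assignment is detectable. */
TransactionId	dead_after = InvalidTransactionId;

switch (HeapTupleSatisfiesVacuumHorizon(targtuple, hscan->rs_cbuf, &dead_after))
{
	case HEAPTUPLE_RECENTLY_DEAD:

		/*
		 * With the explicit initialization above, this assertion fires if
		 * HeapTupleSatisfiesVacuumHorizon() forgot to set dead_after,
		 * instead of being fooled by whatever was left on the stack.
		 */
		Assert(TransactionIdIsValid(dead_after));
		break;

	default:
		/* other cases handled as in the existing code */
		break;
}
```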
0014
```
+ /* set if the query doesn't modify the rel */
+ SO_HINT_REL_READ_ONLY = 1 << 10,
```
Nit: I think it's better to replace "rel" with "relation". In a function comment, if there is a parameter named "rel", we can use it to refer to that parameter; without such context, I think spelling out the whole word is better here.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/