Fix some inconsistencies with open-coded visibilitymap_set() callers

Started by Melanie Plageman · 7 months ago · 7 messages
#1 Melanie Plageman
melanieplageman@gmail.com
1 attachment(s)

Hi,

visibilitymap_set() arguably breaks a few of the coding rules for
modifying and WAL logging buffers set out in
src/backend/access/transam/README.

In 4 of visibilitymap_set()'s 5 non-recovery callers, we set
PD_ALL_VISIBLE and mark the heap buffer dirty outside of the critical
section in which we make the VM changes and emit WAL.

In at least two of its callers, before visibilitymap_set() is called,
MarkBufferDirty() is used when MarkBufferDirtyHint() would be
appropriate (when checksums/wal_log_hints are disabled) -- because we
are not making other changes to the heap page.

And in at least one of its callers (see lazy_scan_prune()), where
PD_ALL_VISIBLE is already set and we don't mark the buffer dirty and
checksums/wal_log_hints are enabled, visibilitymap_set() will still
set the heap page LSN -- even though we didn't set the buffer dirty.

It may be uncommon for a page to have PD_ALL_VISIBLE set while its VM
bit is clear because of a crash or error after modifying the heap
page but before setting the VM. But it seems easy for us to be only
setting the all-frozen bit in the VM and thus not need to set
PD_ALL_VISIBLE or mark the heap buffer dirty. In that case, we'll
incorrectly set the page LSN without having marked the buffer dirty
(when wal_log_hints/checksums on).

Besides all of these issues, having these operations open-coded all
over the place is error-prone. It's easy to forget to always
PageSetAllVisible() before visibilitymap_set().
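
For illustration, here is roughly the sequence each open-coded caller has
to get right today -- a minimal sketch of the README rules applied to the
VM case, assuming a locked heap buffer buf with page page, a pinned
vmbuffer, and suitable cutoff_xid and flags, not code lifted from any one
caller:

    START_CRIT_SECTION();

    if (!PageIsAllVisible(page))
    {
        PageSetAllVisible(page);

        /* a full dirty is only needed if the page will be WAL-logged */
        if (XLogHintBitIsNeeded())
            MarkBufferDirty(buf);
        else
            MarkBufferDirtyHint(buf, true);
    }

    visibilitymap_set(rel, blkno, buf, InvalidXLogRecPtr,
                      vmbuffer, cutoff_xid, flags);

    END_CRIT_SECTION();

Skip any of those steps, or do them outside the critical section, and you
get one of the inconsistencies above.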

I've attached a patch to add a heap-specific wrapper for
visibilitymap_set() that attempts to handle all cases.

One thing I noticed is that log_heap_visible() will do
XLogRegisterBuffer() for the heap page -- even if we made no changes
to the heap page. It seems like we shouldn't do that. The most common
case would be when setting all-visible pages all-frozen when checksums
are not enabled. I don't actually know how we handle replay when we
are not sure which blocks might be registered, though.
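
To sketch what conditional registration might look like inside
log_heap_visible() -- heap_page_modified being a hypothetical parameter;
today the heap buffer is always registered as block 1:

    XLogBeginInsert();
    XLogRegisterData((char *) &xlrec, SizeOfHeapVisible);
    XLogRegisterBuffer(0, vm_buffer, 0);

    /* hypothetical: only attach the heap block if the caller changed it */
    if (heap_page_modified)
    {
        uint8       flags = REGBUF_STANDARD;

        if (!XLogHintBitIsNeeded())
            flags |= REGBUF_NO_IMAGE;
        XLogRegisterBuffer(1, heap_buffer, flags);
    }

    recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);

Presumably heap_xlog_visible() could then use XLogRecHasBlockRef(record, 1)
to tell whether the heap block was included, but I haven't thought through
the replay side.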

Besides feeling like principled cleanup, this patch is a prerequisite
for a larger project I am working on [1] (targeting 19) to combine VM
updates in the same WAL record as the heap page changes and eliminate
xl_heap_visible.

- Melanie

[1]: /messages/by-id/CAAKRu_ZMw6Npd_qm2KM+FwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g@mail.gmail.com

Attachments:

v1-0001-Introduce-heap-specific-wrapper-for-visibilitymap.patch (text/x-patch)
From 4a654c720fcea84a320d5e9378976d9bafd76dd2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 24 Jun 2025 15:05:41 -0400
Subject: [PATCH v1] Introduce heap-specific wrapper for visibilitymap_set()

visibilitymap_set(), which sets bits in the visibility map corresponding
to the heap block of the table passed in, arguably breaks a few of the
coding rules for modifying and WAL logging buffers set out in
access/transam/README.

In several of the places where visibilitymap_set() is called, setting
the heap page PD_ALL_VISIBLE and marking the buffer dirty are done
outside of a critical section.

In some places before visibilitymap_set() is called, MarkBufferDirty()
is used when MarkBufferDirtyHint() would be appropriate.

And in some places where PD_ALL_VISIBLE may already be set and we don't
mark the buffer dirty, when checksums/wal_log_hints are enabled
visibilitymap_set() will still set the heap page LSN -- even though it
was correct not to set the buffer dirty.

Besides all of these issues, having these operations open-coded all over
the place is error-prone. This commit introduces a wrapper that does the
correct operations to the heap page itself and invokes
visibilitymap_set() to make changes to the VM page.
---
 src/backend/access/heap/heapam.c        | 92 ++++++++++++++++++++-----
 src/backend/access/heap/heapam_xlog.c   |  2 +-
 src/backend/access/heap/vacuumlazy.c    | 66 +++++-------------
 src/backend/access/heap/visibilitymap.c | 58 ++++++----------
 src/include/access/heapam.h             |  3 +
 src/include/access/visibilitymap.h      |  2 +-
 6 files changed, 117 insertions(+), 106 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0dcd6ee817e..112f946dab0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2505,8 +2505,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 								BufferGetBlockNumber(buffer),
 								vmbuffer, VISIBILITYMAP_VALID_BITS);
 		}
-		else if (all_frozen_set)
-			PageSetAllVisible(page);
 
 		/*
 		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2632,23 +2630,16 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 
 		/*
 		 * If we've frozen everything on the page, update the visibilitymap.
-		 * We're already holding pin on the vmbuffer.
+		 * We're already holding pin on the vmbuffer. It's fine to use
+		 * InvalidTransactionId as the cutoff_xid here - this is only used
+		 * when HEAP_INSERT_FROZEN is specified, which intentionally violates
+		 * visibility rules.
 		 */
 		if (all_frozen_set)
-		{
-			Assert(PageIsAllVisible(page));
-			Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
-
-			/*
-			 * It's fine to use InvalidTransactionId here - this is only used
-			 * when HEAP_INSERT_FROZEN is specified, which intentionally
-			 * violates visibility rules.
-			 */
-			visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
-							  InvalidXLogRecPtr, vmbuffer,
-							  InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
-		}
+			heap_page_set_vm_and_log(relation, BufferGetBlockNumber(buffer), buffer,
+									 vmbuffer,
+									 InvalidTransactionId,
+									 VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
 
 		UnlockReleaseBuffer(buffer);
 		ndone += nthispage;
@@ -7840,6 +7831,73 @@ heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple)
 	return false;
 }
 
+/*
+ * Make the heap and VM page changes needed to set a page all-visible.
+ * Do not call in recovery.
+ */
+uint8
+heap_page_set_vm_and_log(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
+						 Buffer vmbuf, TransactionId cutoff_xid,
+						 uint8 vmflags)
+{
+	Page		heap_page = BufferGetPage(heap_buf);
+	bool		set_heap_lsn = false;
+	XLogRecPtr	recptr = InvalidXLogRecPtr;
+	uint8		old_vmbits = 0;
+
+	Assert(BufferIsValid(heap_buf));
+
+	START_CRIT_SECTION();
+
+	/* Check that we have the right heap page pinned, if present */
+	if (BufferGetBlockNumber(heap_buf) != heap_blk)
+		elog(ERROR, "wrong heap buffer passed to heap_page_set_vm_and_log");
+
+	/*
+	 * We must never end up with the VM bit set and the page-level
+	 * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+	 * modification would fail to clear the VM bit. Though it is possible for
+	 * the page-level bit to be set and the VM bit to be clear if checksums
+	 * and wal_log_hints are not enabled.
+	 */
+	if (!PageIsAllVisible(heap_page))
+	{
+		PageSetAllVisible(heap_page);
+
+		/*
+		 * Buffer will usually be dirty from other changes, so it is worth the
+		 * extra check
+		 */
+		if (!BufferIsDirty(heap_buf))
+		{
+			if (XLogHintBitIsNeeded())
+				MarkBufferDirty(heap_buf);
+			else
+				MarkBufferDirtyHint(heap_buf, true);
+		}
+
+		set_heap_lsn = XLogHintBitIsNeeded();
+	}
+
+	old_vmbits = visibilitymap_set(rel, heap_blk, heap_buf,
+								   &recptr, vmbuf, cutoff_xid, vmflags);
+
+	/*
+	 * If we modified the heap page and data checksums are enabled (or
+	 * wal_log_hints=on), we need to protect the heap page from being torn.
+	 *
+	 * If not, then we must *not* update the heap page's LSN. In this case,
+	 * the FPI for the heap page was omitted from the WAL record inserted in
+	 * the VM record, so it would be incorrect to update the heap page's LSN.
+	 */
+	if (set_heap_lsn)
+		PageSetLSN(heap_page, recptr);
+
+	END_CRIT_SECTION();
+
+	return old_vmbits;
+}
+
 /*
  * heap_tuple_should_freeze
  *
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index fa94e104f1c..d9a4d6af2f8 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -297,7 +297,7 @@ heap_xlog_visible(XLogReaderState *record)
 		reln = CreateFakeRelcacheEntry(rlocator);
 		visibilitymap_pin(reln, blkno, &vmbuffer);
 
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
+		visibilitymap_set(reln, blkno, InvalidBuffer, &lsn, vmbuffer,
 						  xlrec->snapshotConflictHorizon, vmbits);
 
 		ReleaseBuffer(vmbuffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 09416450af9..a9d0c570361 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1876,9 +1876,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
-			MarkBufferDirty(buf);
-
 			/*
 			 * It's possible that another backend has extended the heap,
 			 * initialized the page, and then failed to WAL-log the page due
@@ -1890,14 +1887,15 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 			 */
 			if (RelationNeedsWAL(vacrel->rel) &&
 				PageGetLSN(page) == InvalidXLogRecPtr)
+			{
+				MarkBufferDirty(buf);
 				log_newpage_buffer(buf, true);
+			}
 
-			PageSetAllVisible(page);
-			old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-										   InvalidXLogRecPtr,
-										   vmbuffer, InvalidTransactionId,
-										   VISIBILITYMAP_ALL_VISIBLE |
-										   VISIBILITYMAP_ALL_FROZEN);
+			old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
+												  vmbuffer, InvalidTransactionId,
+												  VISIBILITYMAP_ALL_VISIBLE |
+												  VISIBILITYMAP_ALL_FROZEN);
 			END_CRIT_SECTION();
 
 			/*
@@ -2079,25 +2077,9 @@ lazy_scan_prune(LVRelState *vacrel,
 			flags |= VISIBILITYMAP_ALL_FROZEN;
 		}
 
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
-		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
+		old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
+											  vmbuffer, presult.vm_conflict_horizon,
+											  flags);
 
 		/*
 		 * If the page wasn't already set all-visible and/or all-frozen in the
@@ -2169,17 +2151,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	{
 		uint8		old_vmbits;
 
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
 		/*
 		 * Set the page all-frozen (and all-visible) in the VM.
 		 *
@@ -2188,11 +2159,10 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * was logged when the page's tuples were frozen.
 		 */
 		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
+		old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
+											  vmbuffer, InvalidTransactionId,
+											  VISIBILITYMAP_ALL_VISIBLE |
+											  VISIBILITYMAP_ALL_FROZEN);
 
 		/*
 		 * The page was likely already set all-visible in the VM. However,
@@ -2924,11 +2894,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 			flags |= VISIBILITYMAP_ALL_FROZEN;
 		}
 
-		PageSetAllVisible(page);
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buffer,
-									   InvalidXLogRecPtr,
-									   vmbuffer, visibility_cutoff_xid,
-									   flags);
+		old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buffer,
+											  vmbuffer, visibility_cutoff_xid,
+											  flags);
 
 		/*
 		 * If the page wasn't already set all-visible and/or all-frozen in the
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 745a04ef26e..c57632168c7 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -222,29 +222,31 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 /*
  *	visibilitymap_set - set bit(s) on a previously pinned page
  *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
  * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
+ * this function. Except in recovery, caller should also pass the heap buffer.
+ * When checksums are enabled and we're not in recovery, we must add the heap
+ * buffer to the WAL chain to protect it from being torn.
  *
  * You must pass a buffer containing the correct map page to this function.
  * Call visibilitymap_pin first to pin the right one. This function doesn't do
  * any I/O.
  *
- * Returns the state of the page's VM bits before setting flags.
+ * cutoff_xid is the largest xmin on the page being marked all-visible; it is
+ * needed for Hot Standby, and can be InvalidTransactionId if the page
+ * contains no tuples.  It can also be set to InvalidTransactionId when a page
+ * that is already all-visible is being marked all-frozen.
+ *
+ * If we're in recovery, recptr points to the LSN of the XLOG record we're
+ * replaying and the VM page LSN is advanced to this LSN. During normal
+ * running, we'll generate a new XLOG record for the changes to the VM and set
+ * the VM page LSN. We will return this LSN in recptr, and the caller may use
+ * this to set the heap page LSN.
+ *
+ * Returns the state of the page's VM bits before setting flags.
  */
 uint8
 visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
+				  XLogRecPtr *recptr, Buffer vmBuf, TransactionId cutoff_xid,
 				  uint8 flags)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
@@ -258,17 +260,13 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 	elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
 #endif
 
-	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
+	Assert(InRecovery || XLogRecPtrIsInvalid(*recptr));
 	Assert(InRecovery || PageIsAllVisible((Page) BufferGetPage(heapBuf)));
 	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
 
 	/* Must never set all_frozen bit without also setting all_visible bit */
 	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
 
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
 	/* Check that we have the right VM page pinned */
 	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
 		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
@@ -287,28 +285,12 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 
 		if (RelationNeedsWAL(rel))
 		{
-			if (XLogRecPtrIsInvalid(recptr))
+			if (XLogRecPtrIsInvalid(*recptr))
 			{
 				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
+				*recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
 			}
-			PageSetLSN(page, recptr);
+			PageSetLSN(page, *recptr);
 		}
 
 		END_CRIT_SECTION();
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 3a9424c19c9..9375296062f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -360,6 +360,9 @@ extern bool heap_tuple_should_freeze(HeapTupleHeader tuple,
 									 TransactionId *NoFreezePageRelfrozenXid,
 									 MultiXactId *NoFreezePageRelminMxid);
 extern bool heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple);
+extern uint8 heap_page_set_vm_and_log(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
+									  Buffer vmbuf, TransactionId cutoff_xid,
+									  uint8 vmflags);
 
 extern void simple_heap_insert(Relation relation, HeapTuple tup);
 extern void simple_heap_delete(Relation relation, ItemPointer tid);
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..4c7472e0b51 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -33,7 +33,7 @@ extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
 extern uint8 visibilitymap_set(Relation rel,
 							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
+							   XLogRecPtr *recptr,
 							   Buffer vmBuf,
 							   TransactionId cutoff_xid,
 							   uint8 flags);
-- 
2.34.1

#2 Melanie Plageman
melanieplageman@gmail.com
In reply to: Melanie Plageman (#1)
1 attachment(s)
Re: Fix some inconsistencies with open-coded visibilitymap_set() callers

On Tue, Jun 24, 2025 at 4:01 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

visibilitymap_set() arguably breaks a few of the coding rules for
modifying and WAL logging buffers set out in
src/backend/access/transam/README.

Here is a rebased version of this (it had some conflicts with recent commits).

- Melanie

Attachments:

v2-0001-Introduce-heap-specific-wrapper-for-visibilitymap.patch (text/x-patch)
From 0b4998028f30001d455f8856a58ce909725a427a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 26 Jun 2025 15:45:35 -0400
Subject: [PATCH v2] Introduce heap-specific wrapper for visibilitymap_set()

visibilitymap_set(), which sets bits in the visibility map corresponding
to the heap block of the table passed in, arguably breaks a few of the
coding rules for modifying and WAL logging buffers set out in
access/transam/README.

In several of the places where visibilitymap_set() is called, setting
the heap page PD_ALL_VISIBLE and marking the buffer dirty are done
outside of a critical section.

In some places before visibilitymap_set() is called, MarkBufferDirty()
is used when MarkBufferDirtyHint() would be appropriate.

And in some places where PD_ALL_VISIBLE may already be set and we don't
mark the buffer dirty, when checksums/wal_log_hints are enabled
visibilitymap_set() will still set the heap page LSN -- even though it
was correct not to set the buffer dirty.

Besides all of these issues, having these operations open-coded all over
the place is error-prone. This commit introduces a wrapper that does the
correct operations to the heap page itself and invokes
visibilitymap_set() to make changes to the VM page.
---
 src/backend/access/heap/heapam.c        | 92 ++++++++++++++++++++-----
 src/backend/access/heap/heapam_xlog.c   |  2 +-
 src/backend/access/heap/vacuumlazy.c    | 66 +++++-------------
 src/backend/access/heap/visibilitymap.c | 58 ++++++----------
 src/include/access/heapam.h             |  3 +
 src/include/access/visibilitymap.h      |  2 +-
 6 files changed, 117 insertions(+), 106 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0dcd6ee817e..112f946dab0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2505,8 +2505,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 								BufferGetBlockNumber(buffer),
 								vmbuffer, VISIBILITYMAP_VALID_BITS);
 		}
-		else if (all_frozen_set)
-			PageSetAllVisible(page);
 
 		/*
 		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2632,23 +2630,16 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 
 		/*
 		 * If we've frozen everything on the page, update the visibilitymap.
-		 * We're already holding pin on the vmbuffer.
+		 * We're already holding pin on the vmbuffer. It's fine to use
+		 * InvalidTransactionId as the cutoff_xid here - this is only used
+		 * when HEAP_INSERT_FROZEN is specified, which intentionally violates
+		 * visibility rules.
 		 */
 		if (all_frozen_set)
-		{
-			Assert(PageIsAllVisible(page));
-			Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
-
-			/*
-			 * It's fine to use InvalidTransactionId here - this is only used
-			 * when HEAP_INSERT_FROZEN is specified, which intentionally
-			 * violates visibility rules.
-			 */
-			visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
-							  InvalidXLogRecPtr, vmbuffer,
-							  InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
-		}
+			heap_page_set_vm_and_log(relation, BufferGetBlockNumber(buffer), buffer,
+									 vmbuffer,
+									 InvalidTransactionId,
+									 VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
 
 		UnlockReleaseBuffer(buffer);
 		ndone += nthispage;
@@ -7840,6 +7831,73 @@ heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple)
 	return false;
 }
 
+/*
+ * Make the heap and VM page changes needed to set a page all-visible.
+ * Do not call in recovery.
+ */
+uint8
+heap_page_set_vm_and_log(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
+						 Buffer vmbuf, TransactionId cutoff_xid,
+						 uint8 vmflags)
+{
+	Page		heap_page = BufferGetPage(heap_buf);
+	bool		set_heap_lsn = false;
+	XLogRecPtr	recptr = InvalidXLogRecPtr;
+	uint8		old_vmbits = 0;
+
+	Assert(BufferIsValid(heap_buf));
+
+	START_CRIT_SECTION();
+
+	/* Check that we have the right heap page pinned, if present */
+	if (BufferGetBlockNumber(heap_buf) != heap_blk)
+		elog(ERROR, "wrong heap buffer passed to heap_page_set_vm_and_log");
+
+	/*
+	 * We must never end up with the VM bit set and the page-level
+	 * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+	 * modification would fail to clear the VM bit. Though it is possible for
+	 * the page-level bit to be set and the VM bit to be clear if checksums
+	 * and wal_log_hints are not enabled.
+	 */
+	if (!PageIsAllVisible(heap_page))
+	{
+		PageSetAllVisible(heap_page);
+
+		/*
+		 * Buffer will usually be dirty from other changes, so it is worth the
+		 * extra check
+		 */
+		if (!BufferIsDirty(heap_buf))
+		{
+			if (XLogHintBitIsNeeded())
+				MarkBufferDirty(heap_buf);
+			else
+				MarkBufferDirtyHint(heap_buf, true);
+		}
+
+		set_heap_lsn = XLogHintBitIsNeeded();
+	}
+
+	old_vmbits = visibilitymap_set(rel, heap_blk, heap_buf,
+								   &recptr, vmbuf, cutoff_xid, vmflags);
+
+	/*
+	 * If we modified the heap page and data checksums are enabled (or
+	 * wal_log_hints=on), we need to protect the heap page from being torn.
+	 *
+	 * If not, then we must *not* update the heap page's LSN. In this case,
+	 * the FPI for the heap page was omitted from the WAL record inserted in
+	 * the VM record, so it would be incorrect to update the heap page's LSN.
+	 */
+	if (set_heap_lsn)
+		PageSetLSN(heap_page, recptr);
+
+	END_CRIT_SECTION();
+
+	return old_vmbits;
+}
+
 /*
  * heap_tuple_should_freeze
  *
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index eb4bd3d6ae3..f2bc1bd06ee 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -297,7 +297,7 @@ heap_xlog_visible(XLogReaderState *record)
 		reln = CreateFakeRelcacheEntry(rlocator);
 		visibilitymap_pin(reln, blkno, &vmbuffer);
 
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
+		visibilitymap_set(reln, blkno, InvalidBuffer, &lsn, vmbuffer,
 						  xlrec->snapshotConflictHorizon, vmbits);
 
 		ReleaseBuffer(vmbuffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8a42e17aec2..c0608af7d29 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1874,9 +1874,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		{
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
-			MarkBufferDirty(buf);
-
 			/*
 			 * It's possible that another backend has extended the heap,
 			 * initialized the page, and then failed to WAL-log the page due
@@ -1888,14 +1885,15 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 			 */
 			if (RelationNeedsWAL(vacrel->rel) &&
 				PageGetLSN(page) == InvalidXLogRecPtr)
+			{
+				MarkBufferDirty(buf);
 				log_newpage_buffer(buf, true);
+			}
 
-			PageSetAllVisible(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
+									 vmbuffer, InvalidTransactionId,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN);
 			END_CRIT_SECTION();
 
 			/* Count the newly all-frozen pages for logging */
@@ -2069,25 +2067,9 @@ lazy_scan_prune(LVRelState *vacrel,
 			flags |= VISIBILITYMAP_ALL_FROZEN;
 		}
 
-		/*
-		 * It should never be the case that the visibility map page is set
-		 * while the page-level bit is clear, but the reverse is allowed (if
-		 * checksums are not enabled).  Regardless, set both bits so that we
-		 * get back in sync.
-		 *
-		 * NB: If the heap page is all-visible but the VM bit is not set, we
-		 * don't need to dirty the heap page.  However, if checksums are
-		 * enabled, we do need to make sure that the heap page is dirtied
-		 * before passing it to visibilitymap_set(), because it may be logged.
-		 * Given that this situation should only happen in rare cases after a
-		 * crash, it is not worth optimizing.
-		 */
-		PageSetAllVisible(page);
-		MarkBufferDirty(buf);
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, presult.vm_conflict_horizon,
-									   flags);
+		old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
+											  vmbuffer, presult.vm_conflict_horizon,
+											  flags);
 
 		/*
 		 * If the page wasn't already set all-visible and/or all-frozen in the
@@ -2159,17 +2141,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	{
 		uint8		old_vmbits;
 
-		/*
-		 * Avoid relying on all_visible_according_to_vm as a proxy for the
-		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set
-		 */
-		if (!PageIsAllVisible(page))
-		{
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-		}
-
 		/*
 		 * Set the page all-frozen (and all-visible) in the VM.
 		 *
@@ -2178,11 +2149,10 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * was logged when the page's tuples were frozen.
 		 */
 		Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-									   InvalidXLogRecPtr,
-									   vmbuffer, InvalidTransactionId,
-									   VISIBILITYMAP_ALL_VISIBLE |
-									   VISIBILITYMAP_ALL_FROZEN);
+		old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
+											  vmbuffer, InvalidTransactionId,
+											  VISIBILITYMAP_ALL_VISIBLE |
+											  VISIBILITYMAP_ALL_FROZEN);
 
 		/*
 		 * The page was likely already set all-visible in the VM. However,
@@ -2913,11 +2883,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 			flags |= VISIBILITYMAP_ALL_FROZEN;
 		}
 
-		PageSetAllVisible(page);
-		visibilitymap_set(vacrel->rel, blkno, buffer,
-						  InvalidXLogRecPtr,
-						  vmbuffer, visibility_cutoff_xid,
-						  flags);
+		heap_page_set_vm_and_log(vacrel->rel, blkno, buffer,
+								 vmbuffer, visibility_cutoff_xid,
+								 flags);
 
 		/* Count the newly set VM page for logging */
 		vacrel->vm_new_visible_pages++;
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 745a04ef26e..c57632168c7 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -222,29 +222,31 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 /*
  *	visibilitymap_set - set bit(s) on a previously pinned page
  *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
  * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
+ * this function. Except in recovery, caller should also pass the heap buffer.
+ * When checksums are enabled and we're not in recovery, we must add the heap
+ * buffer to the WAL chain to protect it from being torn.
  *
  * You must pass a buffer containing the correct map page to this function.
  * Call visibilitymap_pin first to pin the right one. This function doesn't do
  * any I/O.
  *
- * Returns the state of the page's VM bits before setting flags.
+ * cutoff_xid is the largest xmin on the page being marked all-visible; it is
+ * needed for Hot Standby, and can be InvalidTransactionId if the page
+ * contains no tuples.  It can also be set to InvalidTransactionId when a page
+ * that is already all-visible is being marked all-frozen.
+ *
+ * If we're in recovery, recptr points to the LSN of the XLOG record we're
+ * replaying and the VM page LSN is advanced to this LSN. During normal
+ * running, we'll generate a new XLOG record for the changes to the VM and set
+ * the VM page LSN. We will return this LSN in recptr, and the caller may use
+ * this to set the heap page LSN.
+ *
+ * Returns the state of the page's VM bits before setting flags.
  */
 uint8
 visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
+				  XLogRecPtr *recptr, Buffer vmBuf, TransactionId cutoff_xid,
 				  uint8 flags)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
@@ -258,17 +260,13 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 	elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
 #endif
 
-	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
+	Assert(InRecovery || XLogRecPtrIsInvalid(*recptr));
 	Assert(InRecovery || PageIsAllVisible((Page) BufferGetPage(heapBuf)));
 	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
 
 	/* Must never set all_frozen bit without also setting all_visible bit */
 	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
 
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
 	/* Check that we have the right VM page pinned */
 	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
 		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
@@ -287,28 +285,12 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 
 		if (RelationNeedsWAL(rel))
 		{
-			if (XLogRecPtrIsInvalid(recptr))
+			if (XLogRecPtrIsInvalid(*recptr))
 			{
 				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
+				*recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
 			}
-			PageSetLSN(page, recptr);
+			PageSetLSN(page, *recptr);
 		}
 
 		END_CRIT_SECTION();
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 3a9424c19c9..9375296062f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -360,6 +360,9 @@ extern bool heap_tuple_should_freeze(HeapTupleHeader tuple,
 									 TransactionId *NoFreezePageRelfrozenXid,
 									 MultiXactId *NoFreezePageRelminMxid);
 extern bool heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple);
+extern uint8 heap_page_set_vm_and_log(Relation rel, BlockNumber heap_blk, Buffer heap_buf,
+									  Buffer vmbuf, TransactionId cutoff_xid,
+									  uint8 vmflags);
 
 extern void simple_heap_insert(Relation relation, HeapTuple tup);
 extern void simple_heap_delete(Relation relation, ItemPointer tid);
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..4c7472e0b51 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -33,7 +33,7 @@ extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
 extern uint8 visibilitymap_set(Relation rel,
 							   BlockNumber heapBlk, Buffer heapBuf,
-							   XLogRecPtr recptr,
+							   XLogRecPtr *recptr,
 							   Buffer vmBuf,
 							   TransactionId cutoff_xid,
 							   uint8 flags);
-- 
2.34.1

#3 Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#2)
Re: Fix some inconsistencies with open-coded visibilitymap_set() callers

On Thu, Jun 26, 2025 at 3:56 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Here is a rebased version of this (it had some conflicts with recent commits).

One general complaint is that the commit message complains about the
status quo but isn't real clear about what the patch is actually
fixing.

I'm pretty concerned about this change:

/*
 * If the page is all visible, need to clear that, unless we're only
 * going to add further frozen rows to it.
 *
 * If we're only adding already frozen rows to a previously empty
 * page, mark it as all-visible.
 */
if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
    all_visible_cleared = true;
    PageClearAllVisible(page);
    visibilitymap_clear(relation,
                        BufferGetBlockNumber(buffer),
                        vmbuffer, VISIBILITYMAP_VALID_BITS);
}
- else if (all_frozen_set)
-     PageSetAllVisible(page);

One problem is that this falsifies the comment -- the second sentence
in the comment refers to code that the patch deletes. But also, the
all_frozen_set flag is used a bit further down when setting
xlrec->flags, and now we'll set XLH_INSERT_ALL_FROZEN_SET when in fact
we didn't set the all-frozen bit. It seems like the intention here was
that the existing XLOG_HEAP2_MULTI_INSERT record should cover setting
the all-frozen bit, if we do that, and it seems like you're changing
that somehow, but it's not clear to me why we'd want to change that,
and even if we do, and it doesn't appear to me that you've chased down
all the loose ends i.e. if that's the plan, then I'd expect the redo
routine to be modified and XLH_INSERT_ALL_FROZEN_SET to be deleted.

I think this might actually be an underlying design problem with the
patch -- heap_page_set_vm_and_log() seems to want to be in charge of
both what we do with the heap page and what we do with the
corresponding VM bit, but that runs contrary to the caller's need to
bundle the VM bit changes with some WAL record that is doing other
things to the heap page.

heap_page_set_vm_and_log() says that it shouldn't be called during
recovery but doesn't assert that it isn't.

-        /*
-         * It should never be the case that the visibility map page is set
-         * while the page-level bit is clear, but the reverse is allowed (if
-         * checksums are not enabled).  Regardless, set both bits so that we
-         * get back in sync.
-         *
-         * NB: If the heap page is all-visible but the VM bit is not set, we
-         * don't need to dirty the heap page.  However, if checksums are
-         * enabled, we do need to make sure that the heap page is dirtied
-         * before passing it to visibilitymap_set(), because it may be logged.
-         * Given that this situation should only happen in rare cases after a
-         * crash, it is not worth optimizing.
-         */
-        PageSetAllVisible(page);
-        MarkBufferDirty(buf);
-        old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-                                       InvalidXLogRecPtr,
-                                       vmbuffer, presult.vm_conflict_horizon,
-                                       flags);
+        old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
+                                              vmbuffer, presult.vm_conflict_horizon,
+                                              flags);

Why are we deleting a huge comment here explaining the intent of the
code when, AFAICS, the intent of the new code is the same? Even if the
intent of the new code isn't the same, why no comment?

-            /*
-             * If data checksums are enabled (or wal_log_hints=on), we
-             * need to protect the heap page from being torn.
-             *
-             * If not, then we must *not* update the heap page's LSN. In
-             * this case, the FPI for the heap page was omitted from the
-             * WAL record inserted above, so it would be incorrect to
-             * update the heap page's LSN.
-             */
-            if (XLogHintBitIsNeeded())
-            {
-                Page        heapPage = BufferGetPage(heapBuf);
-
-                PageSetLSN(heapPage, recptr);
-            }

This is another lengthy explanatory comment that goes away with this
patch, but this time the code goes, too. But why? Presumably, either
(1) this change is wrong, or (2) the preexisting code is wrong, or (3)
the patch is trying to change the contract between visibilitymap_set()
and its callers. If it's (3), then I think there should be more
explanation of that contract change. It is unclear to me why you've
rearranged the header comment for visibilitymap_set() in the way that
you have: some of the material that used to be at the top is now at
the bottom, but it's, if anything, a little less extensive than before,
and certainly not enough for me to understand why or whether this
change is correct.

+    /*
+     * We must never end up with the VM bit set and the page-level
+     * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+     * modification would fail to clear the VM bit. Though it is possible for
+     * the page-level bit to be set and the VM bit to be clear if checksums
+     * and wal_log_hints are not enabled.
+     */

The last sentence is not great English and maybe not that relevant,
either. I would suggest deleting the whole thing or deleting the word
"Though" and putting the rest of the sentence in parentheses.

--
Robert Haas
EDB: http://www.enterprisedb.com

#4 Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#3)
Re: Fix some inconsistencies with open-coded visibilitymap_set() callers

On Mon, Jun 30, 2025 at 3:01 PM Robert Haas <robertmhaas@gmail.com> wrote:

I'm pretty concerned about this change:

/*
 * If the page is all visible, need to clear that, unless we're only
 * going to add further frozen rows to it.
 *
 * If we're only adding already frozen rows to a previously empty
 * page, mark it as all-visible.
 */
if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
    all_visible_cleared = true;
    PageClearAllVisible(page);
    visibilitymap_clear(relation,
                        BufferGetBlockNumber(buffer),
                        vmbuffer, VISIBILITYMAP_VALID_BITS);
}
- else if (all_frozen_set)
-     PageSetAllVisible(page);

One problem is that this falsifies the comment -- the second sentence
in the comment refers to code that the patch deletes.

Yep, I should have updated that comment.

But also, the
all_frozen_set flag is used a bit further down when setting
xlrec->flags, and now we'll set XLH_INSERT_ALL_FROZEN_SET when in fact
we didn't set the all-frozen bit. It seems like the intention here was
that the existing XLOG_HEAP2_MULTI_INSERT record should cover setting
the all-frozen bit, if we do that, and it seems like you're changing
that somehow, but it's not clear to me why we'd want to change that,
and even if we do, and it doesn't appear to me that you've chased down
all the loose ends i.e. if that's the plan, then I'd expect the redo
routine to be modified and XLH_INSERT_ALL_FROZEN_SET to be deleted.

Right, yes, I accidentally forgot to remove setting PD_ALL_VISIBLE
from heap_xlog_multi_insert(). That was not great.

As for why I changed the responsibility, I was trying to make COPY
FREEZE more consistent with all of the other places we set the VM and
mark PD_ALL_VISIBLE. In other places in master, we make the changes to
the heap page that render the page all-visible and then separately set
PD_ALL_VISIBLE and emit an xl_heap_visible record.

Ironically, my larger patch set [1] seeks to eliminate xl_heap_visible
and add the changes to the VM to the WAL records that make the changes
rendering the page all-visible.

I think you are right that the change I made in the patch in this
thread is not an improvement to COPY FREEZE.

I think this might actually be an underlying design problem with the
patch -- heap_page_set_vm_and_log() seems to want to be in charge of
both what we do with the heap page and what we do with the
corresponding VM bit, but that runs contrary to the caller's need to
bundle the VM bit changes with some WAL record that is doing other
things to the heap page.

The way it is currently, visibilitymap_set() sets the heap page LSN
but it doesn't actually set PD_ALL_VISIBLE or mark the heap buffer
dirty. There is no way to tell visibilitymap_set() whether or not you
modified the heap page -- it just assumes you did. That separation of
duties didn't make sense to me.

In fact, even if you make visibilitymap_set() aware that you haven't
modified the heap page and then don't set the heap page LSN, we'll
still register the heap buffer when emitting the xl_heap_visible
record and then end up modifying it when replaying that record.

heap_page_set_vm_and_log() says that it shouldn't be called during
recovery but doesn't assert that it isn't.

Ah, good point.

So, I read and started addressing your other inline code review, but
after thinking more about your comment about
heap_page_set_vm_and_log() having a design problem, I went back to the
drawing board and tried to figure out 1) which of the issues I point
out are actually bugs and 2) how I can do my larger patch set without
these wrappers. I solved 2, but that's a story for another thread.

As for 1, so, I went through all of these cases and tried to figure
out what I thought was actually a bug and should be fixed. The only
one I think is a bug is the case in lazy_scan_prune() where the page
was already set all-visible but not already set all-frozen. The code
is as follows:

else if (all_visible_according_to_vm && presult.all_visible &&
         presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno,
                                              &vmbuffer))
{
    ...
    if (!PageIsAllVisible(page))
    {
        PageSetAllVisible(page);
        MarkBufferDirty(buf);
    }

    ...
    old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
                                   InvalidXLogRecPtr,
                                   vmbuffer, InvalidTransactionId,
                                   VISIBILITYMAP_ALL_VISIBLE |
                                   VISIBILITYMAP_ALL_FROZEN);

In that case, when checksums or wal_log_hints are on, if
PD_ALL_VISIBLE is already set and we didn't otherwise dirty the page
during heap_page_prune_and_freeze(), visibilitymap_set() will stamp
the clean heap page with the LSN of the xl_heap_visible record.

I spent quite a bit of time trying to think of what could happen if we
advance the LSN of an otherwise clean page and then don't mark it
dirty to reflect having made that change. And, honestly, I couldn't
come up with anything.

If someone evicts the page or we crash, we'll lose the LSN, but that
seems like it can't hurt anything if nothing else on the page has
changed.

But, I may be insufficiently creative. I know we are definitely not
supposed to advance the LSN of clean pages at all, let alone without
marking them dirty.

One thing we could do is check if the heap buffer is dirty before
setting the LSN in visibilitymap_set():

diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 745a04ef26e..a4e2e454e1f 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -301,7 +301,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
                 * WAL record inserted above, so it would be incorrect to
                 * update the heap page's LSN.
                 */
-               if (XLogHintBitIsNeeded())
+               if (XLogHintBitIsNeeded() && BufferIsDirty(heapBuf))
                {
                    Page        heapPage = BufferGetPage(heapBuf);

Something about this doesn't feel altogether right, though I'm not
sure why.

As for the not bug cases:

For all the cases where we modify the heap and VM page not in the same
critical section, we could error out after making the heap page
changes before setting the VM. But this would just lead to
PD_ALL_VISIBLE being set and the VM bit not being set, which is
technically fine.

It seems like it would be better to make all of the changes in the
same critical section, and, in some of the cases, it wouldn't be hard
to do so. But, since I cannot claim a correctness issue, we can leave
it the way it is.

In lazy_scan_new_or_empty(), it would be better to set PD_ALL_VISIBLE
before marking the buffer dirty, but it doesn't really matter because
no one else could write the buffer out since we have an exclusive lock.

And in heap_multi_insert() for the COPY FREEZE case, in recovery, we
set PD_ALL_VISIBLE and mark the buffer dirty when replaying both the
xl_heap_multi_insert record and when replaying the xl_heap_visible
record. This involves taking and releasing the content lock on the
heap buffer page twice to make redundant changes. Again, not a
correctness issue, but I think it makes much more sense to include the
VM changes in the xl_heap_multi_insert record.

So, in summary, do you think we should do something about the
lazy_scan_prune() -> visibilitymap_set() case where we advance the LSN
of a clean buffer and don't mark it dirty?

- Melanie

[1]: /messages/by-id/CAAKRu_ZMw6Npd_qm2KM+FwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g@mail.gmail.com

#5 Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#4)
Re: Fix some inconsistencies with open-coded visibilitymap_set() callers

On Wed, Jul 2, 2025 at 7:33 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I think this might actually be an underlying design problem with the
patch -- heap_page_set_vm_and_log() seems to want to be in charge of
both what we do with the heap page and what we do with the
corresponding VM bit, but that runs contrary to the caller's need to
bundle the VM bit changes with some WAL record that is doing other
things to the heap page.

The way it is currently, visibilitymap_set() sets the heap page LSN
but it doesn't actually set PD_ALL_VISIBLE or mark the heap buffer
dirty. There is no way to tell visibilitymap_set() whether or not you
modified the heap page -- it just assumes you did. That separation of
duties didn't make sense to me.

It doesn't make sense to me, either. I have a strong feeling that
visibilitymap_set() isn't well-scoped, but I don't know whether it
needs to do less or more or just different stuff. I don't think it can
be right for code that doesn't modify the page to set the page LSN;
and I suspect it would be best to try to get modifying the page,
marking the page dirty, and emitting WAL for the modification if
required into a tight segment of code, at least in the ideal world
where somebody else does the work and I just get to sit back and nod
wisely.

In that case, when checksums or wal_log_hints are on, if
PD_ALL_VISIBLE is already set and we didn't otherwise dirty the page
during heap_page_prune_and_freeze(), visibilitymap_set() will stamp
the clean heap page with the LSN of the xl_heap_visible record.

I spent quite a bit of time trying to think of what could happen if we
advance the LSN of an otherwise clean page and then don't mark it
dirty to reflect having made that change. And, honestly, I couldn't
come up with something.

Either advancing the LSN of the heap page is important for some
purpose -- like making crash recovery work properly -- and therefore
the buffer needs to be marked dirty -- or it is not, and the LSN
shouldn't be set in the first place when the page is otherwise
unchanged. I agree with you that it's hard to see how anything goes
wrong as a result of bumping the LSN on a clean page without dirtying
it, but I think it's playing with fire. At the very least, making the
code consistent and intelligible could help future readers of the code
to avoid introducing bugs of their own.

One thing we could do is check if the heap buffer is dirty before
setting the LSN in visibilitymap_set():

I don't think this is the way. visibilitymap_set() shouldn't guess
what the caller has done or will do; the caller should make it clear
explicitly.

As for the not bug cases:

For all the cases where we modify the heap and VM page not in the same
critical section, we could error out after making the heap page
changes before setting the VM. But this would just lead to
PD_ALL_VISIBLE being set and the VM bit not being set, which is
technically fine.

It seems like it would be better to make all of the changes in the
same critical section, and, in some of the cases, it wouldn't be hard
to do so. But, since I cannot claim a correctness issue, we can leave
it the way it is.

If it's useful to combine things as a precursor to future work, that
may be fair enough, but otherwise I don't see the point. It's not
"technically fine;" it's just straight-up fine. It's not desirable, in
the sense that we might end up having to do extra page reads later to
correct the situation, but crashes won't intervene between those two
critical sections often enough to matter. Unifying those critical
sections won't change anything about what states code elsewhere in the
tree must be prepared to see on disk.

In lazy_scan_new_or_empty(), it would be better to set PD_ALL_VISIBLE
before marking the buffer dirty, but it doesn't really matter because
no one else could write the buffer out since we have an exclusive lock.

Again, I disagree that this "doesn't really matter"; I think it
straight-up does not matter.

And in heap_multi_insert() for the COPY FREEZE case, in recovery, we
set PD_ALL_VISIBLE and mark the buffer dirty when replaying both the
xl_heap_multi_insert record and when replaying the xl_heap_visible
record. This involves taking and releasing the content lock on the
heap buffer page twice to make redundant changes. Again, not a
correctness issue, but I think it makes much more sense to include the
VM changes in the xl_heap_multi_insert record.

Without looking at the code, if you're saying we could go from 2 WAL
records down to 1, that would improve performance, which would be
worthwhile, but not a bug fix.

So, in summary, do you think we should do something about the
lazy_scan_prune() -> visibilitymap_set() case where we advance the LSN
of a clean buffer and don't mark it dirty?

Yes, I think that would be worth trying to fix.

--
Robert Haas
EDB: http://www.enterprisedb.com

#6 Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#5)
Re: Fix some inconsistencies with open-coded visibilitymap_set() callers

On Mon, Jul 7, 2025 at 11:38 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Jul 2, 2025 at 7:33 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

One thing we could do is check if the heap buffer is dirty before
setting the LSN in visibilitymap_set():

I don't think this is the way. visibilitymap_set() shouldn't guess
what the caller has done or will do; the caller should make it clear
explicitly.

A direct translation of this would be to add a boolean parameter to
visibilitymap_set() like "heap_page_modified" or "set_heap_lsn". Is
that along the lines you were thinking?
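
Something like this, say -- heap_page_modified being the hypothetical new
parameter, and heap_page_dirtied a local the caller would track:

    extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk,
                                   Buffer heapBuf, XLogRecPtr recptr,
                                   Buffer vmBuf, TransactionId cutoff_xid,
                                   uint8 flags, bool heap_page_modified);

    /* e.g. in lazy_scan_prune() */
    old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
                                   InvalidXLogRecPtr,
                                   vmbuffer, presult.vm_conflict_horizon,
                                   flags, heap_page_dirtied);

visibilitymap_set() would then only do PageSetLSN() on the heap page when
heap_page_modified && XLogHintBitIsNeeded().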

For all the cases where we modify the heap and VM page not in the same
critical section, we could error out after making the heap page
changes before setting the VM. But this would just lead to
PD_ALL_VISIBLE being set and the VM bit not being set, which is
technically fine.

It seems like it would be better to make all of the changes in the
same critical section, and, in some of the cases, it wouldn't be hard
to do so. But, since I cannot claim a correctness issue, we can leave
it the way it is.

If it's useful to combine things as a precursor to future work, that
may be fair enough, but otherwise I don't see the point. It's not
"technically fine;" it's just straight-up fine. It's not desirable, in
the sense that we might end up having to do extra page reads later to
correct the situation, but crashes won't intervene between those two
critical sections often enough to matter. Unifying those critical
sections won't change anything about what states code elsewhere in the
tree must be prepared to see on disk.

I think one argument for having it in the same critical section is the
defensive coding perspective. Our rule is that you make changes to the
heap page, mark it dirty, emit WAL, and then stamp it with that LSN
all in the same critical section.
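
Schematically -- with XLogBeginInsert() and buffer registration elided,
and rmid/info standing in for whatever record type applies:

    START_CRIT_SECTION();

    PageSetAllVisible(page);          /* 1. change the page */
    MarkBufferDirty(buf);             /* 2. mark it dirty */
    recptr = XLogInsert(rmid, info);  /* 3. emit WAL for the change */
    PageSetLSN(page, recptr);         /* 4. stamp the page with that LSN */

    END_CRIT_SECTION();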

In the case of PD_ALL_VISIBLE, it is okay to break this rule because
it is okay to lose the PD_ALL_VISIBLE hint as long as the VM bit
doesn't get set. But, future programmers could see this and make other
modifications to the heap page in the same area we set PD_ALL_VISIBLE
(the one outside of a critical section).

Anyway, I'm now convinced that the way forward for this particular
issue is to represent the VM changes in the same WAL record that has
the heap page changes, so I won't further belabor the issue.

- Melanie

#7 Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#6)
Re: Fix some inconsistencies with open-coded visibilitymap_set() callers

On Thu, Jul 10, 2025 at 9:57 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

A direct translation of this would be to add a boolean parameter to
visibilitymap_set() like "heap_page_modified" or "set_heap_lsn". Is
that along the lines you were thinking?

Yeah, something like that. I haven't thought through the details.

If it's useful to combine things as a precursor to future work, that
may be fair enough, but otherwise I don't see the point. It's not
"technically fine;" it's just straight-up fine. It's not desirable, in
the sense that we might end up having to do extra page reads later to
correct the situation, but crashes won't intervene between those two
critical sections often enough to matter. Unifying those critical
sections won't change anything about what states code elsewhere in the
tree must be prepared to see on disk.

I think one argument for having it in the same critical section is the
defensive coding perspective. Our rule is that you make changes to the
heap page, mark it dirty, emit WAL, and then stamp it with that LSN
all in the same critical section.

I agree that we need to follow this rule.

In the case of PD_ALL_VISIBLE, it is okay to break this rule because
it is okay to lose the PD_ALL_VISIBLE hint as long as the VM bit
doesn't get set. But, future programmers could see this and make other
modifications to the heap page in the same area we set PD_ALL_VISIBLE
(the one outside of a critical section).

But this is saying that PD_ALL_VISIBLE isn't really covered by the WAL
record in question -- instead, it's a related hint. Or really
semi-hint, since it's only conditionally OK to lose it. So the rule
above doesn't necessarily fully apply to this situation.

Anyway, I'm now convinced that the way forward for this particular
issue is to represent the VM changes in the same WAL record that has
the heap page changes, so I won't further belabor the issue.

Makes sense.

--
Robert Haas
EDB: http://www.enterprisedb.com