hio.c does visibilitymap_pin()/IO while holding buffer lock

Started by Andres Freund, almost 3 years ago · 12 messages
#1 Andres Freund
andres@anarazel.de

Hi,

Starting with

commit 7db0cd2145f2bce84cac92402e205e4d2b045bf2
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Date: 2021-01-17 22:11:39 +0100

Set PD_ALL_VISIBLE and visibility map bits in COPY FREEZE

RelationGetBufferForTuple does

/*
* The page is empty, pin vmbuffer to set all_frozen bit.
*/
if (options & HEAP_INSERT_FROZEN)
{
Assert(PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0);
visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
}

while holding a buffer lock. visibilitymap_pin() reads pages if vmbuffer
doesn't already point to the right block.

The lock ordering rules are to lock VM pages *before* locking heap pages.
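
For illustration, the ordering that rule calls for looks roughly like this (a
minimal sketch against the real visibilitymap_pin()/LockBuffer() APIs; the
variable names are placeholders, not code from hio.c):

	/* pin the VM page first - may do IO, but no buffer lock is held yet */
	visibilitymap_pin(relation, targetBlock, &vmbuffer);

	/* only then lock the heap page and update it / the VM bits */
	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
	/* ... modify the page, set VM bits ... */
	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);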

I think the reason this hasn't yet bitten us badly is that INSERT_FROZEN
effectively requires that the relation is access exclusive locked. There
shouldn't be other backends locking multiple buffers in the relation (bgwriter
/ checkpointer can lock a single buffer at a time, but that's it).

I see roughly two ways forward:

1) We add a comment explaining why it's safe to violate lock ordering rules in
this one situation

2) Change relevant code so that we only return a valid vmbuffer if we could do
so without blocking / IO and, obviously, skip updating the VM if we
couldn't get the buffer.
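
For (2), the no-IO test already exists: visibilitymap_pin_ok() just checks
whether the currently pinned VM buffer covers the given heap block. A minimal
sketch of the idea (illustrative only, not a patch):

	if (visibilitymap_pin_ok(targetBlock, *vmbuffer))
	{
		/* right VM page already pinned - VM update can proceed */
	}
	else
	{
		/*
		 * Pinning would mean IO while holding the buffer lock - skip
		 * the VM update in this (rare) case.
		 */
	}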

Greetings,

Andres Freund

#2 Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Andres Freund (#1)
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

On 3/25/23 03:57, Andres Freund wrote:

> Hi,
>
> Starting with
>
> commit 7db0cd2145f2bce84cac92402e205e4d2b045bf2
> Author: Tomas Vondra <tomas.vondra@postgresql.org>
> Date: 2021-01-17 22:11:39 +0100
>
> Set PD_ALL_VISIBLE and visibility map bits in COPY FREEZE

That's a bummer :-(

> RelationGetBufferForTuple does
>
> /*
> * The page is empty, pin vmbuffer to set all_frozen bit.
> */
> if (options & HEAP_INSERT_FROZEN)
> {
> Assert(PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0);
> visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
> }
>
> while holding a buffer lock. visibilitymap_pin() reads pages if vmbuffer
> doesn't already point to the right block.
>
> The lock ordering rules are to lock VM pages *before* locking heap pages.
>
> I think the reason this hasn't yet bitten us badly is that INSERT_FROZEN
> effectively requires that the relation is access exclusive locked. There
> shouldn't be other backends locking multiple buffers in the relation (bgwriter
> / checkpointer can lock a single buffer at a time, but that's it).

Right. Still, it seems a bit fragile ...

> I see roughly two ways forward:
>
> 1) We add a comment explaining why it's safe to violate lock ordering rules in
> this one situation

Possible, although I feel uneasy about just documenting a broken rule.
Would be better to maintain the locking order.

> 2) Change relevant code so that we only return a valid vmbuffer if we could do
> so without blocking / IO and, obviously, skip updating the VM if we
> couldn't get the buffer.

I don't recall the exact details about the vm locking/pinning, but can't
we just ensure we actually follow the proper locking order? I mean, this
only deals with new pages, requested at line ~624:

buffer = ReadBufferBI(relation, P_NEW, RBM_ZERO_AND_LOCK, bistate);

Can't we ensure we actually lock the vm buffer too in ReadBufferBI,
before calling ReadBufferExtended? Or am I confused and that's not
possible for some reason?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3 Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#2)
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

Hi,

On 2023-03-25 14:34:25 +0100, Tomas Vondra wrote:

> On 3/25/23 03:57, Andres Freund wrote:
>
>> 2) Change relevant code so that we only return a valid vmbuffer if we could do
>> so without blocking / IO and, obviously, skip updating the VM if we
>> couldn't get the buffer.
>
> I don't recall the exact details about the vm locking/pinning, but can't
> we just ensure we actually follow the proper locking order? I mean, this
> only deals with new pages, requested at line ~624:
>
> buffer = ReadBufferBI(relation, P_NEW, RBM_ZERO_AND_LOCK, bistate);
>
> Can't we ensure we actually lock the vm buffer too in ReadBufferBI,
> before calling ReadBufferExtended? Or am I confused and that's not
> possible for some reason?

Note that this is using P_NEW. I.e. we don't know the buffer location yet.

Greetings,

Andres Freund

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#3)
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

Andres Freund <andres@anarazel.de> writes:

> On 2023-03-25 14:34:25 +0100, Tomas Vondra wrote:
>> Can't we ensure we actually lock the vm buffer too in ReadBufferBI,
>> before calling ReadBufferExtended? Or am I confused and that's not
>> possible for some reason?
>
> Note that this is using P_NEW. I.e. we don't know the buffer location yet.

Maybe the relation-extension logic needs to include the ability to get
the relevant vm page?

regards, tom lane

#5 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#4)
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

Hi,

On 2023-03-25 12:57:17 -0400, Tom Lane wrote:

> Andres Freund <andres@anarazel.de> writes:
>> On 2023-03-25 14:34:25 +0100, Tomas Vondra wrote:
>>> Can't we ensure we actually lock the vm buffer too in ReadBufferBI,
>>> before calling ReadBufferExtended? Or am I confused and that's not
>>> possible for some reason?
>
>> Note that this is using P_NEW. I.e. we don't know the buffer location yet.
>
> Maybe the relation-extension logic needs to include the ability to get
> the relevant vm page?

I don't see how that's easily possible with the current lock ordering
rules. At least not without giving up using RBM_ZERO_AND_LOCK for extending, or
stuffing even more things to happen with the extension lock held, which I
don't think we want to. I don't think INSERT_FROZEN is worth that price.

Perhaps we should just try to heuristically pin the right VM buffer before
trying to extend?

Thinking more about this, I think there's no inherent deadlock danger with
reading the VM while holding a buffer lock, "just" an efficiency issue. If we
avoid needing to do IO nearly all the time, by trying to pin the right page
before extending, it's probably good enough.
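
A rough sketch of that heuristic (illustrative only - it guesses the block the
extension will produce from the current relation size, before any lock is
taken):

	/*
	 * Guess the to-be-extended block and pin the matching VM page in
	 * advance. If the guess turns out wrong we may still do IO under the
	 * buffer lock later, but that should be rare.
	 */
	BlockNumber guessBlock = RelationGetNumberOfBlocks(relation);

	if (!visibilitymap_pin_ok(guessBlock, vmbuffer))
		visibilitymap_pin(relation, guessBlock, &vmbuffer);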

Greetings,

Andres Freund

#6 Peter Geoghegan

In reply to: Andres Freund (#5)
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

On Sat, Mar 25, 2023 at 11:17 AM Andres Freund <andres@anarazel.de> wrote:

> Thinking more about this, I think there's no inherent deadlock danger with
> reading the VM while holding a buffer lock, "just" an efficiency issue. If we
> avoid needing to do IO nearly all the time, by trying to pin the right page
> before extending, it's probably good enough.

Uh, it was quite possible for lazy_vacuum_heap_page() to do that up
until very recently (it was fixed by my commit 980ae17310), since it
would call visibilitymap_get_status() with an exclusive buffer lock on
the heap page, which sometimes had to change the VM page. It
potentially did an IO at that point, to read in a later VM page in
place of the caller's initially-pinned one.

In other words, up until recently there was a strange idiom used by
lazy_vacuum_heap_page/lazy_vacuum_heap_rel, where we'd abuse
visibilitymap_get_status() as a replacement for calling
visibilitymap_pin() right before acquiring a heap page buffer lock. But
now the second heap pass does it the same way as the first heap pass.
(Even still, I have no reason to believe that the previous approach
was all that bad; it was just a bit ugly.)
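
For concreteness, the two idioms side by side (a hedged sketch - it relies on
visibilitymap_get_status() pinning the correct VM page as a side effect when
handed a vmbuffer pointer):

	/* old second-pass idiom: piggy-back the VM pin on a status check */
	(void) visibilitymap_get_status(relation, blkno, &vmbuffer);
	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);

	/* explicit form, as both heap passes do now */
	visibilitymap_pin(relation, blkno, &vmbuffer);
	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);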

There are still a few visibilitymap_get_status()-with-buffer-lock
calls in vacuumlazy.c, FWIW, but they don't run the risk of needing to
change the vmbuffer we have pinned with the heap page buffer lock
held.

--
Peter Geoghegan

#7 Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#5)
2 attachment(s)
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

Hi,

On 2023-03-25 11:17:07 -0700, Andres Freund wrote:

> I don't see how that's easily possible with the current lock ordering
> rules. At least not without giving up using RBM_ZERO_AND_LOCK for extending, or
> stuffing even more things to happen with the extension lock held, which I
> don't think we want to. I don't think INSERT_FROZEN is worth that price.

I think I might have been thinking of this too narrowly. It's extremely
unlikely that another backend would discover the page. And we can use
visibilitymap_pin_ok() to amortize the cost to almost nothing - there's a lot
of bits in an 8k block...
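
(Concretely - assuming the visibility map's two bits per heap page and
ignoring the VM page header - one 8kB VM page covers 8192 * 8 / 2 = 32768 heap
pages, i.e. roughly 256MB of heap, so extension only needs to swap the pinned
VM page about once every 256MB.)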

Here's a draft patch.

The bulk relation patch I am polishing has a similar issue, except that there
the problem is inserting into the FSM, instead of pinning a VM page. Hence the
patch above makes the infrastructure a bit more general than required for the
HEAP_INSERT_FROZEN case alone (where we currently shouldn't ever have a valid
otherBuffer).

The way the parameter ordering for GetVisibilityMapPins() works makes it
somewhat unwieldy - see e.g. the existing
if (otherBuffer == InvalidBuffer || targetBlock <= otherBlock)
GetVisibilityMapPins(relation, buffer, otherBuffer,
targetBlock, otherBlock, vmbuffer,
vmbuffer_other);
else
GetVisibilityMapPins(relation, otherBuffer, buffer,
otherBlock, targetBlock, vmbuffer_other,
vmbuffer);

Which I now duplicated in yet another place.

Perhaps we just ought to switch buffer1/block1 with buffer2/block2 inside
GetVisibilityMapPins(), to avoid duplicating that code elsewhere?

Because we now track whether the *targetBuffer* was ever unlocked, we can be a
bit more narrow about the possibility of there not being sufficient space.

The patch could be narrowed for backpatching. But as there's likely no
practical problem at this point, I wouldn't want to backpatch anyway?

Greetings,

Andres Freund

Attachments:

v1-0001-hio-Release-extension-lock-before-initializing-pa.patch (text/x-diff; charset=us-ascii)
From 6f27018765f167a8d2a52bc7c9068cb09ca8d82c Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 24 Oct 2022 12:28:06 -0700
Subject: [PATCH v1 1/2] hio: Release extension lock before initializing page /
 pinning VM

PageInit() while holding the extension lock is unnecessary after 0d1fe9f74e3
started to use RBM_ZERO_AND_LOCK - nobody can look at the new page before we
release the page lock. PageInit() zeroes the page, which isn't that cheap, so
deferring it until after the extension lock is released seems like a good idea.

Doing visibilitymap_pin() while holding the extension lock, introduced in
7db0cd2145f2, looks like an accident. Due to the restrictions on
HEAP_INSERT_FROZEN it's unlikely to be a performance issue, but it still seems
better to move it out.
---
 src/backend/access/heap/hio.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index e152807d2dc..7479212d4e0 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -623,6 +623,13 @@ loop:
 	 */
 	buffer = ReadBufferBI(relation, P_NEW, RBM_ZERO_AND_LOCK, bistate);
 
+	/*
+	 * Release the file-extension lock; it's now OK for someone else to extend
+	 * the relation some more.
+	 */
+	if (needLock)
+		UnlockRelationForExtension(relation, ExclusiveLock);
+
 	/*
 	 * We need to initialize the empty new page.  Double-check that it really
 	 * is empty (this should never happen, but if it does we don't want to
@@ -647,13 +654,6 @@ loop:
 		visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
 	}
 
-	/*
-	 * Release the file-extension lock; it's now OK for someone else to extend
-	 * the relation some more.
-	 */
-	if (needLock)
-		UnlockRelationForExtension(relation, ExclusiveLock);
-
 	/*
 	 * Lock the other buffer. It's guaranteed to be of a lower page number
 	 * than the new page. To conform with the deadlock prevent rules, we ought
-- 
2.38.0

v1-0002-WIP-relation-extension-Don-t-pin-the-VM-while-hol.patch (text/x-diff; charset=us-ascii)
From 5aeb7f5d042eda6ba4c70d8f475400c51607fac5 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 28 Mar 2023 18:17:11 -0700
Subject: [PATCH v1 2/2] WIP: relation extension: Don't pin the VM while
 holding buffer lock

---
 src/backend/access/heap/hio.c | 115 +++++++++++++++++++++++-----------
 1 file changed, 78 insertions(+), 37 deletions(-)

diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 7479212d4e0..11ef053705e 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -134,14 +134,17 @@ ReadBufferBI(Relation relation, BlockNumber targetBlock,
  * buffer2 may be InvalidBuffer, if only one buffer is involved.  buffer1
  * must not be InvalidBuffer.  If both buffers are specified, block1 must
  * be less than block2.
+ *
+ * Returns whether buffer locks were released.
  */
-static void
+static bool
 GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
 					 BlockNumber block1, BlockNumber block2,
 					 Buffer *vmbuffer1, Buffer *vmbuffer2)
 {
 	bool		need_to_pin_buffer1;
 	bool		need_to_pin_buffer2;
+	bool		released_locks = false;
 
 	Assert(BufferIsValid(buffer1));
 	Assert(buffer2 == InvalidBuffer || block1 <= block2);
@@ -155,9 +158,10 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
 			&& PageIsAllVisible(BufferGetPage(buffer2))
 			&& !visibilitymap_pin_ok(block2, *vmbuffer2);
 		if (!need_to_pin_buffer1 && !need_to_pin_buffer2)
-			return;
+			break;
 
 		/* We must unlock both buffers before doing any I/O. */
+		released_locks = true;
 		LockBuffer(buffer1, BUFFER_LOCK_UNLOCK);
 		if (buffer2 != InvalidBuffer && buffer2 != buffer1)
 			LockBuffer(buffer2, BUFFER_LOCK_UNLOCK);
@@ -183,6 +187,8 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
 			|| (need_to_pin_buffer1 && need_to_pin_buffer2))
 			break;
 	}
+
+	return released_locks;
 }
 
 /*
@@ -345,6 +351,7 @@ RelationGetBufferForTuple(Relation relation, Size len,
 	BlockNumber targetBlock,
 				otherBlock;
 	bool		needLock;
+	bool		unlockedTargetBuffer;
 
 	len = MAXALIGN(len);		/* be conservative */
 
@@ -630,6 +637,9 @@ loop:
 	if (needLock)
 		UnlockRelationForExtension(relation, ExclusiveLock);
 
+	unlockedTargetBuffer = false;
+	targetBlock = BufferGetBlockNumber(buffer);
+
 	/*
 	 * We need to initialize the empty new page.  Double-check that it really
 	 * is empty (this should never happen, but if it does we don't want to
@@ -639,75 +649,106 @@ loop:
 
 	if (!PageIsNew(page))
 		elog(ERROR, "page %u of relation \"%s\" should be empty but is not",
-			 BufferGetBlockNumber(buffer),
+			 targetBlock,
 			 RelationGetRelationName(relation));
 
 	PageInit(page, BufferGetPageSize(buffer), 0);
 	MarkBufferDirty(buffer);
 
 	/*
-	 * The page is empty, pin vmbuffer to set all_frozen bit.
+	 * The page is empty, pin vmbuffer to set all_frozen bit. We don't want to
+	 * do IO while the buffer is locked, so we unlock the page first if IO is
+	 * needed (necessitating checks below).
 	 */
 	if (options & HEAP_INSERT_FROZEN)
 	{
-		Assert(PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0);
-		visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+		Assert(PageGetMaxOffsetNumber(page) == 0);
+
+		if (!visibilitymap_pin_ok(targetBlock, *vmbuffer))
+		{
+			unlockedTargetBuffer = true;
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+			visibilitymap_pin(relation, targetBlock, vmbuffer);
+		}
 	}
 
 	/*
-	 * Lock the other buffer. It's guaranteed to be of a lower page number
-	 * than the new page. To conform with the deadlock prevent rules, we ought
-	 * to lock otherBuffer first, but that would give other backends a chance
-	 * to put tuples on our page. To reduce the likelihood of that, attempt to
-	 * lock the other buffer conditionally, that's very likely to work.
-	 * Otherwise we need to lock buffers in the correct order, and retry if
-	 * the space has been used in the mean time.
+	 * If we unlocked the target buffer above, it's unlikely, but possible,
+	 * that another backend used space on this page.
+	 *
+	 * If we didn't, and otherBuffer is valid, we need to lock the other
+	 * buffer. It's guaranteed to be of a lower page number than the new
+	 * page. To conform with the deadlock prevent rules, we ought to lock
+	 * otherBuffer first, but that would give other backends a chance to put
+	 * tuples on our page. To reduce the likelihood of that, attempt to lock
+	 * the other buffer conditionally, that's very likely to work.  Otherwise
+	 * we need to lock buffers in the correct order, and retry if the space
+	 * has been used in the mean time.
 	 *
 	 * Alternatively, we could acquire the lock on otherBuffer before
 	 * extending the relation, but that'd require holding the lock while
 	 * performing IO, which seems worse than an unlikely retry.
 	 */
-	if (otherBuffer != InvalidBuffer)
+	if (unlockedTargetBuffer)
+	{
+		if (otherBuffer != InvalidBuffer)
+			LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
+		LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+	}
+	else if (otherBuffer != InvalidBuffer)
 	{
 		Assert(otherBuffer != buffer);
-		targetBlock = BufferGetBlockNumber(buffer);
 		Assert(targetBlock > otherBlock);
 
 		if (unlikely(!ConditionalLockBuffer(otherBuffer)))
 		{
+			unlockedTargetBuffer = true;
 			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 			LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
 			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 		}
+	}
 
-		/*
-		 * Because the buffers were unlocked for a while, it's possible,
-		 * although unlikely, that an all-visible flag became set or that
-		 * somebody used up the available space in the new page.  We can use
-		 * GetVisibilityMapPins to deal with the first case.  In the second
-		 * case, just retry from start.
-		 */
-		GetVisibilityMapPins(relation, otherBuffer, buffer,
-							 otherBlock, targetBlock, vmbuffer_other,
-							 vmbuffer);
+	/*
+	 * If one of the buffers was unlocked, it's possible, although unlikely,
+	 * that an all-visible flag became set.  We can use GetVisibilityMapPins
+	 * to deal with that.
+	 */
+	if (unlockedTargetBuffer || otherBuffer != InvalidBuffer)
+	{
+		if (otherBuffer != InvalidBuffer)
+		{
+			if (GetVisibilityMapPins(relation, otherBuffer, buffer,
+									 otherBlock, targetBlock, vmbuffer_other,
+									 vmbuffer))
+				unlockedTargetBuffer = true;
+		}
+		else
+		{
+			if (GetVisibilityMapPins(relation, buffer, InvalidBuffer,
+									 targetBlock, InvalidBlockNumber,
+									 vmbuffer, InvalidBuffer))
+				unlockedTargetBuffer = true;
+		}
+	}
 
-		/*
-		 * Note that we have to check the available space even if our
-		 * conditional lock succeeded, because GetVisibilityMapPins might've
-		 * transiently released lock on the target buffer to acquire a VM pin
-		 * for the otherBuffer.
-		 */
-		if (len > PageGetHeapFreeSpace(page))
+	/*
+	 * If the target buffer was temporarily unlocked since the relation
+	 * extension, it's possible, although unlikely, that all the space on the
+	 * page was already used. If so, we just retry from the start.  If we
+	 * didn't unlock, something has gone wrong if there's not enough space -
+	 * the test at the top should have prevented reaching this case.
+	 */
+	pageFreeSpace = PageGetHeapFreeSpace(page);
+	if (len > pageFreeSpace)
+	{
+		if (unlockedTargetBuffer)
 		{
 			LockBuffer(otherBuffer, BUFFER_LOCK_UNLOCK);
 			UnlockReleaseBuffer(buffer);
 
 			goto loop;
 		}
-	}
-	else if (len > PageGetHeapFreeSpace(page))
-	{
-		/* We should not get here given the test at the top */
 		elog(PANIC, "tuple is too big: size %zu", len);
 	}
 
@@ -720,7 +761,7 @@ loop:
 	 * current backend to make more insertions or not, which is probably a
 	 * good bet most of the time.  So for now, don't add it to FSM yet.
 	 */
-	RelationSetTargetBlock(relation, BufferGetBlockNumber(buffer));
+	RelationSetTargetBlock(relation, targetBlock);
 
 	return buffer;
 }
-- 
2.38.0

#8 Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#7)
2 attachment(s)
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

Hi,

On 2023-03-28 18:21:02 -0700, Andres Freund wrote:

> Here's a draft patch.

Attached is v2, with a stupid bug fixed and a bit of comment / pgindent
polish.

Greetings,

Andres Freund

Attachments:

v2-0002-WIP-relation-extension-Don-t-pin-the-VM-while-hol.patch (text/x-diff; charset=us-ascii)
From 5a33df025ce466343d88bbc57ac1bc80d883a230 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 28 Mar 2023 18:17:11 -0700
Subject: [PATCH v2 2/2] WIP: relation extension: Don't pin the VM while
 holding buffer lock

---
 src/backend/access/heap/hio.c | 120 +++++++++++++++++++++++-----------
 1 file changed, 81 insertions(+), 39 deletions(-)

diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 7479212d4e0..8b3dfa0ccae 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -134,14 +134,17 @@ ReadBufferBI(Relation relation, BlockNumber targetBlock,
  * buffer2 may be InvalidBuffer, if only one buffer is involved.  buffer1
  * must not be InvalidBuffer.  If both buffers are specified, block1 must
  * be less than block2.
+ *
+ * Returns whether buffer locks were temporarily released.
  */
-static void
+static bool
 GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
 					 BlockNumber block1, BlockNumber block2,
 					 Buffer *vmbuffer1, Buffer *vmbuffer2)
 {
 	bool		need_to_pin_buffer1;
 	bool		need_to_pin_buffer2;
+	bool		released_locks = false;
 
 	Assert(BufferIsValid(buffer1));
 	Assert(buffer2 == InvalidBuffer || block1 <= block2);
@@ -155,9 +158,10 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
 			&& PageIsAllVisible(BufferGetPage(buffer2))
 			&& !visibilitymap_pin_ok(block2, *vmbuffer2);
 		if (!need_to_pin_buffer1 && !need_to_pin_buffer2)
-			return;
+			break;
 
 		/* We must unlock both buffers before doing any I/O. */
+		released_locks = true;
 		LockBuffer(buffer1, BUFFER_LOCK_UNLOCK);
 		if (buffer2 != InvalidBuffer && buffer2 != buffer1)
 			LockBuffer(buffer2, BUFFER_LOCK_UNLOCK);
@@ -183,6 +187,8 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
 			|| (need_to_pin_buffer1 && need_to_pin_buffer2))
 			break;
 	}
+
+	return released_locks;
 }
 
 /*
@@ -345,6 +351,7 @@ RelationGetBufferForTuple(Relation relation, Size len,
 	BlockNumber targetBlock,
 				otherBlock;
 	bool		needLock;
+	bool		unlockedTargetBuffer;
 
 	len = MAXALIGN(len);		/* be conservative */
 
@@ -630,6 +637,9 @@ loop:
 	if (needLock)
 		UnlockRelationForExtension(relation, ExclusiveLock);
 
+	unlockedTargetBuffer = false;
+	targetBlock = BufferGetBlockNumber(buffer);
+
 	/*
 	 * We need to initialize the empty new page.  Double-check that it really
 	 * is empty (this should never happen, but if it does we don't want to
@@ -639,75 +649,107 @@ loop:
 
 	if (!PageIsNew(page))
 		elog(ERROR, "page %u of relation \"%s\" should be empty but is not",
-			 BufferGetBlockNumber(buffer),
+			 targetBlock,
 			 RelationGetRelationName(relation));
 
 	PageInit(page, BufferGetPageSize(buffer), 0);
 	MarkBufferDirty(buffer);
 
 	/*
-	 * The page is empty, pin vmbuffer to set all_frozen bit.
+	 * The page is empty, pin vmbuffer to set all_frozen bit. We don't want to
+	 * do IO while the buffer is locked, so we unlock the page first if IO is
+	 * needed (necessitating checks below).
 	 */
 	if (options & HEAP_INSERT_FROZEN)
 	{
-		Assert(PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0);
-		visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+		Assert(PageGetMaxOffsetNumber(page) == 0);
+
+		if (!visibilitymap_pin_ok(targetBlock, *vmbuffer))
+		{
+			unlockedTargetBuffer = true;
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+			visibilitymap_pin(relation, targetBlock, vmbuffer);
+		}
 	}
 
 	/*
-	 * Lock the other buffer. It's guaranteed to be of a lower page number
-	 * than the new page. To conform with the deadlock prevent rules, we ought
-	 * to lock otherBuffer first, but that would give other backends a chance
-	 * to put tuples on our page. To reduce the likelihood of that, attempt to
-	 * lock the other buffer conditionally, that's very likely to work.
-	 * Otherwise we need to lock buffers in the correct order, and retry if
-	 * the space has been used in the mean time.
+	 * If we unlocked the target buffer above, it's unlikely, but possible,
+	 * that another backend used space on this page.
+	 *
+	 * If we didn't, and otherBuffer is valid, we need to lock the other
+	 * buffer. It's guaranteed to be of a lower page number than the new page.
+	 * To conform with the deadlock prevent rules, we ought to lock
+	 * otherBuffer first, but that would give other backends a chance to put
+	 * tuples on our page. To reduce the likelihood of that, attempt to lock
+	 * the other buffer conditionally, that's very likely to work.  Otherwise
+	 * we need to lock buffers in the correct order, and retry if the space
+	 * has been used in the mean time.
 	 *
 	 * Alternatively, we could acquire the lock on otherBuffer before
 	 * extending the relation, but that'd require holding the lock while
 	 * performing IO, which seems worse than an unlikely retry.
 	 */
-	if (otherBuffer != InvalidBuffer)
+	if (unlockedTargetBuffer)
+	{
+		if (otherBuffer != InvalidBuffer)
+			LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
+		LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+	}
+	else if (otherBuffer != InvalidBuffer)
 	{
 		Assert(otherBuffer != buffer);
-		targetBlock = BufferGetBlockNumber(buffer);
 		Assert(targetBlock > otherBlock);
 
 		if (unlikely(!ConditionalLockBuffer(otherBuffer)))
 		{
+			unlockedTargetBuffer = true;
 			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 			LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
 			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 		}
+	}
 
-		/*
-		 * Because the buffers were unlocked for a while, it's possible,
-		 * although unlikely, that an all-visible flag became set or that
-		 * somebody used up the available space in the new page.  We can use
-		 * GetVisibilityMapPins to deal with the first case.  In the second
-		 * case, just retry from start.
-		 */
-		GetVisibilityMapPins(relation, otherBuffer, buffer,
-							 otherBlock, targetBlock, vmbuffer_other,
-							 vmbuffer);
-
-		/*
-		 * Note that we have to check the available space even if our
-		 * conditional lock succeeded, because GetVisibilityMapPins might've
-		 * transiently released lock on the target buffer to acquire a VM pin
-		 * for the otherBuffer.
-		 */
-		if (len > PageGetHeapFreeSpace(page))
+	/*
+	 * If one of the buffers was unlocked, it's possible, although unlikely,
+	 * that an all-visible flag became set.  We can use GetVisibilityMapPins
+	 * to deal with that.
+	 */
+	if (unlockedTargetBuffer || otherBuffer != InvalidBuffer)
+	{
+		if (otherBuffer != InvalidBuffer)
 		{
-			LockBuffer(otherBuffer, BUFFER_LOCK_UNLOCK);
+			if (GetVisibilityMapPins(relation, otherBuffer, buffer,
+									 otherBlock, targetBlock, vmbuffer_other,
+									 vmbuffer))
+				unlockedTargetBuffer = true;
+		}
+		else
+		{
+			if (GetVisibilityMapPins(relation, buffer, InvalidBuffer,
+									 targetBlock, InvalidBlockNumber,
+									 vmbuffer, InvalidBuffer))
+				unlockedTargetBuffer = true;
+		}
+	}
+
+	/*
+	 * If the target buffer was temporarily unlocked since the relation
+	 * extension, it's possible, although unlikely, that all the space on the
+	 * page was already used. If so, we just retry from the start.  If we
+	 * didn't unlock, something has gone wrong if there's not enough space -
+	 * the test at the top should have prevented reaching this case.
+	 */
+	pageFreeSpace = PageGetHeapFreeSpace(page);
+	if (len > pageFreeSpace)
+	{
+		if (unlockedTargetBuffer)
+		{
+			if (otherBuffer != InvalidBuffer)
+				LockBuffer(otherBuffer, BUFFER_LOCK_UNLOCK);
 			UnlockReleaseBuffer(buffer);
 
 			goto loop;
 		}
-	}
-	else if (len > PageGetHeapFreeSpace(page))
-	{
-		/* We should not get here given the test at the top */
 		elog(PANIC, "tuple is too big: size %zu", len);
 	}
 
@@ -720,7 +762,7 @@ loop:
 	 * current backend to make more insertions or not, which is probably a
 	 * good bet most of the time.  So for now, don't add it to FSM yet.
 	 */
-	RelationSetTargetBlock(relation, BufferGetBlockNumber(buffer));
+	RelationSetTargetBlock(relation, targetBlock);
 
 	return buffer;
 }
-- 
2.38.0

v2-0001-hio-Release-extension-lock-before-initializing-pa.patch (text/x-diff; charset=us-ascii)
From 6f27018765f167a8d2a52bc7c9068cb09ca8d82c Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 24 Oct 2022 12:28:06 -0700
Subject: [PATCH v2 1/2] hio: Release extension lock before initializing page /
 pinning VM

PageInit() while holding the extension lock is unnecessary after 0d1fe9f74e3
started to use RBM_ZERO_AND_LOCK - nobody can look at the new page before we
release the page lock. PageInit() zeroes the page, which isn't that cheap, so
deferring it until after the extension lock is released seems like a good idea.

Doing visibilitymap_pin() while holding the extension lock, introduced in
7db0cd2145f2, looks like an accident. Due to the restrictions on
HEAP_INSERT_FROZEN it's unlikely to be a performance issue, but it still seems
better to move it out.
---
 src/backend/access/heap/hio.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index e152807d2dc..7479212d4e0 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -623,6 +623,13 @@ loop:
 	 */
 	buffer = ReadBufferBI(relation, P_NEW, RBM_ZERO_AND_LOCK, bistate);
 
+	/*
+	 * Release the file-extension lock; it's now OK for someone else to extend
+	 * the relation some more.
+	 */
+	if (needLock)
+		UnlockRelationForExtension(relation, ExclusiveLock);
+
 	/*
 	 * We need to initialize the empty new page.  Double-check that it really
 	 * is empty (this should never happen, but if it does we don't want to
@@ -647,13 +654,6 @@ loop:
 		visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
 	}
 
-	/*
-	 * Release the file-extension lock; it's now OK for someone else to extend
-	 * the relation some more.
-	 */
-	if (needLock)
-		UnlockRelationForExtension(relation, ExclusiveLock);
-
 	/*
 	 * Lock the other buffer. It's guaranteed to be of a lower page number
 	 * than the new page. To conform with the deadlock prevent rules, we ought
-- 
2.38.0

#9 Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#8)
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

Hi,

On 2023-03-28 19:17:21 -0700, Andres Freund wrote:

> On 2023-03-28 18:21:02 -0700, Andres Freund wrote:
>> Here's a draft patch.
>
> Attached is v2, with a stupid bug fixed and a bit of comment / pgindent
> polish.

I'd welcome some review (Tomas?), but otherwise I'm planning to push ahead
with this.

I'm still debating with myself whether this commit (or a prerequisite commit)
should move logic dealing with the buffer ordering into
GetVisibilityMapPins(), so we don't need two blocks like this:

if (otherBuffer == InvalidBuffer || targetBlock <= otherBlock)
GetVisibilityMapPins(relation, buffer, otherBuffer,
targetBlock, otherBlock, vmbuffer,
vmbuffer_other);
else
GetVisibilityMapPins(relation, otherBuffer, buffer,
otherBlock, targetBlock, vmbuffer_other,
vmbuffer);
...

if (otherBuffer != InvalidBuffer)
{
if (GetVisibilityMapPins(relation, otherBuffer, buffer,
otherBlock, targetBlock, vmbuffer_other,
vmbuffer))
unlockedTargetBuffer = true;
}
else
{
if (GetVisibilityMapPins(relation, buffer, InvalidBuffer,
targetBlock, InvalidBlockNumber,
vmbuffer, InvalidBuffer))
unlockedTargetBuffer = true;
}
}

Greetings,

Andres Freund

#10 Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Andres Freund (#9)
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

On 4/3/23 00:40, Andres Freund wrote:

> Hi,
>
> On 2023-03-28 19:17:21 -0700, Andres Freund wrote:
>> On 2023-03-28 18:21:02 -0700, Andres Freund wrote:
>>> Here's a draft patch.
>>
>> Attached is v2, with a stupid bug fixed and a bit of comment / pgindent
>> polish.
>
> I'd welcome some review (Tomas?), but otherwise I'm planning to push ahead
> with this.

I guess the 0001 part was already pushed, so I should be looking only at
0002, correct?

I think 0002 makes RelationGetBufferForTuple() harder to understand. I'm
not saying it's incorrect, but I find it hard to reason about the new
combinations of conditions :-(

I mean, it only had this condition:

if (otherBuffer != InvalidBuffer)
{
...
}

but now it has

if (unlockedTargetBuffer)
{
...
}
else if (otherBuffer != InvalidBuffer)
{
...
}

if (unlockedTargetBuffer || otherBuffer != InvalidBuffer)
{
...
}

Not sure how to improve that :-/ but not exactly trivial to figure out
what's going to happen.

Maybe this

* If we unlocked the target buffer above, it's unlikely, but possible,
* that another backend used space on this page.

might say what we're going to do in this case. I mean - I understand
some backend may use space in the unlocked page, but what does that mean for
this code? What is it going to do? (The same comment talks about the
next condition in much more detail, for example.)

Also, isn't it a bit strange the free space check now happens outside
any if condition? It used to happen in the

if (otherBuffer != InvalidBuffer)
{
...
}

block, but now it happens outside.

> I'm still debating with myself whether this commit (or a prerequisite commit)
> should move logic dealing with the buffer ordering into
> GetVisibilityMapPins(), so we don't need two blocks like this:
>
> if (otherBuffer == InvalidBuffer || targetBlock <= otherBlock)
> GetVisibilityMapPins(relation, buffer, otherBuffer,
> targetBlock, otherBlock, vmbuffer,
> vmbuffer_other);
> else
> GetVisibilityMapPins(relation, otherBuffer, buffer,
> otherBlock, targetBlock, vmbuffer_other,
> vmbuffer);
> ...
>
> if (otherBuffer != InvalidBuffer)
> {
> if (GetVisibilityMapPins(relation, otherBuffer, buffer,
> otherBlock, targetBlock, vmbuffer_other,
> vmbuffer))
> unlockedTargetBuffer = true;
> }
> else
> {
> if (GetVisibilityMapPins(relation, buffer, InvalidBuffer,
> targetBlock, InvalidBlockNumber,
> vmbuffer, InvalidBuffer))
> unlockedTargetBuffer = true;
> }
> }

Yeah. I haven't tried, but I imagine it'd make RelationGetBufferForTuple
a little bit simpler.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#11 Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#10)
2 attachment(s)
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

Hi,

On 2023-04-03 14:25:59 +0200, Tomas Vondra wrote:

> On 4/3/23 00:40, Andres Freund wrote:
>> Hi,
>>
>> On 2023-03-28 19:17:21 -0700, Andres Freund wrote:
>>> On 2023-03-28 18:21:02 -0700, Andres Freund wrote:
>>>> Here's a draft patch.
>>>
>>> Attached is v2, with a stupid bug fixed and a bit of comment / pgindent
>>> polish.
>>
>> I'd welcome some review (Tomas?), but otherwise I'm planning to push ahead
>> with this.
>
> I guess the 0001 part was already pushed, so I should be looking only at
> 0002, correct?

Yes.

> I think 0002 makes RelationGetBufferForTuple() harder to understand. I'm
> not saying it's incorrect, but I find it hard to reason about the new
> combinations of conditions :-(
>
> I mean, it only had this condition:
>
> if (otherBuffer != InvalidBuffer)
> {
> ...
> }
>
> but now it has
>
> if (unlockedTargetBuffer)
> {
> ...
> }
> else if (otherBuffer != InvalidBuffer)
> {
> ...
> }
>
> if (unlockedTargetBuffer || otherBuffer != InvalidBuffer)
> {
> ...
> }
>
> Not sure how to improve that :-/ but not exactly trivial to figure out
> what's going to happen.

It's not great, I agree. I tried to make it easier to read in this version by:
a) changing GetVisibilityMapPins() as I proposed
b) adding a new variable "recheckVmPins", that gets set in both the
   "if (unlockedTargetBuffer)" and "if (otherBuffer != InvalidBuffer)" branches
c) reformulating comments

> Maybe this
>
> * If we unlocked the target buffer above, it's unlikely, but possible,
> * that another backend used space on this page.
>
> might say what we're going to do in this case. I mean - I understand
> some backend may use space in the unlocked page, but what does that mean
> for this code? What is it going to do? (The same comment talks about the
> next condition in much more detail, for example.)

There's a comment about that detail further below. But you're right, it wasn't
clear as-is. How about now?

> Also, isn't it a bit strange the free space check now happens outside
> any if condition? It used to happen in the
>
> if (otherBuffer != InvalidBuffer)
> {
> ...
> }
>
> block, but now it happens outside.

Well, the alternative is to repeat it in the two branches, which doesn't seem
great either. Particularly because there'll be a third branch after the bulk
extension patch.

Greetings,

Andres Freund

Attachments:

v3-0001-hio-Relax-rules-for-calling-GetVisibilityMapPins.patch (text/x-diff; charset=us-ascii)
From 49183283d6701d22cf12f6a6280ed1aba02fc063 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 3 Apr 2023 11:18:10 -0700
Subject: [PATCH v3 1/2] hio: Relax rules for calling GetVisibilityMapPins()

Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/backend/access/heap/hio.c | 37 ++++++++++++++++++++++++-----------
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 7479212d4e0..e8aea42f8a9 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -131,9 +131,9 @@ ReadBufferBI(Relation relation, BlockNumber targetBlock,
  * For each heap page which is all-visible, acquire a pin on the appropriate
  * visibility map page, if we haven't already got one.
  *
- * buffer2 may be InvalidBuffer, if only one buffer is involved.  buffer1
- * must not be InvalidBuffer.  If both buffers are specified, block1 must
- * be less than block2.
+ * To avoid complexity in the callers, either buffer1 or buffer2 may be
+ * InvalidBuffer if only one buffer is involved. For the same reason, block2
+ * may be smaller than block1.
  */
 static void
 GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
@@ -143,6 +143,26 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
 	bool		need_to_pin_buffer1;
 	bool		need_to_pin_buffer2;
 
+	/*
+	 * Swap buffers around to handle case of a single block/buffer, and to
+	 * handle if lock ordering rules require to lock block2 first.
+	 */
+	if (!BufferIsValid(buffer1) ||
+		(BufferIsValid(buffer2) && block1 > block2))
+	{
+		Buffer tmpbuf = buffer1;
+		Buffer *tmpvmbuf = vmbuffer1;
+		BlockNumber tmpblock = block1;
+
+		buffer1 = buffer2;
+		vmbuffer1 = vmbuffer2;
+		block1 = block2;
+
+		buffer2 = tmpbuf;
+		vmbuffer2 = tmpvmbuf;
+		block2 = tmpblock;
+	}
+
 	Assert(BufferIsValid(buffer1));
 	Assert(buffer2 == InvalidBuffer || block1 <= block2);
 
@@ -502,14 +522,9 @@ loop:
 		 * done a bit of extra work for no gain, but there's no real harm
 		 * done.
 		 */
-		if (otherBuffer == InvalidBuffer || targetBlock <= otherBlock)
-			GetVisibilityMapPins(relation, buffer, otherBuffer,
-								 targetBlock, otherBlock, vmbuffer,
-								 vmbuffer_other);
-		else
-			GetVisibilityMapPins(relation, otherBuffer, buffer,
-								 otherBlock, targetBlock, vmbuffer_other,
-								 vmbuffer);
+		GetVisibilityMapPins(relation, buffer, otherBuffer,
+							 targetBlock, otherBlock, vmbuffer,
+							 vmbuffer_other);
 
 		/*
 		 * Now we can check to see if there's enough free space here. If so,
-- 
2.38.0

v3-0002-hio-Don-t-pin-the-VM-while-holding-buffer-lock-wh.patch (text/x-diff; charset=us-ascii)
From 27f9ace30d9862f05b73676526a1b0233ef1faea Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 3 Apr 2023 11:26:16 -0700
Subject: [PATCH v3 2/2] hio: Don't pin the VM while holding buffer lock while
 extending

Discussion: http://postgr.es/m/20230325025740.wzvchp2kromw4zqz@awork3.anarazel.de
---
 src/backend/access/heap/hio.c | 126 +++++++++++++++++++++++-----------
 1 file changed, 85 insertions(+), 41 deletions(-)

diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index e8aea42f8a9..15912f6415f 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -134,14 +134,17 @@ ReadBufferBI(Relation relation, BlockNumber targetBlock,
  * To avoid complexity in the callers, either buffer1 or buffer2 may be
  * InvalidBuffer if only one buffer is involved. For the same reason, block2
  * may be smaller than block1.
+ *
+ * Returns whether buffer locks were temporarily released.
  */
-static void
+static bool
 GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
 					 BlockNumber block1, BlockNumber block2,
 					 Buffer *vmbuffer1, Buffer *vmbuffer2)
 {
 	bool		need_to_pin_buffer1;
 	bool		need_to_pin_buffer2;
+	bool		released_locks = false;
 
 	/*
 	 * Swap buffers around to handle case of a single block/buffer, and to
@@ -175,9 +178,10 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
 			&& PageIsAllVisible(BufferGetPage(buffer2))
 			&& !visibilitymap_pin_ok(block2, *vmbuffer2);
 		if (!need_to_pin_buffer1 && !need_to_pin_buffer2)
-			return;
+			break;
 
 		/* We must unlock both buffers before doing any I/O. */
+		released_locks = true;
 		LockBuffer(buffer1, BUFFER_LOCK_UNLOCK);
 		if (buffer2 != InvalidBuffer && buffer2 != buffer1)
 			LockBuffer(buffer2, BUFFER_LOCK_UNLOCK);
@@ -203,6 +207,8 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
 			|| (need_to_pin_buffer1 && need_to_pin_buffer2))
 			break;
 	}
+
+	return released_locks;
 }
 
 /*
@@ -365,6 +371,8 @@ RelationGetBufferForTuple(Relation relation, Size len,
 	BlockNumber targetBlock,
 				otherBlock;
 	bool		needLock;
+	bool		unlockedTargetBuffer;
+	bool		recheckVmPins;
 
 	len = MAXALIGN(len);		/* be conservative */
 
@@ -645,6 +653,9 @@ loop:
 	if (needLock)
 		UnlockRelationForExtension(relation, ExclusiveLock);
 
+	unlockedTargetBuffer = false;
+	targetBlock = BufferGetBlockNumber(buffer);
+
 	/*
 	 * We need to initialize the empty new page.  Double-check that it really
 	 * is empty (this should never happen, but if it does we don't want to
@@ -654,75 +665,108 @@ loop:
 
 	if (!PageIsNew(page))
 		elog(ERROR, "page %u of relation \"%s\" should be empty but is not",
-			 BufferGetBlockNumber(buffer),
+			 targetBlock,
 			 RelationGetRelationName(relation));
 
 	PageInit(page, BufferGetPageSize(buffer), 0);
 	MarkBufferDirty(buffer);
 
 	/*
-	 * The page is empty, pin vmbuffer to set all_frozen bit.
+	 * The page is empty, pin vmbuffer to set all_frozen bit. We don't want to
+	 * do IO while the buffer is locked, so we unlock the page first if IO is
+	 * needed (necessitating checks below).
 	 */
 	if (options & HEAP_INSERT_FROZEN)
 	{
-		Assert(PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0);
-		visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+		Assert(PageGetMaxOffsetNumber(page) == 0);
+
+		if (!visibilitymap_pin_ok(targetBlock, *vmbuffer))
+		{
+			unlockedTargetBuffer = true;
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+			visibilitymap_pin(relation, targetBlock, vmbuffer);
+		}
 	}
 
 	/*
-	 * Lock the other buffer. It's guaranteed to be of a lower page number
-	 * than the new page. To conform with the deadlock prevent rules, we ought
-	 * to lock otherBuffer first, but that would give other backends a chance
-	 * to put tuples on our page. To reduce the likelihood of that, attempt to
-	 * lock the other buffer conditionally, that's very likely to work.
-	 * Otherwise we need to lock buffers in the correct order, and retry if
-	 * the space has been used in the mean time.
+	 * Reacquire locks if necessary.
 	 *
-	 * Alternatively, we could acquire the lock on otherBuffer before
-	 * extending the relation, but that'd require holding the lock while
-	 * performing IO, which seems worse than an unlikely retry.
+	 * If the target buffer was unlocked above, or is unlocked while
+	 * reacquiring the lock on otherBuffer below, it's unlikely, but possible,
+	 * that another backend used space on this page. We check for that below,
+	 * and retry if necessary.
 	 */
-	if (otherBuffer != InvalidBuffer)
+	recheckVmPins = false;
+	if (unlockedTargetBuffer)
 	{
+		/* released lock on target buffer above */
+		if (otherBuffer != InvalidBuffer)
+			LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
+		LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+		recheckVmPins = true;
+	}
+	else if (otherBuffer != InvalidBuffer)
+	{
+		/*
+		 * We did not release the target buffer, and otherBuffer is valid,
+		 * need to lock the other buffer. It's guaranteed to be of a lower
+		 * page number than the new page.  To conform with the deadlock
+		 * prevent rules, we ought to lock otherBuffer first, but that would
+		 * give other backends a chance to put tuples on our page. To reduce
+		 * the likelihood of that, attempt to lock the other buffer
+		 * conditionally, that's very likely to work.
+		 *
+		 * Alternatively, we could acquire the lock on otherBuffer before
+		 * extending the relation, but that'd require holding the lock while
+		 * performing IO, which seems worse than an unlikely retry.
+		 */
 		Assert(otherBuffer != buffer);
-		targetBlock = BufferGetBlockNumber(buffer);
 		Assert(targetBlock > otherBlock);
 
 		if (unlikely(!ConditionalLockBuffer(otherBuffer)))
 		{
+			unlockedTargetBuffer = true;
 			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 			LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
 			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 		}
+		recheckVmPins = true;
+	}
 
-		/*
-		 * Because the buffers were unlocked for a while, it's possible,
-		 * although unlikely, that an all-visible flag became set or that
-		 * somebody used up the available space in the new page.  We can use
-		 * GetVisibilityMapPins to deal with the first case.  In the second
-		 * case, just retry from start.
-		 */
-		GetVisibilityMapPins(relation, otherBuffer, buffer,
-							 otherBlock, targetBlock, vmbuffer_other,
-							 vmbuffer);
+	/*
+	 * If one of the buffers was unlocked (always the case if otherBuffer is
+	 * valid), it's possible, although unlikely, that an all-visible flag
+	 * became set.  We can use GetVisibilityMapPins to deal with that. It's
+	 * possible that GetVisibilityMapPins() might need to temporarily release
+	 * buffer locks, in which case we'll need to check if there's still enough
+	 * space on the page below.
+	 */
+	if (recheckVmPins)
+	{
+		if (GetVisibilityMapPins(relation, otherBuffer, buffer,
+								 otherBlock, targetBlock, vmbuffer_other,
+								 vmbuffer))
+			unlockedTargetBuffer = true;
+	}
 
-		/*
-		 * Note that we have to check the available space even if our
-		 * conditional lock succeeded, because GetVisibilityMapPins might've
-		 * transiently released lock on the target buffer to acquire a VM pin
-		 * for the otherBuffer.
-		 */
-		if (len > PageGetHeapFreeSpace(page))
+	/*
+	 * If the target buffer was temporarily unlocked since the relation
+	 * extension, it's possible, although unlikely, that all the space on the
+	 * page was already used. If so, we just retry from the start.  If we
+	 * didn't unlock, something has gone wrong if there's not enough space -
+	 * the test at the top should have prevented reaching this case.
+	 */
+	pageFreeSpace = PageGetHeapFreeSpace(page);
+	if (len > pageFreeSpace)
+	{
+		if (unlockedTargetBuffer)
 		{
-			LockBuffer(otherBuffer, BUFFER_LOCK_UNLOCK);
+			if (otherBuffer != InvalidBuffer)
+				LockBuffer(otherBuffer, BUFFER_LOCK_UNLOCK);
 			UnlockReleaseBuffer(buffer);
 
 			goto loop;
 		}
-	}
-	else if (len > PageGetHeapFreeSpace(page))
-	{
-		/* We should not get here given the test at the top */
 		elog(PANIC, "tuple is too big: size %zu", len);
 	}
 
@@ -735,7 +779,7 @@ loop:
 	 * current backend to make more insertions or not, which is probably a
 	 * good bet most of the time.  So for now, don't add it to FSM yet.
 	 */
-	RelationSetTargetBlock(relation, BufferGetBlockNumber(buffer));
+	RelationSetTargetBlock(relation, targetBlock);
 
 	return buffer;
 }
-- 
2.38.0

#12 Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#11)
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

Hi,

On 2023-04-03 12:00:30 -0700, Andres Freund wrote:

> It's not great, I agree. I tried to make it easier to read in this version by:
> a) changing GetVisibilityMapPins() as I proposed
> b) adding a new variable "recheckVmPins", that gets set in both the
>    "if (unlockedTargetBuffer)" and "if (otherBuffer != InvalidBuffer)" branches
> c) reformulating comments

I pushed this version a couple hours ago, after a bit more polishing.

Greetings,

Andres Freund