Using Valgrind to detect faulty buffer accesses (no pin or buffer content lock held)

Started by Peter Geoghegan over 5 years ago · 12 messages
4 attachment(s)

I recently expressed an interest in using Valgrind memcheck to detect
access to pages whose buffers do not have a pin held in the backend,
or do not have a buffer lock held (the latter check makes sense for
pages owned by index access methods). I came up with a quick and dirty
patch, which I confirmed finds a bug in nbtree VACUUM that I had
spotted by chance:

/messages/by-id/CAH2-Wz=WRu6NMWtit2weDnuGxdsWeNyFygeBP_zZ2Sso0YAGFg@mail.gmail.com

(This is a bug in commit 857f9c36cda.)
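The basic mechanism can be modeled outside PostgreSQL: mark a buffer's page accessible when the backend takes its first pin, and inaccessible again when the last local pin is dropped, so any stray access in between gets flagged. Below is a minimal self-checking sketch; in the real patch the bookkeeping is done by Valgrind itself via the memcheck client requests shown in the comments, and all names here are illustrative stand-ins, not PostgreSQL code:

```c
#include <assert.h>
#include <stdbool.h>

#define BLCKSZ 8192

static char page[BLCKSZ];              /* the shared buffer's block */
static bool page_accessible = false;   /* models DEFINED vs NOACCESS */
static int  local_pins = 0;            /* this backend's pin count */

static void
pin_buffer(void)
{
	/* First local pin makes the page contents accessible */
	if (local_pins++ == 0)
		page_accessible = true;    /* VALGRIND_MAKE_MEM_DEFINED(page, BLCKSZ) */
}

static void
unpin_buffer(void)
{
	assert(local_pins > 0);
	/* Dropping the last local pin forbids further access */
	if (--local_pins == 0)
		page_accessible = false;   /* VALGRIND_MAKE_MEM_NOACCESS(page, BLCKSZ) */
}

/* A page access without a pin is exactly what memcheck would report */
static char
read_page_byte(int off)
{
	assert(page_accessible);           /* Valgrind: "invalid read" otherwise */
	return page[off];
}
```

Note that only the first pin and the last unpin change the state, matching the "first pin within this backend" wording in the patch.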

Alvaro wrote a similar patch back in 2015, which I'd forgotten about
but was reminded of today:

/messages/by-id/20150723195349.GW5596@postgresql.org

I took his version (which was better than my rough original) and
rebased it -- that's attached as the first patch. The second patch
takes the general idea further by having nbtree mark pages as NOACCESS
in a similar way whenever their buffers lack a buffer lock (whether or
not a buffer pin is held).

This second patch detected two more bugs in nbtree page deletion by
running the regression tests with Valgrind memcheck. These additional
bugs are probably of lower severity than the first one, since we at
least have a buffer pin (we just don't have buffer locks). All three
bugs are very similar, though: they all involve dereferencing a
pointer to the special area of a page at a point where the underlying
buffer is no longer safe to access.
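All three bugs reduce to the same shape: a pointer obtained from the page's special area is dereferenced after the lock (or pin) protecting it is gone. Here's a hypothetical sketch of the safe ordering; the struct and helper names are stand-ins for the real BTPageOpaque/PageGetSpecialPointer() machinery, not actual nbtree code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for the B-Tree page special area */
typedef struct
{
	uint32_t	btpo_prev;
	uint32_t	btpo_next;
} BTPageOpaqueData;

static BTPageOpaqueData special = {.btpo_prev = 1, .btpo_next = 2};
static bool locked = false;

static void lock_buf(void)   { locked = true; }
static void unlock_buf(void) { locked = false; }

static BTPageOpaqueData *
page_get_special(void)
{
	assert(locked);                /* with the patches, Valgrind enforces this */
	return &special;
}

/*
 * Safe ordering: copy everything needed from the special area while the
 * lock is still held.  The buggy variant dereferences 'opaque' after
 * unlock_buf(), which the instrumentation reports as an invalid read.
 */
static uint32_t
read_rightsib(void)
{
	lock_buf();
	BTPageOpaqueData *opaque = page_get_special();
	uint32_t	next = opaque->btpo_next;	/* read under lock */

	unlock_buf();
	return next;
}
```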

The final two patches fix the two newly discovered bugs -- I don't
have a fix for the first bug yet, since that one is more complicated
(and probably more serious). The regression tests run with Valgrind
will complain about all three bugs if you just apply the first two
patches (though you only need the first patch to see a complaint about
the first, more serious bug when the tests are run).

--
Peter Geoghegan

Attachments:

v1-0001-Add-Valgrind-buffer-access-instrumentation.patch (application/octet-stream)
From a0074830b5a7372dc234dbc6c768a607faa85271 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Wed, 22 Apr 2020 19:51:43 -0700
Subject: [PATCH v1 1/4] Add Valgrind buffer access instrumentation.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Based on a patch from Álvaro Herrera posted to:
https://www.postgresql.org/message-id/20150723195349.GW5596@postgresql.org
---
 src/backend/storage/buffer/bufmgr.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index f9980cf80c..5d3211140c 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -49,6 +49,7 @@
 #include "storage/proc.h"
 #include "storage/smgr.h"
 #include "storage/standby.h"
+#include "utils/memdebug.h"
 #include "utils/ps_status.h"
 #include "utils/rel.h"
 #include "utils/resowner_private.h"
@@ -1633,6 +1634,13 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
 											   buf_state))
 			{
 				result = (buf_state & BM_VALID) != 0;
+
+				/*
+				 * If we successfully acquired our first pin on this buffer
+				 * within this backend, mark buffer contents defined
+				 */
+				if (result)
+					VALGRIND_MAKE_MEM_DEFINED(BufHdrGetBlock(buf), BLCKSZ);
 				break;
 			}
 		}
@@ -1683,6 +1691,13 @@ PinBuffer_Locked(BufferDesc *buf)
 	 */
 	Assert(GetPrivateRefCountEntry(BufferDescriptorGetBuffer(buf), false) == NULL);
 
+	/*
+	 * Buffer can't have a preexisting pin, so mark its page as defined to
+	 * Valgrind (this is similar to the PinBuffer() case where the backend
+	 * doesn't already have a buffer pin)
+	 */
+	VALGRIND_MAKE_MEM_DEFINED(BufHdrGetBlock(buf), BLCKSZ);
+
 	/*
 	 * Since we hold the buffer spinlock, we can update the buffer state and
 	 * release the lock in one operation.
@@ -1728,6 +1743,9 @@ UnpinBuffer(BufferDesc *buf, bool fixOwner)
 		uint32		buf_state;
 		uint32		old_buf_state;
 
+		/* Mark undefined, now that no pins remain in backend */
+		VALGRIND_MAKE_MEM_NOACCESS(BufHdrGetBlock(buf), BLCKSZ);
+
 		/* I'd better not still hold any locks on the buffer */
 		Assert(!LWLockHeldByMe(BufferDescriptorGetContentLock(buf)));
 		Assert(!LWLockHeldByMe(BufferDescriptorGetIOLock(buf)));
-- 
2.25.1

v1-0002-Add-nbtree-Valgrind-buffer-lock-checks.patch (application/octet-stream)
From 3f3a97174f0f8b97ffee5121607bd89c6583a2b7 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Thu, 23 Apr 2020 14:43:54 -0700
Subject: [PATCH v1 2/4] Add nbtree Valgrind buffer lock checks.

Teach nbtree to use Valgrind memcheck client requests to mark pages as
NOACCESS at the point the buffer content lock is released.  Callers
usually also drop their pin at the same time, though there are some
exceptions.  This instrumentation makes Valgrind detect any page access
that occurs without a buffer lock, which is fundamentally unsafe with
B-Tree index pages.  Buffer pins only prevent VACUUM from performing
concurrent recycling of heap TIDs from the pinned leaf page.  Our
assumption is that they don't make it safe to access the page in any way
whatsoever.

We provide nbtree specific wrappers for LockBuffer() and related
functions.  All nbtree code is required to use the wrappers.

These checks are a stricter version of the memcheck client requests in
bufmgr.c that detect buffer accesses that take place without a buffer
pin held.  The checks never make the buffer pin checking more lax.
---
 src/include/access/nbtree.h           |   4 +
 src/backend/access/nbtree/nbtinsert.c |   2 +-
 src/backend/access/nbtree/nbtpage.c   | 118 ++++++++++++++++++++++----
 src/backend/access/nbtree/nbtree.c    |   6 +-
 src/backend/access/nbtree/nbtsearch.c |  16 ++--
 src/backend/access/nbtree/nbtutils.c  |   4 +-
 6 files changed, 118 insertions(+), 32 deletions(-)

diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 9a3acd26b7..284524a8f3 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -1072,6 +1072,10 @@ extern Buffer _bt_getbuf(Relation rel, BlockNumber blkno, int access);
 extern Buffer _bt_relandgetbuf(Relation rel, Buffer obuf,
 							   BlockNumber blkno, int access);
 extern void _bt_relbuf(Relation rel, Buffer buf);
+extern void _bt_lockbuf(Relation rel, Buffer buf, int access);
+extern void _bt_unlockbuf(Relation rel, Buffer buf);
+extern bool _bt_conditionallockbuf(Relation rel, Buffer buf);
+extern void _bt_upgradelockbufcleanup(Relation rel, Buffer buf);
 extern void _bt_pageinit(Page page, Size size);
 extern bool _bt_page_recyclable(Page page);
 extern void _bt_delitems_vacuum(Relation rel, Buffer buf,
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index a70b64d964..80c3a42bca 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -303,7 +303,7 @@ _bt_search_insert(Relation rel, BTInsertState insertstate)
 	{
 		/* Simulate a _bt_getbuf() call with conditional locking */
 		insertstate->buf = ReadBuffer(rel, RelationGetTargetBlock(rel));
-		if (ConditionalLockBuffer(insertstate->buf))
+		if (_bt_conditionallockbuf(rel, insertstate->buf))
 		{
 			Page		page;
 			BTPageOpaque lpageop;
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 39b8f17f4b..aa163cb493 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -32,6 +32,7 @@
 #include "storage/indexfsm.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
+#include "utils/memdebug.h"
 #include "utils/snapmgr.h"
 
 static BTMetaPageData *_bt_getmeta(Relation rel, Buffer metabuf);
@@ -191,8 +192,8 @@ _bt_update_meta_cleanup_info(Relation rel, TransactionId oldestBtpoXact,
 	}
 
 	/* trade in our read lock for a write lock */
-	LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
-	LockBuffer(metabuf, BT_WRITE);
+	_bt_unlockbuf(rel, metabuf);
+	_bt_lockbuf(rel, metabuf, BT_WRITE);
 
 	START_CRIT_SECTION();
 
@@ -333,8 +334,8 @@ _bt_getroot(Relation rel, int access)
 		}
 
 		/* trade in our read lock for a write lock */
-		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
-		LockBuffer(metabuf, BT_WRITE);
+		_bt_unlockbuf(rel, metabuf);
+		_bt_lockbuf(rel, metabuf, BT_WRITE);
 
 		/*
 		 * Race condition:	if someone else initialized the metadata between
@@ -427,8 +428,8 @@ _bt_getroot(Relation rel, int access)
 		 * else accessing the new root page while it's unlocked, since no one
 		 * else knows where it is yet.
 		 */
-		LockBuffer(rootbuf, BUFFER_LOCK_UNLOCK);
-		LockBuffer(rootbuf, BT_READ);
+		_bt_unlockbuf(rel, rootbuf);
+		_bt_lockbuf(rel, rootbuf, BT_READ);
 
 		/* okay, metadata is correct, release lock on it without caching */
 		_bt_relbuf(rel, metabuf);
@@ -790,7 +791,7 @@ _bt_getbuf(Relation rel, BlockNumber blkno, int access)
 	{
 		/* Read an existing block of the relation */
 		buf = ReadBuffer(rel, blkno);
-		LockBuffer(buf, access);
+		_bt_lockbuf(rel, buf, access);
 		_bt_checkpage(rel, buf);
 	}
 	else
@@ -830,7 +831,7 @@ _bt_getbuf(Relation rel, BlockNumber blkno, int access)
 			if (blkno == InvalidBlockNumber)
 				break;
 			buf = ReadBuffer(rel, blkno);
-			if (ConditionalLockBuffer(buf))
+			if (_bt_conditionallockbuf(rel, buf))
 			{
 				page = BufferGetPage(buf);
 				if (_bt_page_recyclable(page))
@@ -883,7 +884,7 @@ _bt_getbuf(Relation rel, BlockNumber blkno, int access)
 		buf = ReadBuffer(rel, P_NEW);
 
 		/* Acquire buffer lock on new page */
-		LockBuffer(buf, BT_WRITE);
+		_bt_lockbuf(rel, buf, BT_WRITE);
 
 		/*
 		 * Release the file-extension lock; it's now OK for someone else to
@@ -924,9 +925,10 @@ _bt_relandgetbuf(Relation rel, Buffer obuf, BlockNumber blkno, int access)
 
 	Assert(blkno != P_NEW);
 	if (BufferIsValid(obuf))
-		LockBuffer(obuf, BUFFER_LOCK_UNLOCK);
+		_bt_unlockbuf(rel, obuf);
 	buf = ReleaseAndReadBuffer(obuf, rel, blkno);
-	LockBuffer(buf, access);
+	_bt_lockbuf(rel, buf, access);
+
 	_bt_checkpage(rel, buf);
 	return buf;
 }
@@ -939,7 +941,87 @@ _bt_relandgetbuf(Relation rel, Buffer obuf, BlockNumber blkno, int access)
 void
 _bt_relbuf(Relation rel, Buffer buf)
 {
-	UnlockReleaseBuffer(buf);
+	_bt_unlockbuf(rel, buf);
+	ReleaseBuffer(buf);
+}
+
+/*
+ *	_bt_lockbuf() -- lock a pinned buffer.
+ *
+ * Lock is acquired without acquiring another pin.  This is like a raw
+ * LockBuffer() call, but performs extra steps needed by Valgrind.
+ */
+void
+_bt_lockbuf(Relation rel, Buffer buf, int access)
+{
+	LockBuffer(buf, access);
+
+	/*
+	 * Must not have called _bt_killitems() from _bt_steppage() (our caller),
+	 * since we have a pin but not a lock.  Mark buffer's page defined here
+	 * (this usually happens in _bt_getbuf()).
+	 */
+	if (!RelationUsesLocalBuffers(rel))
+		VALGRIND_MAKE_MEM_DEFINED(BufferGetPage(buf), BLCKSZ);
+}
+
+/*
+ *	_bt_unlockbuf() -- unlock a pinned buffer.
+ *
+ * Only lock is dropped.  Performs extra steps needed by Valgrind.
+ */
+void
+_bt_unlockbuf(Relation rel, Buffer buf)
+{
+	/*
+	 * Buffer is pinned and locked, which means that it is expected to be
+	 * defined and addressable in general.  Make sure of it explicitly here.
+	 */
+	if (!RelationUsesLocalBuffers(rel))
+		VALGRIND_CHECK_MEM_IS_DEFINED(BufferGetPage(buf), BLCKSZ);
+
+	LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+
+	if (!RelationUsesLocalBuffers(rel))
+		VALGRIND_MAKE_MEM_NOACCESS(BufferGetPage(buf), BLCKSZ);
+}
+
+/*
+ *	_bt_conditionallockbuf() -- conditionally BT_WRITE lock pinned buffer.
+ *
+ * Performs extra steps needed by Valgrind.
+ */
+bool
+_bt_conditionallockbuf(Relation rel, Buffer buf)
+{
+	if (!ConditionalLockBuffer(buf))
+		return false;
+
+	if (!RelationUsesLocalBuffers(rel))
+		VALGRIND_MAKE_MEM_DEFINED(BufferGetPage(buf), BLCKSZ);
+
+	return true;
+}
+
+/*
+ *	_bt_upgradelockbufcleanup() -- upgrade lock to super-exclusive/cleanup
+ *	lock.
+ *
+ * Performs extra steps needed by Valgrind.
+ */
+void
+_bt_upgradelockbufcleanup(Relation rel, Buffer buf)
+{
+	/*
+	 * We don't need to mark the buffer NOACCESS, since we're just upgrading
+	 * caller's lock to a cleanup lock.  Make sure that buf's page is already
+	 * defined memory to Valgrind in passing, though.
+	 */
+	if (!RelationUsesLocalBuffers(rel))
+		VALGRIND_CHECK_MEM_IS_DEFINED(BufferGetPage(buf), BLCKSZ);
+
+	LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+	LockBufferForCleanup(buf);
 }
 
 /*
@@ -1603,7 +1685,7 @@ _bt_pagedel(Relation rel, Buffer buf)
 				 * To avoid deadlocks, we'd better drop the leaf page lock
 				 * before going further.
 				 */
-				LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+				_bt_unlockbuf(rel, buf);
 
 				/*
 				 * Fetch the left sibling, to check that it's not marked with
@@ -1649,7 +1731,7 @@ _bt_pagedel(Relation rel, Buffer buf)
 				 * Re-lock the leaf page, and start over, to re-check that the
 				 * page can still be deleted.
 				 */
-				LockBuffer(buf, BT_WRITE);
+				_bt_lockbuf(rel, buf, BT_WRITE);
 				continue;
 			}
 
@@ -1948,7 +2030,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
 	leafleftsib = opaque->btpo_prev;
 	leafrightsib = opaque->btpo_next;
 
-	LockBuffer(leafbuf, BUFFER_LOCK_UNLOCK);
+	_bt_unlockbuf(rel, leafbuf);
 
 	/*
 	 * Check here, as calling loops will have locks held, preventing
@@ -1979,7 +2061,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
 		 * To avoid deadlocks, we'd better drop the target page lock before
 		 * going further.
 		 */
-		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+		_bt_unlockbuf(rel, buf);
 	}
 	else
 	{
@@ -2004,7 +2086,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
 	 * table.)
 	 */
 	if (target != leafblkno)
-		LockBuffer(leafbuf, BT_WRITE);
+		_bt_lockbuf(rel, leafbuf, BT_WRITE);
 	if (leftsib != P_NONE)
 	{
 		lbuf = _bt_getbuf(rel, leftsib, BT_WRITE);
@@ -2054,7 +2136,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
 	 * a write lock not a superexclusive lock, since no scans would stop on an
 	 * empty page.
 	 */
-	LockBuffer(buf, BT_WRITE);
+	_bt_lockbuf(rel, buf, BT_WRITE);
 	page = BufferGetPage(buf);
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 36294789f3..5856a1cdc5 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -1109,7 +1109,8 @@ restart:
 	 */
 	buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
 							 info->strategy);
-	LockBuffer(buf, BT_READ);
+	_bt_lockbuf(rel, buf, BT_READ);
+
 	page = BufferGetPage(buf);
 	if (!PageIsNew(page))
 	{
@@ -1175,8 +1176,7 @@ restart:
 		 * course of the vacuum scan, whether or not it actually contains any
 		 * deletable tuples --- see nbtree/README.
 		 */
-		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
-		LockBufferForCleanup(buf);
+		_bt_upgradelockbufcleanup(rel, buf);
 
 		/*
 		 * Check whether we need to recurse back to earlier pages.  What we
diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index 8ff49ce6d6..80ce78b348 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -64,7 +64,7 @@ static inline void _bt_initialize_more_data(BTScanOpaque so, ScanDirection dir);
 static void
 _bt_drop_lock_and_maybe_pin(IndexScanDesc scan, BTScanPos sp)
 {
-	LockBuffer(sp->buf, BUFFER_LOCK_UNLOCK);
+	_bt_unlockbuf(scan->indexRelation, sp->buf);
 
 	if (IsMVCCSnapshot(scan->xs_snapshot) &&
 		RelationNeedsWAL(scan->indexRelation) &&
@@ -190,8 +190,8 @@ _bt_search(Relation rel, BTScanInsert key, Buffer *bufP, int access,
 	if (access == BT_WRITE && page_access == BT_READ)
 	{
 		/* trade in our read lock for a write lock */
-		LockBuffer(*bufP, BUFFER_LOCK_UNLOCK);
-		LockBuffer(*bufP, BT_WRITE);
+		_bt_unlockbuf(rel, *bufP);
+		_bt_lockbuf(rel, *bufP, BT_WRITE);
 
 		/*
 		 * If the page was split between the time that we surrendered our read
@@ -293,8 +293,8 @@ _bt_moveright(Relation rel,
 			/* upgrade our lock if necessary */
 			if (access == BT_READ)
 			{
-				LockBuffer(buf, BUFFER_LOCK_UNLOCK);
-				LockBuffer(buf, BT_WRITE);
+				_bt_unlockbuf(rel, buf);
+				_bt_lockbuf(rel, buf, BT_WRITE);
 			}
 
 			if (P_INCOMPLETE_SPLIT(opaque))
@@ -1417,7 +1417,7 @@ _bt_first(IndexScanDesc scan, ScanDirection dir)
 		 * There's no actually-matching data on this page.  Try to advance to
 		 * the next page.  Return false if there's no matching data at all.
 		 */
-		LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+		_bt_unlockbuf(scan->indexRelation, so->currPos.buf);
 		if (!_bt_steppage(scan, dir))
 			return false;
 	}
@@ -2065,7 +2065,7 @@ _bt_readnextpage(IndexScanDesc scan, BlockNumber blkno, ScanDirection dir)
 		 * deleted.
 		 */
 		if (BTScanPosIsPinned(so->currPos))
-			LockBuffer(so->currPos.buf, BT_READ);
+			_bt_lockbuf(rel, so->currPos.buf, BT_READ);
 		else
 			so->currPos.buf = _bt_getbuf(rel, so->currPos.currPage, BT_READ);
 
@@ -2443,7 +2443,7 @@ _bt_endpoint(IndexScanDesc scan, ScanDirection dir)
 		 * There's no actually-matching data on this page.  Try to advance to
 		 * the next page.  Return false if there's no matching data at all.
 		 */
-		LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+		_bt_unlockbuf(scan->indexRelation, so->currPos.buf);
 		if (!_bt_steppage(scan, dir))
 			return false;
 	}
diff --git a/src/backend/access/nbtree/nbtutils.c b/src/backend/access/nbtree/nbtutils.c
index ce48a51640..f30c0fbdf3 100644
--- a/src/backend/access/nbtree/nbtutils.c
+++ b/src/backend/access/nbtree/nbtutils.c
@@ -1744,7 +1744,7 @@ _bt_killitems(IndexScanDesc scan)
 		 * LSN.
 		 */
 		droppedpin = false;
-		LockBuffer(so->currPos.buf, BT_READ);
+		_bt_lockbuf(scan->indexRelation, so->currPos.buf, BT_READ);
 
 		page = BufferGetPage(so->currPos.buf);
 	}
@@ -1881,7 +1881,7 @@ _bt_killitems(IndexScanDesc scan)
 		MarkBufferDirtyHint(so->currPos.buf, true);
 	}
 
-	LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+	_bt_unlockbuf(scan->indexRelation, so->currPos.buf);
 }
 
 
-- 
2.25.1

v1-0003-Fix-second-page-deletion-bug.patch (application/octet-stream)
From 00e662f9ade2cb1bec60483427f46d07c1485e15 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Fri, 24 Apr 2020 16:29:58 -0700
Subject: [PATCH v1 3/4] Fix second page deletion bug.

---
 src/backend/access/nbtree/nbtpage.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index aa163cb493..a2e39a6712 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -2030,6 +2030,14 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
 	leafleftsib = opaque->btpo_prev;
 	leafrightsib = opaque->btpo_next;
 
+	/*
+	 * If the leaf page still has a parent pointing to it (or a chain of
+	 * parents), we don't unlink the leaf page yet, but the topmost remaining
+	 * parent in the branch.  Set 'target' to reference the page actually
+	 * being unlinked.
+	 */
+	target = BTreeTupleGetTopParent(leafhikey);
+
 	_bt_unlockbuf(rel, leafbuf);
 
 	/*
@@ -2038,16 +2046,9 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
 	 */
 	CHECK_FOR_INTERRUPTS();
 
-	/*
-	 * If the leaf page still has a parent pointing to it (or a chain of
-	 * parents), we don't unlink the leaf page yet, but the topmost remaining
-	 * parent in the branch.  Set 'target' and 'buf' to reference the page
-	 * actually being unlinked.
-	 */
-	target = BTreeTupleGetTopParent(leafhikey);
-
 	if (target != InvalidBlockNumber)
 	{
+		/* Set 'buf' to reference the page actually being unlinked */
 		Assert(target != leafblkno);
 
 		/* fetch the block number of the topmost parent's left sibling */
-- 
2.25.1

v1-0004-Fix-third-page-deletion-bug.patch (application/octet-stream)
From b161b7cd7903d331f498f19f3d6d23ea69294842 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Fri, 24 Apr 2020 16:53:11 -0700
Subject: [PATCH v1 4/4] Fix third page deletion bug.

---
 src/backend/access/nbtree/nbtpage.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index a2e39a6712..2aa88acdc6 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -1693,7 +1693,7 @@ _bt_pagedel(Relation rel, Buffer buf)
 				 * to-be-deleted doesn't have a downlink, and the page
 				 * deletion algorithm isn't prepared to handle that.
 				 */
-				if (!P_LEFTMOST(opaque))
+				if (leftsib != P_NONE)
 				{
 					BTPageOpaque lopaque;
 					Page		lpage;
-- 
2.25.1

Re: Using Valgrind to detect faulty buffer accesses (no pin or buffer content lock held)
In reply to: Peter Geoghegan (#1)
2 attachment(s)

On Fri, Apr 24, 2020 at 6:37 PM Peter Geoghegan <pg@bowt.ie> wrote:

> The final two patches fix the two newly discovered bugs -- I don't
> have a fix for the first bug yet, since that one is more complicated
> (and probably more serious).

I pushed both fixes that I posted yesterday -- fixes for
the benign "no buffer lock held" issues in nbtree page deletion. We
still need a fix for the more serious issue, though. And that will
need to be backpatched down to v11. I'll try to get to that next week.

Attached is v2 of the patch series. This version has a centralized
description of what we require from nbtree code above _bt_getbuf(),
alongside existing, similar commentary. Here's the general idea: If we
lock a buffer, that has to go through one of the wrapper functions
that knows about Valgrind in all cases. It's presented as a stricter
version of what happens in bufmgr.c for all buffers from all access
methods.

I also added something about how the nbtree Valgrind client requests
(those added by the second patch in the series) assume that the
bufmgr.c client requests (from the first patch) also take place. We
need to be able to rely on bufmgr.c to "clean up" in the event of an
error, so the second patch has a strict dependency on the first. If
the transaction aborts, we can rely on bufmgr.c marking the buffer's
page as defined when the backend acquires its first buffer pin on the
buffer in the next transaction (doesn't matter whether or not the same
block/page is in the same buffer as before). This is why we can't use
client requests for local buffers (not that we'd want to; the existing
aset.c Valgrind client requests take care of them automatically).

--
Peter Geoghegan

Attachments:

v2-0001-Add-Valgrind-buffer-access-instrumentation.patch (application/octet-stream)
From 8e120f9b6079767e656f8ca7eeda447a5c88113b Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Wed, 22 Apr 2020 19:51:43 -0700
Subject: [PATCH v2 1/2] Add Valgrind buffer access instrumentation.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Based on a patch from Álvaro Herrera posted to:
https://www.postgresql.org/message-id/20150723195349.GW5596@postgresql.org
---
 src/backend/storage/buffer/bufmgr.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index f9980cf80c..5d3211140c 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -49,6 +49,7 @@
 #include "storage/proc.h"
 #include "storage/smgr.h"
 #include "storage/standby.h"
+#include "utils/memdebug.h"
 #include "utils/ps_status.h"
 #include "utils/rel.h"
 #include "utils/resowner_private.h"
@@ -1633,6 +1634,13 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
 											   buf_state))
 			{
 				result = (buf_state & BM_VALID) != 0;
+
+				/*
+				 * If we successfully acquired our first pin on this buffer
+				 * within this backend, mark buffer contents defined
+				 */
+				if (result)
+					VALGRIND_MAKE_MEM_DEFINED(BufHdrGetBlock(buf), BLCKSZ);
 				break;
 			}
 		}
@@ -1683,6 +1691,13 @@ PinBuffer_Locked(BufferDesc *buf)
 	 */
 	Assert(GetPrivateRefCountEntry(BufferDescriptorGetBuffer(buf), false) == NULL);
 
+	/*
+	 * Buffer can't have a preexisting pin, so mark its page as defined to
+	 * Valgrind (this is similar to the PinBuffer() case where the backend
+	 * doesn't already have a buffer pin)
+	 */
+	VALGRIND_MAKE_MEM_DEFINED(BufHdrGetBlock(buf), BLCKSZ);
+
 	/*
 	 * Since we hold the buffer spinlock, we can update the buffer state and
 	 * release the lock in one operation.
@@ -1728,6 +1743,9 @@ UnpinBuffer(BufferDesc *buf, bool fixOwner)
 		uint32		buf_state;
 		uint32		old_buf_state;
 
+		/* Mark undefined, now that no pins remain in backend */
+		VALGRIND_MAKE_MEM_NOACCESS(BufHdrGetBlock(buf), BLCKSZ);
+
 		/* I'd better not still hold any locks on the buffer */
 		Assert(!LWLockHeldByMe(BufferDescriptorGetContentLock(buf)));
 		Assert(!LWLockHeldByMe(BufferDescriptorGetIOLock(buf)));
-- 
2.25.1

v2-0002-Add-nbtree-Valgrind-buffer-lock-checks.patch (application/octet-stream)
From 2d89b1708b7f9d904fd11a10cd8a13042e406f69 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Thu, 23 Apr 2020 14:43:54 -0700
Subject: [PATCH v2 2/2] Add nbtree Valgrind buffer lock checks.

Teach nbtree to use Valgrind memcheck client requests to mark pages as
NOACCESS at the point the buffer content lock is released.  Callers
usually also drop their pin at the same time, though there are some
exceptions.  This instrumentation makes Valgrind detect any page access
that occurs without a buffer lock, which is fundamentally unsafe with
B-Tree index pages.  Buffer pins only prevent VACUUM from performing
concurrent recycling of heap TIDs from the pinned leaf page.  Our
assumption is that they don't make it safe to access the page in any way
whatsoever.

We provide nbtree specific wrappers for LockBuffer() and related
functions.  All nbtree code is required to use the wrappers.

These checks are a stricter version of the memcheck client requests in
bufmgr.c that detect buffer accesses that take place without a buffer
pin held.  The checks never make the buffer pin checking more lax.
---
 src/include/access/nbtree.h           |   4 +
 src/backend/access/nbtree/nbtinsert.c |   2 +-
 src/backend/access/nbtree/nbtpage.c   | 126 ++++++++++++++++++++++----
 src/backend/access/nbtree/nbtree.c    |   6 +-
 src/backend/access/nbtree/nbtsearch.c |  16 ++--
 src/backend/access/nbtree/nbtutils.c  |   4 +-
 6 files changed, 125 insertions(+), 33 deletions(-)

diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 9a3acd26b7..284524a8f3 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -1072,6 +1072,10 @@ extern Buffer _bt_getbuf(Relation rel, BlockNumber blkno, int access);
 extern Buffer _bt_relandgetbuf(Relation rel, Buffer obuf,
 							   BlockNumber blkno, int access);
 extern void _bt_relbuf(Relation rel, Buffer buf);
+extern void _bt_lockbuf(Relation rel, Buffer buf, int access);
+extern void _bt_unlockbuf(Relation rel, Buffer buf);
+extern bool _bt_conditionallockbuf(Relation rel, Buffer buf);
+extern void _bt_upgradelockbufcleanup(Relation rel, Buffer buf);
 extern void _bt_pageinit(Page page, Size size);
 extern bool _bt_page_recyclable(Page page);
 extern void _bt_delitems_vacuum(Relation rel, Buffer buf,
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index a70b64d964..80c3a42bca 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -303,7 +303,7 @@ _bt_search_insert(Relation rel, BTInsertState insertstate)
 	{
 		/* Simulate a _bt_getbuf() call with conditional locking */
 		insertstate->buf = ReadBuffer(rel, RelationGetTargetBlock(rel));
-		if (ConditionalLockBuffer(insertstate->buf))
+		if (_bt_conditionallockbuf(rel, insertstate->buf))
 		{
 			Page		page;
 			BTPageOpaque lpageop;
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index f5962f6163..22e2560d00 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -32,6 +32,7 @@
 #include "storage/indexfsm.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
+#include "utils/memdebug.h"
 #include "utils/snapmgr.h"
 
 static BTMetaPageData *_bt_getmeta(Relation rel, Buffer metabuf);
@@ -191,8 +192,8 @@ _bt_update_meta_cleanup_info(Relation rel, TransactionId oldestBtpoXact,
 	}
 
 	/* trade in our read lock for a write lock */
-	LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
-	LockBuffer(metabuf, BT_WRITE);
+	_bt_unlockbuf(rel, metabuf);
+	_bt_lockbuf(rel, metabuf, BT_WRITE);
 
 	START_CRIT_SECTION();
 
@@ -333,8 +334,8 @@ _bt_getroot(Relation rel, int access)
 		}
 
 		/* trade in our read lock for a write lock */
-		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
-		LockBuffer(metabuf, BT_WRITE);
+		_bt_unlockbuf(rel, metabuf);
+		_bt_lockbuf(rel, metabuf, BT_WRITE);
 
 		/*
 		 * Race condition:	if someone else initialized the metadata between
@@ -427,8 +428,8 @@ _bt_getroot(Relation rel, int access)
 		 * else accessing the new root page while it's unlocked, since no one
 		 * else knows where it is yet.
 		 */
-		LockBuffer(rootbuf, BUFFER_LOCK_UNLOCK);
-		LockBuffer(rootbuf, BT_READ);
+		_bt_unlockbuf(rel, rootbuf);
+		_bt_lockbuf(rel, rootbuf, BT_READ);
 
 		/* okay, metadata is correct, release lock on it without caching */
 		_bt_relbuf(rel, metabuf);
@@ -779,7 +780,26 @@ _bt_log_reuse_page(Relation rel, BlockNumber blkno, TransactionId latestRemovedX
  *		When this routine returns, the appropriate lock is set on the
  *		requested buffer and its reference count has been incremented
  *		(ie, the buffer is "locked and pinned").  Also, we apply
- *		_bt_checkpage to sanity-check the page (except in P_NEW case).
+ *		_bt_checkpage to sanity-check the page (except in P_NEW case),
+ *		and perform Valgrind client requests that indicate whether or
+ *		not it's currently safe to access the buffer (i.e. whether or
+ *		not a lock is held on the buffer).
+ *
+ *		The general rule in nbtree is that it's never okay to access a
+ *		page without holding both a buffer pin and a buffer lock on
+ *		the page's buffer.  This is slightly stricter than necessary,
+ *		but it makes the Valgrind checks simpler.
+ *
+ *		The Valgrind checks are a strict superset of the generic
+ *		buffer pin checks in bufmgr.c.  Raw LockBuffer() calls are
+ *		disallowed in nbtree; all buffer lock requests need to go
+ *		through the wrapper functions.  The client requests here rely
+ *		on the generic bufmgr.c client requests to mark memory as
+ *		defined when a shared buffer is reused for another page in the
+ *		event of an error.  That's why we don't mark pages as NOACCESS
+ *		when they're in a local buffer: it can lead to false positives
+ *		from Valgrind.  (Besides, local buffers are protected by the
+ *		generic aset.c client requests.)
  */
 Buffer
 _bt_getbuf(Relation rel, BlockNumber blkno, int access)
@@ -790,7 +810,7 @@ _bt_getbuf(Relation rel, BlockNumber blkno, int access)
 	{
 		/* Read an existing block of the relation */
 		buf = ReadBuffer(rel, blkno);
-		LockBuffer(buf, access);
+		_bt_lockbuf(rel, buf, access);
 		_bt_checkpage(rel, buf);
 	}
 	else
@@ -830,7 +850,7 @@ _bt_getbuf(Relation rel, BlockNumber blkno, int access)
 			if (blkno == InvalidBlockNumber)
 				break;
 			buf = ReadBuffer(rel, blkno);
-			if (ConditionalLockBuffer(buf))
+			if (_bt_conditionallockbuf(rel, buf))
 			{
 				page = BufferGetPage(buf);
 				if (_bt_page_recyclable(page))
@@ -883,7 +903,7 @@ _bt_getbuf(Relation rel, BlockNumber blkno, int access)
 		buf = ReadBuffer(rel, P_NEW);
 
 		/* Acquire buffer lock on new page */
-		LockBuffer(buf, BT_WRITE);
+		_bt_lockbuf(rel, buf, BT_WRITE);
 
 		/*
 		 * Release the file-extension lock; it's now OK for someone else to
@@ -924,9 +944,10 @@ _bt_relandgetbuf(Relation rel, Buffer obuf, BlockNumber blkno, int access)
 
 	Assert(blkno != P_NEW);
 	if (BufferIsValid(obuf))
-		LockBuffer(obuf, BUFFER_LOCK_UNLOCK);
+		_bt_unlockbuf(rel, obuf);
 	buf = ReleaseAndReadBuffer(obuf, rel, blkno);
-	LockBuffer(buf, access);
+	_bt_lockbuf(rel, buf, access);
+
 	_bt_checkpage(rel, buf);
 	return buf;
 }
@@ -939,7 +960,74 @@ _bt_relandgetbuf(Relation rel, Buffer obuf, BlockNumber blkno, int access)
 void
 _bt_relbuf(Relation rel, Buffer buf)
 {
-	UnlockReleaseBuffer(buf);
+	_bt_unlockbuf(rel, buf);
+	ReleaseBuffer(buf);
+}
+
+/*
+ *	_bt_lockbuf() -- lock a pinned buffer.
+ *
+ * Lock is acquired without acquiring another pin.  This is like a raw
+ * LockBuffer() call, but performs extra steps needed by Valgrind.
+ */
+void
+_bt_lockbuf(Relation rel, Buffer buf, int access)
+{
+	LockBuffer(buf, access);
+
+	if (!RelationUsesLocalBuffers(rel))
+		VALGRIND_MAKE_MEM_DEFINED(BufferGetPage(buf), BLCKSZ);
+}
+
+/*
+ *	_bt_unlockbuf() -- unlock a pinned buffer.
+ */
+void
+_bt_unlockbuf(Relation rel, Buffer buf)
+{
+	/*
+	 * Buffer is pinned and locked, which means that it is expected to
+	 * be defined and addressable in general.  Make sure of it.
+	 */
+	VALGRIND_CHECK_MEM_IS_DEFINED(BufferGetPage(buf), BLCKSZ);
+
+	LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+
+	if (!RelationUsesLocalBuffers(rel))
+		VALGRIND_MAKE_MEM_NOACCESS(BufferGetPage(buf), BLCKSZ);
+}
+
+/*
+ *	_bt_conditionallockbuf() -- conditionally BT_WRITE lock pinned
+ *	buffer.
+ */
+bool
+_bt_conditionallockbuf(Relation rel, Buffer buf)
+{
+	if (!ConditionalLockBuffer(buf))
+		return false;
+
+	if (!RelationUsesLocalBuffers(rel))
+		VALGRIND_MAKE_MEM_DEFINED(BufferGetPage(buf), BLCKSZ);
+
+	return true;
+}
+
+/*
+ *	_bt_upgradelockbufcleanup() -- upgrade lock to super-exclusive/cleanup
+ *	lock.
+ */
+void
+_bt_upgradelockbufcleanup(Relation rel, Buffer buf)
+{
+	/*
+	 * Buffer is pinned and locked, which means that it is expected to
+	 * be defined and addressable in general.  Make sure of it.
+	 */
+	VALGRIND_CHECK_MEM_IS_DEFINED(BufferGetPage(buf), BLCKSZ);
+
+	LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+	LockBufferForCleanup(buf);
 }
 
 /*
@@ -1603,7 +1691,7 @@ _bt_pagedel(Relation rel, Buffer buf)
 				 * To avoid deadlocks, we'd better drop the leaf page lock
 				 * before going further.
 				 */
-				LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+				_bt_unlockbuf(rel, buf);
 
 				/*
 				 * Fetch the left sibling, to check that it's not marked with
@@ -1649,7 +1737,7 @@ _bt_pagedel(Relation rel, Buffer buf)
 				 * Re-lock the leaf page, and start over, to re-check that the
 				 * page can still be deleted.
 				 */
-				LockBuffer(buf, BT_WRITE);
+				_bt_lockbuf(rel, buf, BT_WRITE);
 				continue;
 			}
 
@@ -1955,7 +2043,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
 	leafleftsib = opaque->btpo_prev;
 	leafrightsib = opaque->btpo_next;
 
-	LockBuffer(leafbuf, BUFFER_LOCK_UNLOCK);
+	_bt_unlockbuf(rel, leafbuf);
 
 	/*
 	 * Check here, as calling loops will have locks held, preventing
@@ -1994,7 +2082,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
 		 * To avoid deadlocks, we'd better drop the target page lock before
 		 * going further.
 		 */
-		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+		_bt_unlockbuf(rel, buf);
 	}
 
 	/*
@@ -2011,7 +2099,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
 	 * table.)
 	 */
 	if (target != leafblkno)
-		LockBuffer(leafbuf, BT_WRITE);
+		_bt_lockbuf(rel, leafbuf, BT_WRITE);
 	if (leftsib != P_NONE)
 	{
 		lbuf = _bt_getbuf(rel, leftsib, BT_WRITE);
@@ -2061,7 +2149,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
 	 * a write lock not a superexclusive lock, since no scans would stop on an
 	 * empty page.
 	 */
-	LockBuffer(buf, BT_WRITE);
+	_bt_lockbuf(rel, buf, BT_WRITE);
 	page = BufferGetPage(buf);
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 36294789f3..5856a1cdc5 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -1109,7 +1109,8 @@ restart:
 	 */
 	buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
 							 info->strategy);
-	LockBuffer(buf, BT_READ);
+	_bt_lockbuf(rel, buf, BT_READ);
+
 	page = BufferGetPage(buf);
 	if (!PageIsNew(page))
 	{
@@ -1175,8 +1176,7 @@ restart:
 		 * course of the vacuum scan, whether or not it actually contains any
 		 * deletable tuples --- see nbtree/README.
 		 */
-		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
-		LockBufferForCleanup(buf);
+		_bt_upgradelockbufcleanup(rel, buf);
 
 		/*
 		 * Check whether we need to recurse back to earlier pages.  What we
diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index 8ff49ce6d6..80ce78b348 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -64,7 +64,7 @@ static inline void _bt_initialize_more_data(BTScanOpaque so, ScanDirection dir);
 static void
 _bt_drop_lock_and_maybe_pin(IndexScanDesc scan, BTScanPos sp)
 {
-	LockBuffer(sp->buf, BUFFER_LOCK_UNLOCK);
+	_bt_unlockbuf(scan->indexRelation, sp->buf);
 
 	if (IsMVCCSnapshot(scan->xs_snapshot) &&
 		RelationNeedsWAL(scan->indexRelation) &&
@@ -190,8 +190,8 @@ _bt_search(Relation rel, BTScanInsert key, Buffer *bufP, int access,
 	if (access == BT_WRITE && page_access == BT_READ)
 	{
 		/* trade in our read lock for a write lock */
-		LockBuffer(*bufP, BUFFER_LOCK_UNLOCK);
-		LockBuffer(*bufP, BT_WRITE);
+		_bt_unlockbuf(rel, *bufP);
+		_bt_lockbuf(rel, *bufP, BT_WRITE);
 
 		/*
 		 * If the page was split between the time that we surrendered our read
@@ -293,8 +293,8 @@ _bt_moveright(Relation rel,
 			/* upgrade our lock if necessary */
 			if (access == BT_READ)
 			{
-				LockBuffer(buf, BUFFER_LOCK_UNLOCK);
-				LockBuffer(buf, BT_WRITE);
+				_bt_unlockbuf(rel, buf);
+				_bt_lockbuf(rel, buf, BT_WRITE);
 			}
 
 			if (P_INCOMPLETE_SPLIT(opaque))
@@ -1417,7 +1417,7 @@ _bt_first(IndexScanDesc scan, ScanDirection dir)
 		 * There's no actually-matching data on this page.  Try to advance to
 		 * the next page.  Return false if there's no matching data at all.
 		 */
-		LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+		_bt_unlockbuf(scan->indexRelation, so->currPos.buf);
 		if (!_bt_steppage(scan, dir))
 			return false;
 	}
@@ -2065,7 +2065,7 @@ _bt_readnextpage(IndexScanDesc scan, BlockNumber blkno, ScanDirection dir)
 		 * deleted.
 		 */
 		if (BTScanPosIsPinned(so->currPos))
-			LockBuffer(so->currPos.buf, BT_READ);
+			_bt_lockbuf(rel, so->currPos.buf, BT_READ);
 		else
 			so->currPos.buf = _bt_getbuf(rel, so->currPos.currPage, BT_READ);
 
@@ -2443,7 +2443,7 @@ _bt_endpoint(IndexScanDesc scan, ScanDirection dir)
 		 * There's no actually-matching data on this page.  Try to advance to
 		 * the next page.  Return false if there's no matching data at all.
 		 */
-		LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+		_bt_unlockbuf(scan->indexRelation, so->currPos.buf);
 		if (!_bt_steppage(scan, dir))
 			return false;
 	}
diff --git a/src/backend/access/nbtree/nbtutils.c b/src/backend/access/nbtree/nbtutils.c
index ce48a51640..f30c0fbdf3 100644
--- a/src/backend/access/nbtree/nbtutils.c
+++ b/src/backend/access/nbtree/nbtutils.c
@@ -1744,7 +1744,7 @@ _bt_killitems(IndexScanDesc scan)
 		 * LSN.
 		 */
 		droppedpin = false;
-		LockBuffer(so->currPos.buf, BT_READ);
+		_bt_lockbuf(scan->indexRelation, so->currPos.buf, BT_READ);
 
 		page = BufferGetPage(so->currPos.buf);
 	}
@@ -1881,7 +1881,7 @@ _bt_killitems(IndexScanDesc scan)
 		MarkBufferDirtyHint(so->currPos.buf, true);
 	}
 
-	LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+	_bt_unlockbuf(scan->indexRelation, so->currPos.buf);
 }
 
 
-- 
2.25.1

#3 Daniel Gustafsson
daniel@yesql.se
In reply to: Peter Geoghegan (#2)
Re: Using Valgrind to detect faulty buffer accesses (no pin or buffer content lock held)

On 26 Apr 2020, at 02:17, Peter Geoghegan <pg@bowt.ie> wrote:

Attached is v2 of the patch series.

This patch fails to apply to HEAD due to conflicts in nbtpage.c, can you please
submit a rebased version?

cheers ./daniel

In reply to: Daniel Gustafsson (#3)
2 attachment(s)
Re: Using Valgrind to detect faulty buffer accesses (no pin or buffer content lock held)

On Thu, Jul 2, 2020 at 7:48 AM Daniel Gustafsson <daniel@yesql.se> wrote:

This patch fails to apply to HEAD due to conflicts in nbtpage.c, can you please
submit a rebased version?

I attach the rebased patch series.

Thanks
--
Peter Geoghegan

Attachments:

v3-0002-Add-nbtree-Valgrind-buffer-lock-checks.patch (application/octet-stream)
From b84934beb3fd08ab09fec2d2dd68168453e0b9b0 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Thu, 30 Apr 2020 12:31:42 -0700
Subject: [PATCH v3 2/2] Add nbtree Valgrind buffer lock checks.

Teach nbtree to use Valgrind memcheck client requests to mark pages as
NOACCESS at the point the buffer content lock is released.  Callers
usually also drop their pin at the same time, though there are some
exceptions.  This instrumentation makes Valgrind detect any page access
that occurs without a buffer lock, which is fundamentally unsafe with
B-Tree index pages.  Buffer pins only prevent VACUUM from performing
concurrent recycling of heap TIDs from the pinned leaf page.  Our
assumption is that they don't make it safe to access the page in any way
whatsoever.

We provide nbtree specific wrappers for LockBuffer() and related
functions.  All nbtree code is required to use the wrappers.

These checks are a stricter version of the memcheck client requests in
bufmgr.c that detect buffer accesses that take place without a buffer
pin held.  The checks never make the buffer pin checking more lax.
---
 src/include/access/nbtree.h           |   4 +
 src/backend/access/nbtree/nbtinsert.c |   2 +-
 src/backend/access/nbtree/nbtpage.c   | 128 ++++++++++++++++++++++----
 src/backend/access/nbtree/nbtree.c    |   6 +-
 src/backend/access/nbtree/nbtsearch.c |  16 ++--
 src/backend/access/nbtree/nbtutils.c  |   4 +-
 6 files changed, 127 insertions(+), 33 deletions(-)

diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 79506c748b..f5274cc750 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -1073,6 +1073,10 @@ extern Buffer _bt_getbuf(Relation rel, BlockNumber blkno, int access);
 extern Buffer _bt_relandgetbuf(Relation rel, Buffer obuf,
 							   BlockNumber blkno, int access);
 extern void _bt_relbuf(Relation rel, Buffer buf);
+extern void _bt_lockbuf(Relation rel, Buffer buf, int access);
+extern void _bt_unlockbuf(Relation rel, Buffer buf);
+extern bool _bt_conditionallockbuf(Relation rel, Buffer buf);
+extern void _bt_upgradelockbufcleanup(Relation rel, Buffer buf);
 extern void _bt_pageinit(Page page, Size size);
 extern bool _bt_page_recyclable(Page page);
 extern void _bt_delitems_vacuum(Relation rel, Buffer buf,
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index b86c122763..e3a44bc09e 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -303,7 +303,7 @@ _bt_search_insert(Relation rel, BTInsertState insertstate)
 	{
 		/* Simulate a _bt_getbuf() call with conditional locking */
 		insertstate->buf = ReadBuffer(rel, RelationGetTargetBlock(rel));
-		if (ConditionalLockBuffer(insertstate->buf))
+		if (_bt_conditionallockbuf(rel, insertstate->buf))
 		{
 			Page		page;
 			BTPageOpaque lpageop;
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 75628e0eb9..a516600a85 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -32,6 +32,7 @@
 #include "storage/indexfsm.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
+#include "utils/memdebug.h"
 #include "utils/snapmgr.h"
 
 static BTMetaPageData *_bt_getmeta(Relation rel, Buffer metabuf);
@@ -198,8 +199,8 @@ _bt_update_meta_cleanup_info(Relation rel, TransactionId oldestBtpoXact,
 	}
 
 	/* trade in our read lock for a write lock */
-	LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
-	LockBuffer(metabuf, BT_WRITE);
+	_bt_unlockbuf(rel, metabuf);
+	_bt_lockbuf(rel, metabuf, BT_WRITE);
 
 	START_CRIT_SECTION();
 
@@ -340,8 +341,8 @@ _bt_getroot(Relation rel, int access)
 		}
 
 		/* trade in our read lock for a write lock */
-		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
-		LockBuffer(metabuf, BT_WRITE);
+		_bt_unlockbuf(rel, metabuf);
+		_bt_lockbuf(rel, metabuf, BT_WRITE);
 
 		/*
 		 * Race condition:	if someone else initialized the metadata between
@@ -434,8 +435,8 @@ _bt_getroot(Relation rel, int access)
 		 * else accessing the new root page while it's unlocked, since no one
 		 * else knows where it is yet.
 		 */
-		LockBuffer(rootbuf, BUFFER_LOCK_UNLOCK);
-		LockBuffer(rootbuf, BT_READ);
+		_bt_unlockbuf(rel, rootbuf);
+		_bt_lockbuf(rel, rootbuf, BT_READ);
 
 		/* okay, metadata is correct, release lock on it without caching */
 		_bt_relbuf(rel, metabuf);
@@ -786,7 +787,26 @@ _bt_log_reuse_page(Relation rel, BlockNumber blkno, TransactionId latestRemovedX
  *		When this routine returns, the appropriate lock is set on the
  *		requested buffer and its reference count has been incremented
  *		(ie, the buffer is "locked and pinned").  Also, we apply
- *		_bt_checkpage to sanity-check the page (except in P_NEW case).
+ *		_bt_checkpage to sanity-check the page (except in P_NEW case),
+ *		and perform Valgrind client requests that indicate whether or
+ *		not it's currently safe to access the buffer (i.e. whether or
+ *		not a lock is held on the buffer).
+ *
+ *		The general rule in nbtree is that it's never okay to access a
+ *		page without holding both a buffer pin and a buffer lock on
+ *		the page's buffer.  This is slightly stricter than necessary,
+ *		but it makes the Valgrind checks simpler.
+ *
+ *		The Valgrind checks are a strict superset of the generic
+ *		buffer pin checks in bufmgr.c.  Raw LockBuffer() calls are
+ *		disallowed in nbtree; all buffer lock requests need to go
+ *		through the wrapper functions.  The client requests here rely
+ *		on the generic bufmgr.c client requests to mark memory as
+ *		defined when a shared buffer is reused for another page in the
+ *		event of an error.  That's why we don't mark pages as NOACCESS
+ *		when they're in a local buffer: it can lead to false positives
+ *		from Valgrind.  (Besides, local buffers are protected by the
+ *		generic aset.c client requests.)
  */
 Buffer
 _bt_getbuf(Relation rel, BlockNumber blkno, int access)
@@ -797,7 +817,7 @@ _bt_getbuf(Relation rel, BlockNumber blkno, int access)
 	{
 		/* Read an existing block of the relation */
 		buf = ReadBuffer(rel, blkno);
-		LockBuffer(buf, access);
+		_bt_lockbuf(rel, buf, access);
 		_bt_checkpage(rel, buf);
 	}
 	else
@@ -837,7 +857,7 @@ _bt_getbuf(Relation rel, BlockNumber blkno, int access)
 			if (blkno == InvalidBlockNumber)
 				break;
 			buf = ReadBuffer(rel, blkno);
-			if (ConditionalLockBuffer(buf))
+			if (_bt_conditionallockbuf(rel, buf))
 			{
 				page = BufferGetPage(buf);
 				if (_bt_page_recyclable(page))
@@ -890,7 +910,7 @@ _bt_getbuf(Relation rel, BlockNumber blkno, int access)
 		buf = ReadBuffer(rel, P_NEW);
 
 		/* Acquire buffer lock on new page */
-		LockBuffer(buf, BT_WRITE);
+		_bt_lockbuf(rel, buf, BT_WRITE);
 
 		/*
 		 * Release the file-extension lock; it's now OK for someone else to
@@ -931,9 +951,10 @@ _bt_relandgetbuf(Relation rel, Buffer obuf, BlockNumber blkno, int access)
 
 	Assert(blkno != P_NEW);
 	if (BufferIsValid(obuf))
-		LockBuffer(obuf, BUFFER_LOCK_UNLOCK);
+		_bt_unlockbuf(rel, obuf);
 	buf = ReleaseAndReadBuffer(obuf, rel, blkno);
-	LockBuffer(buf, access);
+	_bt_lockbuf(rel, buf, access);
+
 	_bt_checkpage(rel, buf);
 	return buf;
 }
@@ -946,7 +967,76 @@ _bt_relandgetbuf(Relation rel, Buffer obuf, BlockNumber blkno, int access)
 void
 _bt_relbuf(Relation rel, Buffer buf)
 {
-	UnlockReleaseBuffer(buf);
+	_bt_unlockbuf(rel, buf);
+	ReleaseBuffer(buf);
+}
+
+/*
+ *	_bt_lockbuf() -- lock a pinned buffer.
+ *
+ * Lock is acquired without acquiring another pin.  This is like a raw
+ * LockBuffer() call, but performs extra steps needed by Valgrind.
+ */
+void
+_bt_lockbuf(Relation rel, Buffer buf, int access)
+{
+	LockBuffer(buf, access);
+
+	if (!RelationUsesLocalBuffers(rel))
+		VALGRIND_MAKE_MEM_DEFINED(BufferGetPage(buf), BLCKSZ);
+}
+
+/*
+ *	_bt_unlockbuf() -- unlock a pinned buffer.
+ */
+void
+_bt_unlockbuf(Relation rel, Buffer buf)
+{
+	/*
+	 * Buffer is pinned and locked, which means that it is expected to be
+	 * defined and addressable in general.  Make sure of it.
+	 */
+	VALGRIND_CHECK_MEM_IS_DEFINED(BufferGetPage(buf), BLCKSZ);
+
+	LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+
+	if (!RelationUsesLocalBuffers(rel))
+		VALGRIND_MAKE_MEM_NOACCESS(BufferGetPage(buf), BLCKSZ);
+}
+
+/*
+ *	_bt_conditionallockbuf() -- conditionally BT_WRITE lock pinned
+ *	buffer.
+ *
+ * Note: Caller is responsible for calling _bt_checkpage() on success.
+ */
+bool
+_bt_conditionallockbuf(Relation rel, Buffer buf)
+{
+	if (!ConditionalLockBuffer(buf))
+		return false;
+
+	if (!RelationUsesLocalBuffers(rel))
+		VALGRIND_MAKE_MEM_DEFINED(BufferGetPage(buf), BLCKSZ);
+
+	return true;
+}
+
+/*
+ *	_bt_upgradelockbufcleanup() -- upgrade lock to super-exclusive/cleanup
+ *	lock.
+ */
+void
+_bt_upgradelockbufcleanup(Relation rel, Buffer buf)
+{
+	/*
+	 * Buffer is pinned and locked, which means that it is expected to be
+	 * defined and addressable in general.  Make sure of it.
+	 */
+	VALGRIND_CHECK_MEM_IS_DEFINED(BufferGetPage(buf), BLCKSZ);
+
+	LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+	LockBufferForCleanup(buf);
 }
 
 /*
@@ -1580,7 +1670,7 @@ _bt_pagedel(Relation rel, Buffer leafbuf, TransactionId *oldestBtpoXact)
 				 * To avoid deadlocks, we'd better drop the leaf page lock
 				 * before going further.
 				 */
-				LockBuffer(leafbuf, BUFFER_LOCK_UNLOCK);
+				_bt_unlockbuf(rel, leafbuf);
 
 				/*
 				 * Check that the left sibling of leafbuf (if any) is not
@@ -1616,7 +1706,7 @@ _bt_pagedel(Relation rel, Buffer leafbuf, TransactionId *oldestBtpoXact)
 				 * (Page deletion can cope with the stack being to the left of
 				 * leafbuf, but not to the right of leafbuf.)
 				 */
-				LockBuffer(leafbuf, BT_WRITE);
+				_bt_lockbuf(rel, leafbuf, BT_WRITE);
 				continue;
 			}
 
@@ -1970,7 +2060,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
 	leafleftsib = opaque->btpo_prev;
 	leafrightsib = opaque->btpo_next;
 
-	LockBuffer(leafbuf, BUFFER_LOCK_UNLOCK);
+	_bt_unlockbuf(rel, leafbuf);
 
 	/*
 	 * Check here, as calling loops will have locks held, preventing
@@ -2005,7 +2095,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
 		 * To avoid deadlocks, we'd better drop the target page lock before
 		 * going further.
 		 */
-		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+		_bt_unlockbuf(rel, buf);
 	}
 
 	/*
@@ -2022,7 +2112,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
 	 * table.)
 	 */
 	if (target != leafblkno)
-		LockBuffer(leafbuf, BT_WRITE);
+		_bt_lockbuf(rel, leafbuf, BT_WRITE);
 	if (leftsib != P_NONE)
 	{
 		lbuf = _bt_getbuf(rel, leftsib, BT_WRITE);
@@ -2072,7 +2162,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
 	 * rather than a superexclusive lock, since no scan will stop on an empty
 	 * page.
 	 */
-	LockBuffer(buf, BT_WRITE);
+	_bt_lockbuf(rel, buf, BT_WRITE);
 	page = BufferGetPage(buf);
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index e947addef6..cf409c4c8a 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -1115,7 +1115,8 @@ backtrack:
 	 */
 	buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
 							 info->strategy);
-	LockBuffer(buf, BT_READ);
+	_bt_lockbuf(rel, buf, BT_READ);
+
 	page = BufferGetPage(buf);
 	opaque = NULL;
 	if (!PageIsNew(page))
@@ -1222,8 +1223,7 @@ backtrack:
 		 * course of the vacuum scan, whether or not it actually contains any
 		 * deletable tuples --- see nbtree/README.
 		 */
-		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
-		LockBufferForCleanup(buf);
+		_bt_upgradelockbufcleanup(rel, buf);
 
 		/*
 		 * Check whether we need to backtrack to earlier pages.  What we are
diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index 4534224812..b1aaa446f8 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -64,7 +64,7 @@ static inline void _bt_initialize_more_data(BTScanOpaque so, ScanDirection dir);
 static void
 _bt_drop_lock_and_maybe_pin(IndexScanDesc scan, BTScanPos sp)
 {
-	LockBuffer(sp->buf, BUFFER_LOCK_UNLOCK);
+	_bt_unlockbuf(scan->indexRelation, sp->buf);
 
 	if (IsMVCCSnapshot(scan->xs_snapshot) &&
 		RelationNeedsWAL(scan->indexRelation) &&
@@ -189,8 +189,8 @@ _bt_search(Relation rel, BTScanInsert key, Buffer *bufP, int access,
 	if (access == BT_WRITE && page_access == BT_READ)
 	{
 		/* trade in our read lock for a write lock */
-		LockBuffer(*bufP, BUFFER_LOCK_UNLOCK);
-		LockBuffer(*bufP, BT_WRITE);
+		_bt_unlockbuf(rel, *bufP);
+		_bt_lockbuf(rel, *bufP, BT_WRITE);
 
 		/*
 		 * If the page was split between the time that we surrendered our read
@@ -292,8 +292,8 @@ _bt_moveright(Relation rel,
 			/* upgrade our lock if necessary */
 			if (access == BT_READ)
 			{
-				LockBuffer(buf, BUFFER_LOCK_UNLOCK);
-				LockBuffer(buf, BT_WRITE);
+				_bt_unlockbuf(rel, buf);
+				_bt_lockbuf(rel, buf, BT_WRITE);
 			}
 
 			if (P_INCOMPLETE_SPLIT(opaque))
@@ -1416,7 +1416,7 @@ _bt_first(IndexScanDesc scan, ScanDirection dir)
 		 * There's no actually-matching data on this page.  Try to advance to
 		 * the next page.  Return false if there's no matching data at all.
 		 */
-		LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+		_bt_unlockbuf(scan->indexRelation, so->currPos.buf);
 		if (!_bt_steppage(scan, dir))
 			return false;
 	}
@@ -2064,7 +2064,7 @@ _bt_readnextpage(IndexScanDesc scan, BlockNumber blkno, ScanDirection dir)
 		 * deleted.
 		 */
 		if (BTScanPosIsPinned(so->currPos))
-			LockBuffer(so->currPos.buf, BT_READ);
+			_bt_lockbuf(rel, so->currPos.buf, BT_READ);
 		else
 			so->currPos.buf = _bt_getbuf(rel, so->currPos.currPage, BT_READ);
 
@@ -2442,7 +2442,7 @@ _bt_endpoint(IndexScanDesc scan, ScanDirection dir)
 		 * There's no actually-matching data on this page.  Try to advance to
 		 * the next page.  Return false if there's no matching data at all.
 		 */
-		LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+		_bt_unlockbuf(scan->indexRelation, so->currPos.buf);
 		if (!_bt_steppage(scan, dir))
 			return false;
 	}
diff --git a/src/backend/access/nbtree/nbtutils.c b/src/backend/access/nbtree/nbtutils.c
index 7c33711a9f..81589b9056 100644
--- a/src/backend/access/nbtree/nbtutils.c
+++ b/src/backend/access/nbtree/nbtutils.c
@@ -1744,7 +1744,7 @@ _bt_killitems(IndexScanDesc scan)
 		 * LSN.
 		 */
 		droppedpin = false;
-		LockBuffer(so->currPos.buf, BT_READ);
+		_bt_lockbuf(scan->indexRelation, so->currPos.buf, BT_READ);
 
 		page = BufferGetPage(so->currPos.buf);
 	}
@@ -1885,7 +1885,7 @@ _bt_killitems(IndexScanDesc scan)
 		MarkBufferDirtyHint(so->currPos.buf, true);
 	}
 
-	LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+	_bt_unlockbuf(scan->indexRelation, so->currPos.buf);
 }
 
 
-- 
2.25.1

v3-0001-Add-Valgrind-buffer-access-instrumentation.patch (application/octet-stream)
From c7c5ff7f9edf4ee4d4e4eb7edff20f64b5f8dc1a Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Thu, 30 Apr 2020 12:31:42 -0700
Subject: [PATCH v3 1/2] Add Valgrind buffer access instrumentation.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Based on a patch from Álvaro Herrera posted to:
https://www.postgresql.org/message-id/20150723195349.GW5596@postgresql.org
---
 src/backend/storage/buffer/bufmgr.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 29c920800a..8ef073b382 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -49,6 +49,7 @@
 #include "storage/proc.h"
 #include "storage/smgr.h"
 #include "storage/standby.h"
+#include "utils/memdebug.h"
 #include "utils/ps_status.h"
 #include "utils/rel.h"
 #include "utils/resowner_private.h"
@@ -1633,6 +1634,13 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
 											   buf_state))
 			{
 				result = (buf_state & BM_VALID) != 0;
+
+				/*
+				 * If we successfully acquired our first pin on this buffer
+				 * within this backend, mark buffer contents defined
+				 */
+				if (result)
+					VALGRIND_MAKE_MEM_DEFINED(BufHdrGetBlock(buf), BLCKSZ);
 				break;
 			}
 		}
@@ -1683,6 +1691,13 @@ PinBuffer_Locked(BufferDesc *buf)
 	 */
 	Assert(GetPrivateRefCountEntry(BufferDescriptorGetBuffer(buf), false) == NULL);
 
+	/*
+	 * Buffer can't have a preexisting pin, so mark its page as defined to
+	 * Valgrind (this is similar to the PinBuffer() case where the backend
+	 * doesn't already have a buffer pin)
+	 */
+	VALGRIND_MAKE_MEM_DEFINED(BufHdrGetBlock(buf), BLCKSZ);
+
 	/*
 	 * Since we hold the buffer spinlock, we can update the buffer state and
 	 * release the lock in one operation.
@@ -1728,6 +1743,9 @@ UnpinBuffer(BufferDesc *buf, bool fixOwner)
 		uint32		buf_state;
 		uint32		old_buf_state;
 
+		/* Mark undefined, now that no pins remain in backend */
+		VALGRIND_MAKE_MEM_NOACCESS(BufHdrGetBlock(buf), BLCKSZ);
+
 		/* I'd better not still hold any locks on the buffer */
 		Assert(!LWLockHeldByMe(BufferDescriptorGetContentLock(buf)));
 		Assert(!LWLockHeldByMe(BufferDescriptorGetIOLock(buf)));
-- 
2.25.1

#5 Georgios Kokolatos
gkokolatos@protonmail.com
In reply to: Peter Geoghegan (#4)
Re: Using Valgrind to detect faulty buffer accesses (no pin or buffer content lock held)

As a general overview, the patches in this series match their descriptions. The stricter, explicit use of instrumentation improves the design, since the distinction between use cases that require a pin and those that require a lock is made clearer. The added commentary is descriptive and appears grammatically correct, at least to a non-native speaker.

Unfortunately though, the two bug fixes do not seem to apply.

Also, there is a small issue regarding the process, not the content, of the patches. In the CF app there is a latest attachment (v3-0002-Add-nbtree-Valgrind-buffer-lock-checks.patch) which does not appear in the mail thread. Before changing the status, I would kindly ask for the complete latest series, applying cleanly, to be posted to the mail thread.

The new status of this patch is: Waiting on Author

#6 Anastasia Lubennikova
a.lubennikova@postgrespro.ru
In reply to: Peter Geoghegan (#4)
Re: Using Valgrind to detect faulty buffer accesses (no pin or buffer content lock held)

On 02.07.2020 20:11, Peter Geoghegan wrote:

On Thu, Jul 2, 2020 at 7:48 AM Daniel Gustafsson <daniel@yesql.se> wrote:

This patch fails to apply to HEAD due to conflicts in nbtpage.c, can you please
submit a rebased version?

I attach the rebased patch series.

Thanks

It's impressive that this check helped to find several bugs.

I only noticed a small inconsistency in the new comment for
_bt_conditionallockbuf().

It says "Note: Caller is responsible for calling _bt_checkpage() on
success.", while in _bt_getbuf() the call is not followed by
_bt_checkpage().
Moreover, _bt_page_recyclable() contradicts _bt_checkpage() checks.

Other than that, patches look good to me, so move them to "Ready For
Committer".

Are you planning to add the same checks for other access methods?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In reply to: Anastasia Lubennikova (#6)
Re: Using Valgrind to detect faulty buffer accesses (no pin or buffer content lock held)

On Thu, Jul 16, 2020 at 10:24 AM Anastasia Lubennikova
<a.lubennikova@postgrespro.ru> wrote:

It's impressive that this check helped to find several bugs.

While it's definitely true that it *could* have detected the bug fixed
by commit b0229f26, it's kind of debatable whether or not the bugs I
fixed in commit fa7ff642 and commit 7154aa16 (which actually were
found using this new instrumentation) were truly bugs.

The behavior in question was probably safe, since only the
special/opaque page area was accessed -- and with at least a buffer
pin held. But it's not worth having a debate about whether or not it
should be considered safe. There is no downside to having a simple,
strict rule that's easy to enforce. Also, I myself spotted some bugs
in the skip scan patch series at one point that would also be caught
by the new instrumentation.

I only noticed a small inconsistency in the new comment for
_bt_conditionallockbuf().

It says "Note: Caller is responsible for calling _bt_checkpage() on
success.", while in _bt_getbuf() the call is not followed by
_bt_checkpage().
Moreover, _bt_page_recyclable() contradicts _bt_checkpage() checks.

Nice catch.

Other than that, patches look good to me, so move them to "Ready For
Committer".

Pushed the first patch just now, and intend to push the other one soon. Thanks!

Are you planning to add the same checks for other access methods?

Not at the moment, but it does seem like my approach could be
generalized to other index access methods.

--
Peter Geoghegan

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#7)
Re: Using Valgrind to detect faulty buffer accesses (no pin or buffer content lock held)

Peter Geoghegan <pg@bowt.ie> writes:

On Thu, Jul 16, 2020 at 10:24 AM Anastasia Lubennikova
<a.lubennikova@postgrespro.ru> wrote:

Other than that, patches look good to me, so move them to "Ready For
Committer".

Pushed the first patch just now, and intend to push the other one soon. Thanks!

I wonder whether skink's failure today is due to this change:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2020-07-18%2018%3A01%3A10

regards, tom lane

In reply to: Tom Lane (#8)
1 attachment(s)
Re: Using Valgrind to detect faulty buffer accesses (no pin or buffer content lock held)

On Sat, Jul 18, 2020 at 7:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wonder whether skink's failure today is due to this change:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2020-07-18%2018%3A01%3A10

That seems extremely likely. I think that I need to do something like
what you see in the attached.

Anyway, I'll take care of it tomorrow. Sorry for missing it before my commit.

--
Peter Geoghegan

Attachments:

0001-Mark-buffers-as-defined-to-Valgrind-consistently.patch (application/octet-stream)
From 07d03aaf68eaa6e149e9cb522f16e08255615689 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sat, 18 Jul 2020 11:19:45 -0700
Subject: [PATCH 1/2] Mark buffers as defined to Valgrind consistently.

Make PinBuffer() mark buffers as defined to Valgrind unconditionally,
including when the buffer header spinlock must be acquired.  This
theoretically creates a risk that we'll mark buffers defined even when
external callers don't end up with a buffer pin.  That seems perfectly
acceptable, though, since in general we make no guarantees about buffers
that are unsafe to access being reliably marked as unsafe.

Oversight in commit 1e0dfd16, which added valgrind buffer access
instrumentation.
---
 src/backend/storage/buffer/bufmgr.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 8ef073b382..83d91b14fb 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1636,11 +1636,13 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
 				result = (buf_state & BM_VALID) != 0;
 
 				/*
-				 * If we successfully acquired our first pin on this buffer
-				 * within this backend, mark buffer contents defined
+				 * Assume that we acquired a buffer pin for the purposes of
+				 * Valgrind buffer client checks (even in !result case) to
+				 * keep things simple.  Buffers that are unsafe to access are
+				 * not generally guaranteed to be marked undefined in any
+				 * case.
 				 */
-				if (result)
-					VALGRIND_MAKE_MEM_DEFINED(BufHdrGetBlock(buf), BLCKSZ);
+				VALGRIND_MAKE_MEM_DEFINED(BufHdrGetBlock(buf), BLCKSZ);
 				break;
 			}
 		}
-- 
2.25.1

#10Peter Geoghegan
pg@bowt.ie
In reply to: Georgios Kokolatos (#5)
Re: Using Valgrind to detect faulty buffer accesses (no pin or buffer content lock held)

On Mon, Jul 6, 2020 at 1:35 AM Georgios Kokolatos
<gkokolatos@protonmail.com> wrote:

As a general overview, the series of patches in the mail thread does match its description. The addition of stricter, explicit instrumentation improves the design, since the distinction between use cases that require a pin and those that require a lock is made clearer. The added commentary is descriptive and appears grammatically correct, at least to a non-native speaker.

I didn't see this review until now because it ended up in gmail's spam
folder. :-(

Thanks for taking a look at it!

--
Peter Geoghegan

#11Peter Geoghegan
pg@bowt.ie
In reply to: Peter Geoghegan (#7)
Re: Using Valgrind to detect faulty buffer accesses (no pin or buffer content lock held)

On Fri, Jul 17, 2020 at 5:53 PM Peter Geoghegan <pg@bowt.ie> wrote:

Pushed the first patch just now, and intend to push the other one soon. Thanks!

Pushed the second piece of this (the nbtree patch) just now.

Thanks for the review!
--
Peter Geoghegan

#12Georgios
gkokolatos@protonmail.com
In reply to: Peter Geoghegan (#10)
Re: Using Valgrind to detect faulty buffer accesses (no pin or buffer content lock held)

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, July 21, 2020 11:52 PM, Peter Geoghegan <pg@bowt.ie> wrote:

On Mon, Jul 6, 2020 at 1:35 AM Georgios Kokolatos
gkokolatos@protonmail.com wrote:

As a general overview, the series of patches in the mail thread does match its description. The addition of stricter, explicit instrumentation improves the design, since the distinction between use cases that require a pin and those that require a lock is made clearer. The added commentary is descriptive and appears grammatically correct, at least to a non-native speaker.

I didn't see this review until now because it ended up in gmail's spam
folder. :-(

Thanks for taking a look at it!

No worries at all. It happens and it was beneficial for me to read the patch.

//Georgios

