Opportunistically pruning page before update

Started by James Coleman over 2 years ago (13 messages)
#1James Coleman
jtc331@gmail.com
2 attachment(s)

Hello,

While at PGCon I was chatting with Andres (and I think Peter G. and a
few others who I can't remember at the moment, apologies) and Andres
noted that while we opportunistically prune a page when inserting a
tuple (before deciding we need a new page) we don't do the same for
updates.

Attached is a patch series to do the following:

0001: Make it possible to call heap_page_prune_opt already holding an
exclusive lock on the buffer.
0002: Opportunistically prune pages on update when the current tuple's
page has no free space. If this frees up enough space, then we
continue to put the new tuple on that page; if not, then we take the
existing code path and get a new page.

One would plausibly expect the following improvements:
- Reduced table bloat
- Increased HOT update rate
- Improved performance on updates

I started to work on benchmarking this, but haven't had time to devote
properly to that, so I'm wondering if there's anyone who might be
interested in collaborating on that part.
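
As a starting point, here's a rough sketch of the kind of workload I
have in mind (purely illustrative: the table, the pgbench script, and
the run parameters below are placeholders, not settled choices):

-- setup (run once):
--   create table bench(pk int primary key, filler text);
--   insert into bench select g, repeat('x', 200) from generate_series(1, 1000000) g;
-- pgbench script (update.sql):
\set id random(1, 1000000)
update bench set filler = repeat('y', 200) where pk = :id;
-- run with, e.g.: pgbench -n -f update.sql -c 16 -T 300
-- then compare master vs. patched on:
--   select n_tup_upd, n_tup_hot_upd from pg_stat_user_tables where relname = 'bench';
--   select pg_relation_size('bench');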

Other TODOs:
- Audit other callers of RelationSetTargetBlock() to ensure they don't
hold pointers into the page.

Regards,
James Coleman

Attachments:

v1-0001-Allow-getting-lock-before-calling-heap_page_prune.patch (application/octet-stream)
From 3064f77620653b9cd0faf07f43c975449612c8b3 Mon Sep 17 00:00:00 2001
From: jcoleman <jtc331@gmail.com>
Date: Fri, 2 Jun 2023 10:01:06 -0400
Subject: [PATCH v1 1/2] Allow getting lock before calling
 heap_page_prune_opt()

---
 src/backend/access/heap/heapam.c         |  2 +-
 src/backend/access/heap/heapam_handler.c |  4 ++--
 src/backend/access/heap/pruneheap.c      | 10 ++++++----
 src/include/access/heapam.h              |  2 +-
 4 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 7ed72abe59..51f645dda0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -415,7 +415,7 @@ heapgetpage(TableScanDesc sscan, BlockNumber block)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, false);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 646135cc21..b1be2e1897 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -135,7 +135,7 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf, false);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2157,7 +2157,7 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, false);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 47b9e20915..af052f94af 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -102,10 +102,11 @@ static void page_verify_redirects(Page page);
  * Note: this is called quite often.  It's important that it fall out quickly
  * if there's not any use in pruning.
  *
- * Caller must have pin on the buffer, and must *not* have a lock on it.
+ * Caller must have pin on the buffer, and must either have an exclusive lock
+ * (and pass already_locked = true) or not have a lock on it.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, bool already_locked)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -191,8 +192,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 
 	if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
 	{
-		/* OK, try to get exclusive buffer lock */
-		if (!ConditionalLockBufferForCleanup(buffer))
+		/* OK, try to get exclusive buffer lock if necessary */
+		if ((!already_locked && !ConditionalLockBufferForCleanup(buffer)) ||
+				(already_locked && !IsBufferCleanupOK(buffer)))
 			return;
 
 		/*
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index faf5026519..d72b476920 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -283,7 +283,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void	heap_page_prune_opt(Relation relation, Buffer buffer, bool already_locked);
 extern int	heap_page_prune(Relation relation, Buffer buffer,
 							struct GlobalVisState *vistest,
 							TransactionId old_snap_xmin,
-- 
2.39.2 (Apple Git-143)

v1-0002-Opportunistically-prune-to-avoid-building-a-new-p.patch (application/octet-stream)
From 0615979a9d86bb23353b8542e3940a88539251df Mon Sep 17 00:00:00 2001
From: jcoleman <jtc331@gmail.com>
Date: Fri, 2 Jun 2023 10:02:07 -0400
Subject: [PATCH v1 2/2] Opportunistically prune to avoid building a new page
 for an update

---
 src/backend/access/heap/heapam.c    | 34 +++++++++++++++++++++++++++++
 src/backend/access/heap/hio.c       | 20 +++++++++++++++++
 src/backend/access/heap/pruneheap.c |  5 +++--
 3 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 51f645dda0..b106d55c53 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -3017,9 +3017,16 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 				infomask2_old_tuple,
 				infomask_new_tuple,
 				infomask2_new_tuple;
+#ifdef USE_ASSERT_CHECKING
+	ItemPointerData original_otid;
+#endif
 
 	Assert(ItemPointerIsValid(otid));
 
+#ifdef USE_ASSERT_CHECKING
+	ItemPointerCopy(otid, &original_otid);
+#endif
+
 	/* Cheap, simplistic check that the tuple matches the rel's rowtype. */
 	Assert(HeapTupleHeaderGetNatts(newtup->t_data) <=
 		   RelationGetNumberOfAttributes(relation));
@@ -3141,6 +3148,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 	}
 
 	/*
+	 * TODO: is this note the problem pointer?
 	 * Note: beyond this point, use oldtup not otid to refer to old tuple.
 	 * otid may very well point at newtup->t_self, which we will overwrite
 	 * with the new tuple's location, so there's great risk of confusion if we
@@ -3625,13 +3633,39 @@ l2:
 			if (newtupsize > pagefree)
 			{
 				/* It doesn't fit, must use RelationGetBufferForTuple. */
+				/* TODO: every time we call this we need to make sure we don't have pointers into the page */
 				newbuf = RelationGetBufferForTuple(relation, heaptup->t_len,
 												   buffer, 0, NULL,
 												   &vmbuffer_new, &vmbuffer,
 												   0);
+
+#ifdef USE_ASSERT_CHECKING
+				/*
+				 * About 500 lines ago in this function a comment claimed we
+				 * might not be able to rely on otid after that point, but it
+				 * appears we can.
+				 */
+				Assert(ItemPointerEquals(otid, &original_otid));
+#endif
+
+				/*
+				 * Getting a buffer may have pruned the page, so we can't rely
+				 * on our original pointer into the page.
+				 */
+				lp = PageGetItemId(page, ItemPointerGetOffsetNumber(otid));
+				Assert(ItemIdIsNormal(lp));
+
+				/*
+				 * Mirror what we filled into oldtup at the beginning
+				 * of this function.
+				 */
+				oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
+				oldtup.t_len = ItemIdGetLength(lp);
+
 				/* We're all done. */
 				break;
 			}
+
 			/* Acquire VM page pin if needed and we don't have it. */
 			if (vmbuffer == InvalidBuffer && PageIsAllVisible(page))
 				visibilitymap_pin(relation, block, &vmbuffer);
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index c275b08494..9c145e1720 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -688,6 +688,26 @@ loop:
 			RelationSetTargetBlock(relation, targetBlock);
 			return buffer;
 		}
+		else
+		{
+			/*
+			 * Opportunistically prune and see if that frees up enough space to
+			 * avoid needing to build a new page.
+			 */
+			heap_page_prune_opt(relation, buffer, true);
+
+			/* If pruning freed up any slots, we can check free space again. */
+			if (PageHasFreeLinePointers(page))
+			{
+				pageFreeSpace = PageGetHeapFreeSpace(page);
+				if (targetFreeSpace <= pageFreeSpace)
+				{
+					/* use this page as future insert target, too */
+					RelationSetTargetBlock(relation, targetBlock);
+					return buffer;
+				}
+			}
+		}
 
 		/*
 		 * Not enough space, so we must give up our page locks and pin (if
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index af052f94af..a34b21770f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -229,8 +229,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer, bool already_locked)
 											   ndeleted - nnewlpdead);
 		}
 
-		/* And release buffer lock */
-		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+		/* And release buffer lock if we acquired it */
+		if (!already_locked)
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 
 		/*
 		 * We avoid reuse of any free space created on the page by unrelated
-- 
2.39.2 (Apple Git-143)

#2Melanie Plageman
melanieplageman@gmail.com
In reply to: James Coleman (#1)
Re: Opportunistically pruning page before update

On Wed, Jun 21, 2023 at 8:51 AM James Coleman <jtc331@gmail.com> wrote:

While at PGCon I was chatting with Andres (and I think Peter G. and a
few others who I can't remember at the moment, apologies) and Andres
noted that while we opportunistically prune a page when inserting a
tuple (before deciding we need a new page) we don't do the same for
updates.

Attached is a patch series to do the following:

0001: Make it possible to call heap_page_prune_opt already holding an
exclusive lock on the buffer.
0002: Opportunistically prune pages on update when the current tuple's
page has no free space. If this frees up enough space, then we
continue to put the new tuple on that page; if not, then we take the
existing code path and get a new page.

I've reviewed these patches and have questions.

Under what conditions would this be exercised for UPDATE? Could you
provide an example?

With your patch applied, when I create a table, the first time I update
it heap_page_prune_opt() will return before actually doing any pruning
because the page prune_xid hadn't been set (it is set after pruning as
well as later in heap_update() after RelationGetBufferForTuple() is
called).
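
For reference, one quick way to watch that (assuming the pageinspect
extension is available; 'foo' here is just a placeholder table name):

create extension if not exists pageinspect;
-- prune_xid stays 0 (invalid) until something sets pd_prune_xid on the page
select prune_xid from page_header(get_raw_page('foo', 0));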

I actually added an additional parameter to heap_page_prune() and
heap_page_prune_opt() to identify if heap_page_prune() was called from
RelationGetBufferForTuple() and logged a message when this was true.
Running the test suite, I didn't see any UPDATEs executing
heap_page_prune() from RelationGetBufferForTuple(). I did, however, see
other statement types doing so (see RelationGetBufferForTuple()'s other
callers). Was that intended?

I started to work on benchmarking this, but haven't had time to devote
properly to that, so I'm wondering if there's anyone who might be
interested in collaborating on that part.

I'm interested in this feature and in helping with it/helping with
benchmarking it, but I don't yet understand the design in its current
form.

- Melanie

#3James Coleman
jtc331@gmail.com
In reply to: Melanie Plageman (#2)
Re: Opportunistically pruning page before update

On Tue, Sep 5, 2023 at 1:40 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Wed, Jun 21, 2023 at 8:51 AM James Coleman <jtc331@gmail.com> wrote:

While at PGCon I was chatting with Andres (and I think Peter G. and a
few others who I can't remember at the moment, apologies) and Andres
noted that while we opportunistically prune a page when inserting a
tuple (before deciding we need a new page) we don't do the same for
updates.

Attached is a patch series to do the following:

0001: Make it possible to call heap_page_prune_opt already holding an
exclusive lock on the buffer.
0002: Opportunistically prune pages on update when the current tuple's
page has no free space. If this frees up enough space, then we
continue to put the new tuple on that page; if not, then we take the
existing code path and get a new page.

I've reviewed these patches and have questions.

Under what conditions would this be exercised for UPDATE? Could you
provide an example?

With your patch applied, when I create a table, the first time I update
it heap_page_prune_opt() will return before actually doing any pruning
because the page prune_xid hadn't been set (it is set after pruning as
well as later in heap_update() after RelationGetBufferForTuple() is
called).

I actually added an additional parameter to heap_page_prune() and
heap_page_prune_opt() to identify if heap_page_prune() was called from
RelationGetBufferForTuple() and logged a message when this was true.
Running the test suite, I didn't see any UPDATEs executing
heap_page_prune() from RelationGetBufferForTuple(). I did, however, see
other statement types doing so (see RelationGetBufferForTuple()'s other
callers). Was that intended?

I started to work on benchmarking this, but haven't had time to devote
properly to that, so I'm wondering if there's anyone who might be
interested in collaborating on that part.

I'm interested in this feature and in helping with it/helping with
benchmarking it, but I don't yet understand the design in its current
form.

Hi Melanie,

Thanks for taking a look at this! Apologies for the long delay in
replying: I started to take a look at your questions earlier, and it
turned into more of a rabbit hole than I'd anticipated. I've since
been distracted by other things. So -- I don't have any conclusions
here yet, but I'm hoping at or after PGConf NYC that I'll be able to
dedicate the time this deserves.

Thanks,
James Coleman

#4James Coleman
jtc331@gmail.com
In reply to: James Coleman (#3)
4 attachment(s)
Re: Opportunistically pruning page before update

On Tue, Sep 26, 2023 at 8:30 AM James Coleman <jtc331@gmail.com> wrote:

On Tue, Sep 5, 2023 at 1:40 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Wed, Jun 21, 2023 at 8:51 AM James Coleman <jtc331@gmail.com> wrote:

While at PGCon I was chatting with Andres (and I think Peter G. and a
few others who I can't remember at the moment, apologies) and Andres
noted that while we opportunistically prune a page when inserting a
tuple (before deciding we need a new page) we don't do the same for
updates.

Attached is a patch series to do the following:

0001: Make it possible to call heap_page_prune_opt already holding an
exclusive lock on the buffer.
0002: Opportunistically prune pages on update when the current tuple's
page has no free space. If this frees up enough space, then we
continue to put the new tuple on that page; if not, then we take the
existing code path and get a new page.

I've reviewed these patches and have questions.

Under what conditions would this be exercised for UPDATE? Could you
provide an example?

With your patch applied, when I create a table, the first time I update
it heap_page_prune_opt() will return before actually doing any pruning
because the page prune_xid hadn't been set (it is set after pruning as
well as later in heap_update() after RelationGetBufferForTuple() is
called).

I actually added an additional parameter to heap_page_prune() and
heap_page_prune_opt() to identify if heap_page_prune() was called from
RelationGetBufferForTuple() and logged a message when this was true.
Running the test suite, I didn't see any UPDATEs executing
heap_page_prune() from RelationGetBufferForTuple(). I did, however, see
other statement types doing so (see RelationGetBufferForTuple()'s other
callers). Was that intended?

I started to work on benchmarking this, but haven't had time to devote
properly to that, so I'm wondering if there's anyone who might be
interested in collaborating on that part.

I'm interested in this feature and in helping with it/helping with
benchmarking it, but I don't yet understand the design in its current
form.

Hi Melanie,

Thanks for taking a look at this! Apologies for the long delay in
replying: I started to take a look at your questions earlier, and it
turned into more of a rabbit hole than I'd anticipated. I've since
been distracted by other things. So -- I don't have any conclusions
here yet, but I'm hoping at or after PGConf NYC that I'll be able to
dedicate the time this deserves.

Hi,

I poked at this a decent amount last night and uncovered a couple of
things (whether or not Andres and I had discussed these details at
PGCon...I don't remember):

1. We don't ever opportunistically prune on INSERT, but we do
(somewhat, see below) on UPDATE, since we call it the first time we
read the page with the to-be-updated tuple on it.
2. The reason the original testing on v1 didn't see any real changes
is that PageHasFreeLinePointers() wasn't the right fastpath gate for
this; I should have been using !PageIsFull().

With the change to use !PageIsFull() I can trivially show a functional
improvement. Consider the following commands:

drop table if exists foo;
create table foo(pk serial primary key, t text);
insert into foo(t) select repeat('a', 250) from generate_series(1, 27);
select pg_relation_size('foo');
delete from foo where pk <= 10;
insert into foo(t) select repeat('b', 250) from generate_series(1, 10);
select pg_relation_size('foo');

On master this will result in a final relation size of 16384 while
with the patch applied the final relation size is 8192.
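
If it helps while playing with this, you can also watch the line
pointer states on the first page change as pruning happens (this
assumes the pageinspect extension is installed; it isn't required by
the patch):

select lp_flags, count(*)
from heap_page_items(get_raw_page('foo', 0))
group by lp_flags;
-- lp_flags: 0 = unused, 1 = normal, 2 = redirect, 3 = dead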

I talked to Andres and Peter again today, and out of that conversation
I have some observations and ideas for future improvements.

1. The most trivial case where this is useful is INSERT: we have a
target page, and it may have dead tuples, so trying to prune may
result in us being able to use the target page rather than getting a
new page.
2. The next most trivial case is UPDATE (potentially after failing to
find space for a HOT tuple on the source tuple's page): much like the
INSERT case, our backend's target page may benefit from pruning.
3. A more complex UPDATE case occurs when we check the tuple's page
for space in order to insert a HOT tuple and fail to find enough
space. While we've already opportunistically pruned the page on
initial read of the tuple, in complex queries this might be some time
in the past, so it may be worth attempting again. Beyond that, context
is key: if we already know we could otherwise do a HOT update but for
the lack of free space on the page, then spending extra cycles
rescuing that failed attempt is easier to justify. In order to do that
we ought to invent an "aggressive" flag to heap_page_prune_opt telling
it that it doesn't need to be quite so careful about exiting fast.
Perhaps we can rescue the HOT update optimization by pruning
aggressively.
4. We can prune the target page when the current backend recently
aborted a transaction. Additionally we could prune the target page
immediately on rollback (potentially we could even get into the
complexity of doing retail index tuple deletion when a transaction
aborts).

It may or may not be the case that I end up pursuing all of these in
this particular patch series, but I wanted to at least get them written
down here for history's sake.

The attached v2 patch series handles case 1 and likely case 2 (though
I haven't tested case 2 yet). The "log when pruning" patch files may
or may not be useful to you: they add a bunch of logging to make it
easier to observe what's happening while playing around in psql.

Regards,
James

Attachments:

v2-0003-Opportunistically-prune-to-avoid-building-a-new-p.patch (application/octet-stream)
From bb408e16b029a4515d2669ea1d549ec23bf6392e Mon Sep 17 00:00:00 2001
From: jcoleman <jtc331@gmail.com>
Date: Wed, 4 Oct 2023 09:38:53 -0400
Subject: [PATCH v2 3/6] Opportunistically prune to avoid building a new page
 for inserts

---
 src/backend/access/heap/heapam.c | 34 ++++++++++++++++++++++++++++++++
 src/backend/access/heap/hio.c    | 23 +++++++++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a1d3593f21..06c3998339 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -3009,9 +3009,16 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 				infomask2_old_tuple,
 				infomask_new_tuple,
 				infomask2_new_tuple;
+#ifdef USE_ASSERT_CHECKING
+	ItemPointerData original_otid;
+#endif
 
 	Assert(ItemPointerIsValid(otid));
 
+#ifdef USE_ASSERT_CHECKING
+	ItemPointerCopy(otid, &original_otid);
+#endif
+
 	/* Cheap, simplistic check that the tuple matches the rel's rowtype. */
 	Assert(HeapTupleHeaderGetNatts(newtup->t_data) <=
 		   RelationGetNumberOfAttributes(relation));
@@ -3133,6 +3140,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 	}
 
 	/*
+	 * TODO: is this note the problem pointer?
 	 * Note: beyond this point, use oldtup not otid to refer to old tuple.
 	 * otid may very well point at newtup->t_self, which we will overwrite
 	 * with the new tuple's location, so there's great risk of confusion if we
@@ -3617,13 +3625,39 @@ l2:
 			if (newtupsize > pagefree)
 			{
 				/* It doesn't fit, must use RelationGetBufferForTuple. */
+				/* TODO: every time we call this we need to make sure we don't have pointers into the page */
 				newbuf = RelationGetBufferForTuple(relation, heaptup->t_len,
 												   buffer, 0, NULL,
 												   &vmbuffer_new, &vmbuffer,
 												   0);
+
+#ifdef USE_ASSERT_CHECKING
+				/*
+				 * About 500 lines ago in this function a comment claimed we
+				 * might not be able to rely on otid after that point, but it
+				 * appears we can.
+				 */
+				Assert(ItemPointerEquals(otid, &original_otid));
+#endif
+
+				/*
+				 * Getting a buffer may have pruned the page, so we can't rely
+				 * on our original pointer into the page.
+				 */
+				lp = PageGetItemId(page, ItemPointerGetOffsetNumber(otid));
+				Assert(ItemIdIsNormal(lp));
+
+				/*
+				 * Mirror what we filled into oldtup at the beginning
+				 * of this function.
+				 */
+				oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
+				oldtup.t_len = ItemIdGetLength(lp);
+
 				/* We're all done. */
 				break;
 			}
+
 			/* Acquire VM page pin if needed and we don't have it. */
 			if (vmbuffer == InvalidBuffer && PageIsAllVisible(page))
 				visibilitymap_pin(relation, block, &vmbuffer);
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index caa62708aa..501747063a 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -707,6 +707,29 @@ loop:
 			RelationSetTargetBlock(relation, targetBlock);
 			return buffer;
 		}
+		else
+		{
+			/*
+			 * Opportunistically prune and see if that frees up enough space to
+			 * avoid needing to build a new page.
+			 */
+			heap_page_prune_opt(relation, buffer, true);
+
+			/*
+			 * If pruning cleared the PG_PAGE_FULL hint bit, then it's worth
+			 * checking free space again.
+			 */
+			if (!PageIsFull(page))
+			{
+				pageFreeSpace = PageGetHeapFreeSpace(page);
+				if (targetFreeSpace <= pageFreeSpace)
+				{
+					/* use this page as future insert target, too */
+					RelationSetTargetBlock(relation, targetBlock);
+					return buffer;
+				}
+			}
+		}
 
 		/*
 		 * Not enough space, so we must give up our page locks and pin (if
-- 
2.39.3 (Apple Git-145)

v2-0004-log-when-pruning-2.patch (application/octet-stream)
From ba8ad971775d9f3836341f941026b3fb094b3f14 Mon Sep 17 00:00:00 2001
From: jcoleman <jtc331@gmail.com>
Date: Wed, 4 Oct 2023 09:43:32 -0400
Subject: [PATCH v2 4/6] log when pruning (2)

---
 src/backend/access/heap/hio.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 501747063a..e9441123eb 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -704,6 +704,8 @@ loop:
 		if (targetFreeSpace <= pageFreeSpace)
 		{
 			/* use this page as future insert target, too */
+			if (relation->rd_id > 16384)
+				elog(WARNING, "page has free space");
 			RelationSetTargetBlock(relation, targetBlock);
 			return buffer;
 		}
@@ -713,6 +715,8 @@ loop:
 			 * Opportunistically prune and see if that frees up enough space to
 			 * avoid needing to build a new page.
 			 */
+			if (relation->rd_id > 16384)
+				elog(WARNING, "calling heap_page_prune_opt");
 			heap_page_prune_opt(relation, buffer, true);
 
 			/*
@@ -726,8 +730,20 @@ loop:
 				{
 					/* use this page as future insert target, too */
 					RelationSetTargetBlock(relation, targetBlock);
+					if (relation->rd_id > 16384)
+						elog(WARNING, "heap_page_prune_opt found enough space");
 					return buffer;
 				}
+				else
+				{
+					if (relation->rd_id > 16384)
+						elog(WARNING, "heap_page_prune_opt did not find enough space");
+				}
+			}
+			else
+			{
+				if (relation->rd_id > 16384)
+					elog(WARNING, "heap_page_prune_opt did not unset PD_PAGE_FULL");
 			}
 		}
 
-- 
2.39.3 (Apple Git-145)

v2-0001-Allow-getting-lock-before-calling-heap_page_prune.patch (application/octet-stream)
From b9282079afe18270403b08220731faae74c9dfd6 Mon Sep 17 00:00:00 2001
From: jcoleman <jtc331@gmail.com>
Date: Fri, 2 Jun 2023 10:01:06 -0400
Subject: [PATCH v2 1/6] Allow getting lock before calling
 heap_page_prune_opt()

---
 src/backend/access/heap/heapam.c         |  2 +-
 src/backend/access/heap/heapam_handler.c |  4 ++--
 src/backend/access/heap/pruneheap.c      | 15 +++++++++------
 src/include/access/heapam.h              |  2 +-
 4 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 88a123d38a..a1d3593f21 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -415,7 +415,7 @@ heapgetpage(TableScanDesc sscan, BlockNumber block)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, false);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 7c28dafb72..ff578db37b 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -135,7 +135,7 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf, false);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2150,7 +2150,7 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, false);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c5f1abd95a..fa8f5c1dfb 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -81,10 +81,11 @@ static void page_verify_redirects(Page page);
  * Note: this is called quite often.  It's important that it fall out quickly
  * if there's not any use in pruning.
  *
- * Caller must have pin on the buffer, and must *not* have a lock on it.
+ * Caller must have pin on the buffer, and must either have an exclusive lock
+ * (and pass already_locked = true) or not have a lock on it.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, bool already_locked)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -135,8 +136,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 
 	if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
 	{
-		/* OK, try to get exclusive buffer lock */
-		if (!ConditionalLockBufferForCleanup(buffer))
+		/* OK, try to get exclusive buffer lock if necessary */
+		if ((!already_locked && !ConditionalLockBufferForCleanup(buffer)) ||
+				(already_locked && !IsBufferCleanupOK(buffer)))
 			return;
 
 		/*
@@ -169,8 +171,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 											   presult.ndeleted - presult.nnewlpdead);
 		}
 
-		/* And release buffer lock */
-		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+		/* And release buffer lock if we acquired it */
+		if (!already_locked)
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 
 		/*
 		 * We avoid reuse of any free space created on the page by unrelated
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 62fac1d5d2..d15fa04feb 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -317,7 +317,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void	heap_page_prune_opt(Relation relation, Buffer buffer, bool already_locked);
 extern void heap_page_prune(Relation relation, Buffer buffer,
 							struct GlobalVisState *vistest,
 							PruneResult *presult,
-- 
2.39.3 (Apple Git-145)

v2-0002-log-when-pruning.patch (application/octet-stream)
From 31e73f04ef9e81e551175f95db12bebde368fbd4 Mon Sep 17 00:00:00 2001
From: jcoleman <jtc331@gmail.com>
Date: Tue, 3 Oct 2023 19:53:41 -0400
Subject: [PATCH v2 2/6] log when pruning

---
 src/backend/access/heap/pruneheap.c | 8 ++++++++
 src/include/storage/bufpage.h       | 3 +++
 2 files changed, 11 insertions(+)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index fa8f5c1dfb..80eaacbc6c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -92,6 +92,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, bool already_locked)
 	GlobalVisState *vistest;
 	Size		minfree;
 
+	if (relation->rd_id > 16384)
+		elog(WARNING, "in heap_page_prune_opt");
 	/*
 	 * We can't write WAL in recovery mode, so there's no point trying to
 	 * clean the page. The primary will likely issue a cleaning WAL record
@@ -216,6 +218,8 @@ heap_page_prune(Relation relation, Buffer buffer,
 	PruneState	prstate;
 	HeapTupleData tup;
 
+	if (relation->rd_id > 16384)
+	  elog(WARNING, "in heap_page_prune");
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -342,6 +346,8 @@ heap_page_prune(Relation relation, Buffer buffer,
 		 * Update the page's pd_prune_xid field to either zero, or the lowest
 		 * XID of any soon-prunable tuple.
 		 */
+		if (relation->rd_id > 16384)
+			elog(WARNING, "setting pd_prune_xid=%u", prstate.new_prune_xid);
 		((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
 
 		/*
@@ -408,6 +414,8 @@ heap_page_prune(Relation relation, Buffer buffer,
 		if (((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
 			PageIsFull(page))
 		{
+			if (relation->rd_id > 16384)
+				elog(WARNING, "setting pd_prune_xid=%u", prstate.new_prune_xid);
 			((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
 			PageClearFull(page);
 			MarkBufferDirtyHint(buffer, true);
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 424ecba028..78b44cbb62 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -447,7 +447,10 @@ do { \
 	Assert(TransactionIdIsNormal(xid)); \
 	if (!TransactionIdIsValid(((PageHeader) (page))->pd_prune_xid) || \
 		TransactionIdPrecedes(xid, ((PageHeader) (page))->pd_prune_xid)) \
+	{ \
+		elog(WARNING, "PageSetPrunable setting pd_prune_xid=%u", xid); \
 		((PageHeader) (page))->pd_prune_xid = (xid); \
+	} \
 } while (0)
 #define PageClearPrunable(page) \
 	(((PageHeader) (page))->pd_prune_xid = InvalidTransactionId)
-- 
2.39.3 (Apple Git-145)

#5Dilip Kumar
dilipbalaut@gmail.com
In reply to: James Coleman (#4)
Re: Opportunistically pruning page before update

On Thu, Oct 5, 2023 at 2:35 AM James Coleman <jtc331@gmail.com> wrote:

I talked to Andres and Peter again today, and out of that conversation
I have some observations and ideas for future improvements.

1. The most trivial case where this is useful is INSERT: we have a
target page, and it may have dead tuples, so trying to prune may
result in us being able to use the target page rather than getting a
new page.
2. The next most trivial case is where UPDATE (potentially after
failing to find space for a HOT tuple on the source tuple's page);
much like the INSERT case our backend's target page may benefit from
pruning.

Looking at the patch, I believe v2-0003 implements these two ideas. So
my question is: are we planning to prune only the backend's current
target page, or, when we can't find space there, are we also aiming to
prune the other target pages we get from the FSM? In the patch you have
put the check inside a loop, so it will try to prune every page it gets
from the FSM, not just the backend's current target page. I just wanted
to understand whether this is intentional.

In general, all 4 ideas look promising.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#6James Coleman
jtc331@gmail.com
In reply to: Dilip Kumar (#5)
Re: Opportunistically pruning page before update

Hi,

Thanks for taking a look!

On Fri, Oct 6, 2023 at 1:18 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Oct 5, 2023 at 2:35 AM James Coleman <jtc331@gmail.com> wrote:

I talked to Andres and Peter again today, and out of that conversation
I have some observations and ideas for future improvements.

1. The most trivial case where this is useful is INSERT: we have a
target page, and it may have dead tuples, so trying to prune may
result in us being able to use the target page rather than getting a
new page.
2. The next most trivial case is where UPDATE (potentially after
failing to find space for a HOT tuple on the source tuple's page);
much like the INSERT case our backend's target page may benefit from
pruning.

Looking at the patch, I believe v2-0003 implements these two ideas. So
my question is: are we planning to prune only the backend's current
target page, or, when we can't find space there, are we also aiming to
prune the other target pages we get from the FSM? In the patch you have
put the check inside a loop, so it will try to prune every page it gets
from the FSM, not just the backend's current target page. I just wanted
to understand whether this is intentional.

Yes, just as we opportunistically prune on each read during a select,
I think we should at least check when we have a new target page. This
seems particularly true since we're hoping to write to the
page anyway, and the cost of additionally making pruning changes to
that page is low. I looked at freespace.c, and it doesn't appear that
getting the block from the FSM does this already, so we're not
duplicating any existing work.

Regards,
James

#7Peter Smith
smithpb2250@gmail.com
In reply to: James Coleman (#6)
Re: Opportunistically pruning page before update

2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
like there was some CFbot test failure last time it was run [2].
Please have a look and post an updated version if necessary.

======
[1]: https://commitfest.postgresql.org/46/4384//
[2]: https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4384

#8James Coleman
jtc331@gmail.com
In reply to: Peter Smith (#7)
2 attachment(s)
Re: Opportunistically pruning page before update

On Sun, Jan 21, 2024 at 9:58 PM Peter Smith <smithpb2250@gmail.com> wrote:

2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
like there was some CFbot test failure last time it was run [2].
Please have a look and post an updated version if necessary.

======
[1] https://commitfest.postgresql.org/46/4384//
[2] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4384

See rebased patch attached.

Thanks,
James Coleman

Attachments:

v3-0001-Allow-getting-lock-before-calling-heap_page_prune.patch (application/octet-stream)
From e644b2e38135be625d55f67597020ed7d66de29f Mon Sep 17 00:00:00 2001
From: jcoleman <jtc331@gmail.com>
Date: Fri, 2 Jun 2023 10:01:06 -0400
Subject: [PATCH v3 1/2] Allow getting lock before calling
 heap_page_prune_opt()

---
 src/backend/access/heap/heapam.c         |  2 +-
 src/backend/access/heap/heapam_handler.c |  4 ++--
 src/backend/access/heap/pruneheap.c      | 15 +++++++++------
 src/include/access/heapam.h              |  4 ++--
 4 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 707460a536..e4b659f944 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -415,7 +415,7 @@ heapgetpage(TableScanDesc sscan, BlockNumber block)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, false);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index d15a02b2be..bdc98ac55b 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -135,7 +135,7 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf, false);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2150,7 +2150,7 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, false);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5917633567..1334ffec01 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -84,10 +84,11 @@ static void page_verify_redirects(Page page);
  * Note: this is called quite often.  It's important that it fall out quickly
  * if there's not any use in pruning.
  *
- * Caller must have pin on the buffer, and must *not* have a lock on it.
+ * Caller must have pin on the buffer, and must either have an exclusive lock
+ * (and pass already_locked = true) or not have a lock on it.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, bool already_locked)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -138,8 +139,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 
 	if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
 	{
-		/* OK, try to get exclusive buffer lock */
-		if (!ConditionalLockBufferForCleanup(buffer))
+		/* OK, try to get exclusive buffer lock if necessary */
+		if ((!already_locked && !ConditionalLockBufferForCleanup(buffer)) ||
+				(already_locked && !IsBufferCleanupOK(buffer)))
 			return;
 
 		/*
@@ -178,8 +180,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 											   presult.ndeleted - presult.nnewlpdead);
 		}
 
-		/* And release buffer lock */
-		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+		/* And release buffer lock if we acquired it */
+		if (!already_locked)
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 
 		/*
 		 * We avoid reuse of any free space created on the page by unrelated
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4b133f6859..18ce464a2a 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -317,8 +317,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
-extern void heap_page_prune(Relation relation, Buffer buffer,
+extern void	heap_page_prune_opt(Relation relation, Buffer buffer, bool already_locked);
+extern void	heap_page_prune(Relation relation, Buffer buffer,
 							struct GlobalVisState *vistest,
 							bool mark_unused_now,
 							PruneResult *presult,
-- 
2.39.3 (Apple Git-145)

v3-0002-Opportunistically-prune-to-avoid-building-a-new-p.patch (application/octet-stream)
From 2b4c36f69548c1538463fee5e60612706ceaece8 Mon Sep 17 00:00:00 2001
From: jcoleman <jtc331@gmail.com>
Date: Fri, 2 Jun 2023 10:02:07 -0400
Subject: [PATCH v3 2/2] Opportunistically prune to avoid building a new page
 for an update

---
 src/backend/access/heap/heapam.c    | 34 +++++++++++++++++++++++++++++
 src/backend/access/heap/hio.c       | 20 +++++++++++++++++
 src/backend/access/heap/pruneheap.c |  2 +-
 src/include/access/heapam.h         |  2 +-
 4 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e4b659f944..bd70714b50 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -3025,9 +3025,16 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 				infomask2_old_tuple,
 				infomask_new_tuple,
 				infomask2_new_tuple;
+#ifdef USE_ASSERT_CHECKING
+	ItemPointerData original_otid;
+#endif
 
 	Assert(ItemPointerIsValid(otid));
 
+#ifdef USE_ASSERT_CHECKING
+	ItemPointerCopy(otid, &original_otid);
+#endif
+
 	/* Cheap, simplistic check that the tuple matches the rel's rowtype. */
 	Assert(HeapTupleHeaderGetNatts(newtup->t_data) <=
 		   RelationGetNumberOfAttributes(relation));
@@ -3149,6 +3156,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 	}
 
 	/*
+	 * TODO: is this note the problem pointer?
 	 * Note: beyond this point, use oldtup not otid to refer to old tuple.
 	 * otid may very well point at newtup->t_self, which we will overwrite
 	 * with the new tuple's location, so there's great risk of confusion if we
@@ -3635,13 +3643,39 @@ l2:
 			if (newtupsize > pagefree)
 			{
 				/* It doesn't fit, must use RelationGetBufferForTuple. */
+				/* TODO: every time we call this we need to make sure we don't have pointers into the page */
 				newbuf = RelationGetBufferForTuple(relation, heaptup->t_len,
 												   buffer, 0, NULL,
 												   &vmbuffer_new, &vmbuffer,
 												   0);
+
+#ifdef USE_ASSERT_CHECKING
+				/*
+				 * About 500 lines ago in this function a comment claimed we
+				 * might not be able to rely on otid after that point, but it
+				 * appears we can.
+				 */
+				Assert(ItemPointerEquals(otid, &original_otid));
+#endif
+
+				/*
+				 * Getting a buffer may have pruned the page, so we can't rely
+				 * on our original pointer into the page.
+				 */
+				lp = PageGetItemId(page, ItemPointerGetOffsetNumber(otid));
+				Assert(ItemIdIsNormal(lp));
+
+				/*
+				 * Mirror what we filled into oldtup at the beginning
+				 * of this function.
+				 */
+				oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
+				oldtup.t_len = ItemIdGetLength(lp);
+
 				/* We're all done. */
 				break;
 			}
+
 			/* Acquire VM page pin if needed and we don't have it. */
 			if (vmbuffer == InvalidBuffer && PageIsAllVisible(page))
 				visibilitymap_pin(relation, block, &vmbuffer);
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index c7248d7c68..bb377596ab 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -707,6 +707,26 @@ loop:
 			RelationSetTargetBlock(relation, targetBlock);
 			return buffer;
 		}
+		else
+		{
+			/*
+			 * Opportunistically prune and see if that frees up enough space to
+			 * avoid needing to build a new page.
+			 */
+			heap_page_prune_opt(relation, buffer, true);
+
+			/* If pruning freed up any slots, we can check free space again. */
+			if (PageHasFreeLinePointers(page))
+			{
+				pageFreeSpace = PageGetHeapFreeSpace(page);
+				if (targetFreeSpace <= pageFreeSpace)
+				{
+					/* use this page as future insert target, too */
+					RelationSetTargetBlock(relation, targetBlock);
+					return buffer;
+				}
+			}
+		}
 
 		/*
 		 * Not enough space, so we must give up our page locks and pin (if
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 1334ffec01..0c9c6f5d12 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -215,7 +215,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, bool already_locked)
  * tuples removed and the number of line pointers newly marked LP_DEAD.
  * heap_page_prune() is responsible for initializing it.
  */
-void
+int
 heap_page_prune(Relation relation, Buffer buffer,
 				GlobalVisState *vistest,
 				bool mark_unused_now,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 18ce464a2a..ebc6a9d3ae 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -318,7 +318,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 /* in heap/pruneheap.c */
 struct GlobalVisState;
 extern void	heap_page_prune_opt(Relation relation, Buffer buffer, bool already_locked);
-extern void	heap_page_prune(Relation relation, Buffer buffer,
+extern int	heap_page_prune(Relation relation, Buffer buffer,
 							struct GlobalVisState *vistest,
 							bool mark_unused_now,
 							PruneResult *presult,
-- 
2.39.3 (Apple Git-145)

#9James Coleman
jtc331@gmail.com
In reply to: James Coleman (#8)
2 attachment(s)
Re: Opportunistically pruning page before update

On Mon, Jan 22, 2024 at 8:21 PM James Coleman <jtc331@gmail.com> wrote:

See rebased patch attached.

I just realized I left a change in during the rebase that wasn't necessary.

v4 attached.

Regards,
James Coleman

Attachments:

v4-0002-Opportunistically-prune-to-avoid-building-a-new-p.patch (application/octet-stream)
From f2b62ab9f6b810ef0c8e34106aa28b621dd528f7 Mon Sep 17 00:00:00 2001
From: jcoleman <jtc331@gmail.com>
Date: Fri, 2 Jun 2023 10:02:07 -0400
Subject: [PATCH v4 2/2] Opportunistically prune to avoid building a new page
 for an update

---
 src/backend/access/heap/heapam.c | 34 ++++++++++++++++++++++++++++++++
 src/backend/access/heap/hio.c    | 20 +++++++++++++++++++
 2 files changed, 54 insertions(+)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e4b659f944..bd70714b50 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -3025,9 +3025,16 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 				infomask2_old_tuple,
 				infomask_new_tuple,
 				infomask2_new_tuple;
+#ifdef USE_ASSERT_CHECKING
+	ItemPointerData original_otid;
+#endif
 
 	Assert(ItemPointerIsValid(otid));
 
+#ifdef USE_ASSERT_CHECKING
+	ItemPointerCopy(otid, &original_otid);
+#endif
+
 	/* Cheap, simplistic check that the tuple matches the rel's rowtype. */
 	Assert(HeapTupleHeaderGetNatts(newtup->t_data) <=
 		   RelationGetNumberOfAttributes(relation));
@@ -3149,6 +3156,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 	}
 
 	/*
+	 * TODO: is this note the problem pointer?
 	 * Note: beyond this point, use oldtup not otid to refer to old tuple.
 	 * otid may very well point at newtup->t_self, which we will overwrite
 	 * with the new tuple's location, so there's great risk of confusion if we
@@ -3635,13 +3643,39 @@ l2:
 			if (newtupsize > pagefree)
 			{
 				/* It doesn't fit, must use RelationGetBufferForTuple. */
+				/* TODO: every time we call this we need to make sure we don't have pointers into the page */
 				newbuf = RelationGetBufferForTuple(relation, heaptup->t_len,
 												   buffer, 0, NULL,
 												   &vmbuffer_new, &vmbuffer,
 												   0);
+
+#ifdef USE_ASSERT_CHECKING
+				/*
+				 * About 500 lines ago in this function a comment claimed we
+				 * might not be able to rely on otid after that point, but it
+				 * appears we can.
+				 */
+				Assert(ItemPointerEquals(otid, &original_otid));
+#endif
+
+				/*
+				 * Getting a buffer may have pruned the page, so we can't rely
+				 * on our original pointer into the page.
+				 */
+				lp = PageGetItemId(page, ItemPointerGetOffsetNumber(otid));
+				Assert(ItemIdIsNormal(lp));
+
+				/*
+				 * Mirror what we filled into oldtup at the beginning
+				 * of this function.
+				 */
+				oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
+				oldtup.t_len = ItemIdGetLength(lp);
+
 				/* We're all done. */
 				break;
 			}
+
 			/* Acquire VM page pin if needed and we don't have it. */
 			if (vmbuffer == InvalidBuffer && PageIsAllVisible(page))
 				visibilitymap_pin(relation, block, &vmbuffer);
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index c7248d7c68..bb377596ab 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -707,6 +707,26 @@ loop:
 			RelationSetTargetBlock(relation, targetBlock);
 			return buffer;
 		}
+		else
+		{
+			/*
+			 * Opportunistically prune and see if that frees up enough space to
+			 * avoid needing to build a new page.
+			 */
+			heap_page_prune_opt(relation, buffer, true);
+
+			/* If pruning freed up any slots, we can check free space again. */
+			if (PageHasFreeLinePointers(page))
+			{
+				pageFreeSpace = PageGetHeapFreeSpace(page);
+				if (targetFreeSpace <= pageFreeSpace)
+				{
+					/* use this page as future insert target, too */
+					RelationSetTargetBlock(relation, targetBlock);
+					return buffer;
+				}
+			}
+		}
 
 		/*
 		 * Not enough space, so we must give up our page locks and pin (if
-- 
2.39.3 (Apple Git-145)

v4-0001-Allow-getting-lock-before-calling-heap_page_prune.patch (application/octet-stream)
From e644b2e38135be625d55f67597020ed7d66de29f Mon Sep 17 00:00:00 2001
From: jcoleman <jtc331@gmail.com>
Date: Fri, 2 Jun 2023 10:01:06 -0400
Subject: [PATCH v4 1/2] Allow getting lock before calling
 heap_page_prune_opt()

---
 src/backend/access/heap/heapam.c         |  2 +-
 src/backend/access/heap/heapam_handler.c |  4 ++--
 src/backend/access/heap/pruneheap.c      | 15 +++++++++------
 src/include/access/heapam.h              |  4 ++--
 4 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 707460a536..e4b659f944 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -415,7 +415,7 @@ heapgetpage(TableScanDesc sscan, BlockNumber block)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, false);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index d15a02b2be..bdc98ac55b 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -135,7 +135,7 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf, false);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2150,7 +2150,7 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, false);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5917633567..1334ffec01 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -84,10 +84,11 @@ static void page_verify_redirects(Page page);
  * Note: this is called quite often.  It's important that it fall out quickly
  * if there's not any use in pruning.
  *
- * Caller must have pin on the buffer, and must *not* have a lock on it.
+ * Caller must have pin on the buffer, and must either have an exclusive lock
+ * (and pass already_locked = true) or not have a lock on it.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, bool already_locked)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -138,8 +139,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 
 	if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
 	{
-		/* OK, try to get exclusive buffer lock */
-		if (!ConditionalLockBufferForCleanup(buffer))
+		/* OK, try to get exclusive buffer lock if necessary */
+		if ((!already_locked && !ConditionalLockBufferForCleanup(buffer)) ||
+				(already_locked && !IsBufferCleanupOK(buffer)))
 			return;
 
 		/*
@@ -178,8 +180,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 											   presult.ndeleted - presult.nnewlpdead);
 		}
 
-		/* And release buffer lock */
-		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+		/* And release buffer lock if we acquired it */
+		if (!already_locked)
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 
 		/*
 		 * We avoid reuse of any free space created on the page by unrelated
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4b133f6859..18ce464a2a 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -317,8 +317,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
-extern void heap_page_prune(Relation relation, Buffer buffer,
+extern void	heap_page_prune_opt(Relation relation, Buffer buffer, bool already_locked);
+extern void	heap_page_prune(Relation relation, Buffer buffer,
 							struct GlobalVisState *vistest,
 							bool mark_unused_now,
 							PruneResult *presult,
-- 
2.39.3 (Apple Git-145)

#10Dilip Kumar
dilipbalaut@gmail.com
In reply to: James Coleman (#9)
Re: Opportunistically pruning page before update

On Tue, Jan 23, 2024 at 7:18 AM James Coleman <jtc331@gmail.com> wrote:

On Mon, Jan 22, 2024 at 8:21 PM James Coleman <jtc331@gmail.com> wrote:

See rebased patch attached.

I just realized I left a change in during the rebase that wasn't necessary.

v4 attached.

I have noticed that you are performing the opportunistic pruning after
we have decided that the updated tuple cannot fit on the current page,
and then we prune the new target page. Why don't we first perform the
pruning on the tuple's existing page itself? Or is this already being
done before this patch? I could not find such existing pruning, so I
have this question because such pruning could convert many non-HOT
updates into HOT updates, right?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#11James Coleman
jtc331@gmail.com
In reply to: Dilip Kumar (#10)
4 attachment(s)
Re: Opportunistically pruning page before update

On Tue, Jan 23, 2024 at 2:46 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Jan 23, 2024 at 7:18 AM James Coleman <jtc331@gmail.com> wrote:
> >
> > On Mon, Jan 22, 2024 at 8:21 PM James Coleman <jtc331@gmail.com> wrote:
> > >
> > > See rebased patch attached.
> >
> > I just realized I left a change in during the rebase that wasn't necessary.
> >
> > v4 attached.
>
> I noticed that you perform the opportunistic pruning only after we've
> decided that the updated tuple cannot fit on the current page, and then
> the pruning is done on the new target page. Why don't we first prune
> the existing page the tuple lives on? Or is that already being done
> before this patch? I couldn't find any such existing pruning, which is
> why I ask: pruning the old page could convert many non-HOT updates into
> HOT updates, right?

First off I noticed that I accidentally sent a different version of
the patch I'd originally worked on. Here's the one from the proper
branch. It's still similar, but I want to make sure the right one is
being reviewed.

I'm working on a demo case for updates (to go along with the insert
case I sent earlier) to test out your question, and I'll reply when I
have that.

Regards,
James Coleman

Attachments:

v5-0004-log-when-pruning-2.patch (application/octet-stream)
From bb7691eee50d2b8205e66f32f83e5698cb80e067 Mon Sep 17 00:00:00 2001
From: jcoleman <jtc331@gmail.com>
Date: Wed, 4 Oct 2023 09:43:32 -0400
Subject: [PATCH v5 4/6] log when pruning (2)

---
 src/backend/access/heap/hio.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index e958311082..b61d41fcc7 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -704,6 +704,8 @@ loop:
 		if (targetFreeSpace <= pageFreeSpace)
 		{
 			/* use this page as future insert target, too */
+			if (relation->rd_id > 16384)
+				elog(WARNING, "page has free space");
 			RelationSetTargetBlock(relation, targetBlock);
 			return buffer;
 		}
@@ -713,6 +715,8 @@ loop:
 			 * Opportunistically prune and see if that frees up enough space to
 			 * avoid needing to build a new page.
 			 */
+			if (relation->rd_id > 16384)
+				elog(WARNING, "calling heap_page_prune_opt");
 			heap_page_prune_opt(relation, buffer, true);
 
 			/*
@@ -726,8 +730,20 @@ loop:
 				{
 					/* use this page as future insert target, too */
 					RelationSetTargetBlock(relation, targetBlock);
+					if (relation->rd_id > 16384)
+						elog(WARNING, "heap_page_prune_opt found enough space");
 					return buffer;
 				}
+				else
+				{
+					if (relation->rd_id > 16384)
+						elog(WARNING, "heap_page_prune_opt did not find enough space");
+				}
+			}
+			else
+			{
+				if (relation->rd_id > 16384)
+					elog(WARNING, "heap_page_prune_opt did not unset PD_PAGE_FULL");
 			}
 		}
 
-- 
2.39.3 (Apple Git-145)

v5-0002-log-when-pruning.patch (application/octet-stream)
From 78b114a691121b2c992c520beda72cdaf959f783 Mon Sep 17 00:00:00 2001
From: jcoleman <jtc331@gmail.com>
Date: Tue, 3 Oct 2023 19:53:41 -0400
Subject: [PATCH v5 2/6] log when pruning

---
 src/backend/access/heap/pruneheap.c | 8 ++++++++
 src/include/storage/bufpage.h       | 3 +++
 2 files changed, 11 insertions(+)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 1334ffec01..42a2a7fa7e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -95,6 +95,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, bool already_locked)
 	GlobalVisState *vistest;
 	Size		minfree;
 
+	if (relation->rd_id > 16384)
+		elog(WARNING, "in heap_page_prune_opt");
 	/*
 	 * We can't write WAL in recovery mode, so there's no point trying to
 	 * clean the page. The primary will likely issue a cleaning WAL record
@@ -229,6 +231,8 @@ heap_page_prune(Relation relation, Buffer buffer,
 	PruneState	prstate;
 	HeapTupleData tup;
 
+	if (relation->rd_id > 16384)
+	  elog(WARNING, "in heap_page_prune");
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -356,6 +360,8 @@ heap_page_prune(Relation relation, Buffer buffer,
 		 * Update the page's pd_prune_xid field to either zero, or the lowest
 		 * XID of any soon-prunable tuple.
 		 */
+		if (relation->rd_id > 16384)
+			elog(WARNING, "setting pd_prune_xid=%u", prstate.new_prune_xid);
 		((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
 
 		/*
@@ -422,6 +428,8 @@ heap_page_prune(Relation relation, Buffer buffer,
 		if (((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
 			PageIsFull(page))
 		{
+			if (relation->rd_id > 16384)
+				elog(WARNING, "setting pd_prune_xid=%u", prstate.new_prune_xid);
 			((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
 			PageClearFull(page);
 			MarkBufferDirtyHint(buffer, true);
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index d0df02d39c..6b93aa02d3 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -447,7 +447,10 @@ do { \
 	Assert(TransactionIdIsNormal(xid)); \
 	if (!TransactionIdIsValid(((PageHeader) (page))->pd_prune_xid) || \
 		TransactionIdPrecedes(xid, ((PageHeader) (page))->pd_prune_xid)) \
+	{ \
+		elog(WARNING, "PageSetPrunable setting pd_prune_xid=%u", xid); \
 		((PageHeader) (page))->pd_prune_xid = (xid); \
+	} \
 } while (0)
 #define PageClearPrunable(page) \
 	(((PageHeader) (page))->pd_prune_xid = InvalidTransactionId)
-- 
2.39.3 (Apple Git-145)

v5-0003-Opportunistically-prune-to-avoid-building-a-new-p.patch (application/octet-stream)
From 620298f0e78132fb36de5e95a8d290ab82b9acf0 Mon Sep 17 00:00:00 2001
From: jcoleman <jtc331@gmail.com>
Date: Wed, 4 Oct 2023 09:38:53 -0400
Subject: [PATCH v5 3/6] Opportunistically prune to avoid building a new page
 for inserts

---
 src/backend/access/heap/heapam.c | 34 ++++++++++++++++++++++++++++++++
 src/backend/access/heap/hio.c    | 23 +++++++++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e4b659f944..bd70714b50 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -3025,9 +3025,16 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 				infomask2_old_tuple,
 				infomask_new_tuple,
 				infomask2_new_tuple;
+#ifdef USE_ASSERT_CHECKING
+	ItemPointerData original_otid;
+#endif
 
 	Assert(ItemPointerIsValid(otid));
 
+#ifdef USE_ASSERT_CHECKING
+	ItemPointerCopy(otid, &original_otid);
+#endif
+
 	/* Cheap, simplistic check that the tuple matches the rel's rowtype. */
 	Assert(HeapTupleHeaderGetNatts(newtup->t_data) <=
 		   RelationGetNumberOfAttributes(relation));
@@ -3149,6 +3156,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 	}
 
 	/*
+	 * TODO: is this note the problem pointer?
 	 * Note: beyond this point, use oldtup not otid to refer to old tuple.
 	 * otid may very well point at newtup->t_self, which we will overwrite
 	 * with the new tuple's location, so there's great risk of confusion if we
@@ -3635,13 +3643,39 @@ l2:
 			if (newtupsize > pagefree)
 			{
 				/* It doesn't fit, must use RelationGetBufferForTuple. */
+				/* TODO: every time we call this we need to make sure we don't have pointers into the page */
 				newbuf = RelationGetBufferForTuple(relation, heaptup->t_len,
 												   buffer, 0, NULL,
 												   &vmbuffer_new, &vmbuffer,
 												   0);
+
+#ifdef USE_ASSERT_CHECKING
+				/*
+				 * About 500 lines ago in this function a comment claimed we
+				 * might not be able to rely on otid after that point, but it
+				 * appears we can.
+				 */
+				Assert(ItemPointerEquals(otid, &original_otid));
+#endif
+
+				/*
+				 * Getting a buffer may have pruned the page, so we can't rely
+				 * on our original pointer into the page.
+				 */
+				lp = PageGetItemId(page, ItemPointerGetOffsetNumber(otid));
+				Assert(ItemIdIsNormal(lp));
+
+				/*
+				 * Mirror what we filled into oldtup at the beginning
+				 * of this function.
+				 */
+				oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
+				oldtup.t_len = ItemIdGetLength(lp);
+
 				/* We're all done. */
 				break;
 			}
+
 			/* Acquire VM page pin if needed and we don't have it. */
 			if (vmbuffer == InvalidBuffer && PageIsAllVisible(page))
 				visibilitymap_pin(relation, block, &vmbuffer);
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index c7248d7c68..e958311082 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -707,6 +707,29 @@ loop:
 			RelationSetTargetBlock(relation, targetBlock);
 			return buffer;
 		}
+		else
+		{
+			/*
+			 * Opportunistically prune and see if that frees up enough space to
+			 * avoid needing to build a new page.
+			 */
+			heap_page_prune_opt(relation, buffer, true);
+
+			/*
+			 * If pruning cleared the PD_PAGE_FULL hint bit, then it's worth
+			 * checking free space again.
+			 */
+			if (!PageIsFull(page))
+			{
+				pageFreeSpace = PageGetHeapFreeSpace(page);
+				if (targetFreeSpace <= pageFreeSpace)
+				{
+					/* use this page as future insert target, too */
+					RelationSetTargetBlock(relation, targetBlock);
+					return buffer;
+				}
+			}
+		}
 
 		/*
 		 * Not enough space, so we must give up our page locks and pin (if
-- 
2.39.3 (Apple Git-145)

v5-0001-Allow-getting-lock-before-calling-heap_page_prune.patch (application/octet-stream)
From 8ecb574a778d16f79c99309f4fc2baa98c9648ed Mon Sep 17 00:00:00 2001
From: jcoleman <jtc331@gmail.com>
Date: Fri, 2 Jun 2023 10:01:06 -0400
Subject: [PATCH v5 1/6] Allow getting lock before calling
 heap_page_prune_opt()

---
 src/backend/access/heap/heapam.c         |  2 +-
 src/backend/access/heap/heapam_handler.c |  4 ++--
 src/backend/access/heap/pruneheap.c      | 15 +++++++++------
 src/include/access/heapam.h              |  2 +-
 4 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 707460a536..e4b659f944 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -415,7 +415,7 @@ heapgetpage(TableScanDesc sscan, BlockNumber block)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, false);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index d15a02b2be..bdc98ac55b 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -135,7 +135,7 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf, false);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2150,7 +2150,7 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, false);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5917633567..1334ffec01 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -84,10 +84,11 @@ static void page_verify_redirects(Page page);
  * Note: this is called quite often.  It's important that it fall out quickly
  * if there's not any use in pruning.
  *
- * Caller must have pin on the buffer, and must *not* have a lock on it.
+ * Caller must have pin on the buffer, and must either have an exclusive lock
+ * (and pass already_locked = true) or not have a lock on it.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, bool already_locked)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -138,8 +139,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 
 	if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
 	{
-		/* OK, try to get exclusive buffer lock */
-		if (!ConditionalLockBufferForCleanup(buffer))
+		/* OK, try to get exclusive buffer lock if necessary */
+		if ((!already_locked && !ConditionalLockBufferForCleanup(buffer)) ||
+				(already_locked && !IsBufferCleanupOK(buffer)))
 			return;
 
 		/*
@@ -178,8 +180,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 											   presult.ndeleted - presult.nnewlpdead);
 		}
 
-		/* And release buffer lock */
-		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+		/* And release buffer lock if we acquired it */
+		if (!already_locked)
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 
 		/*
 		 * We avoid reuse of any free space created on the page by unrelated
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4b133f6859..fccad9e21d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -317,7 +317,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void	heap_page_prune_opt(Relation relation, Buffer buffer, bool already_locked);
 extern void heap_page_prune(Relation relation, Buffer buffer,
 							struct GlobalVisState *vistest,
 							bool mark_unused_now,
-- 
2.39.3 (Apple Git-145)

#12James Coleman
jtc331@gmail.com
In reply to: James Coleman (#11)
Re: Opportunistically pruning page before update

On Fri, Jan 26, 2024 at 8:33 PM James Coleman <jtc331@gmail.com> wrote:
>
> On Tue, Jan 23, 2024 at 2:46 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Tue, Jan 23, 2024 at 7:18 AM James Coleman <jtc331@gmail.com> wrote:
> > >
> > > On Mon, Jan 22, 2024 at 8:21 PM James Coleman <jtc331@gmail.com> wrote:
> > > >
> > > > See rebased patch attached.
> > >
> > > I just realized I left a change in during the rebase that wasn't necessary.
> > >
> > > v4 attached.
> >
> > I noticed that you perform the opportunistic pruning only after we've
> > decided that the updated tuple cannot fit on the current page, and then
> > the pruning is done on the new target page. Why don't we first prune
> > the existing page the tuple lives on? Or is that already being done
> > before this patch? I couldn't find any such existing pruning, which is
> > why I ask: pruning the old page could convert many non-HOT updates into
> > HOT updates, right?
>
> First off I noticed that I accidentally sent a different version of
> the patch I'd originally worked on. Here's the one from the proper
> branch. It's still similar, but I want to make sure the right one is
> being reviewed.
>
> I'm working on a demo case for updates (to go along with the insert
> case I sent earlier) to test out your question, and I'll reply when I
> have that.

All right, I've got all of this loaded back into my head. As you noted
earlier, the patch currently implements points 1 and 2 of my list of
possible improvements:

1. The most trivial case where this is useful is INSERT: we have a
target page, and it may have dead tuples, so trying to prune may
result in us being able to use the target page rather than getting a
new page.
2. The next most trivial case is an UPDATE that needs a new target page
(potentially after failing to find space for a HOT tuple on the source
tuple's page); much like the INSERT case, our backend's target page may
benefit from pruning.

What you're describing above would be implementing (at least part of) point 3:

3. A more complex UPDATE case occurs when we check the tuple's page
for space in order to insert a HOT tuple and fail to find enough
space. While we've already opportunistically pruned the page on
initial read of the tuple, in complex queries this might be some time
in the past, so it may be worth attempting again.
...

If we try to design a simple test case for updates (like my insert
test case above) we might end up with something like:

drop table if exists foo;
create table foo(pk serial primary key, t text);
insert into foo(t) select repeat('a', 250) from generate_series(1, 27);
select pg_relation_size('foo');
delete from foo where pk = 1;
update foo set t = repeat('b', 250) where pk = 2;
select pg_relation_size('foo');

But that actually works as expected on master, because we call
heap_page_prune_opt from heapam_index_fetch_tuple as part of the index
scan that drives the update query.
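
Concretely, that's this existing call in heapam_index_fetch_tuple(),
quoted from the 0001 hunk above (so the third argument reflects the
patched signature):

    /*
     * Prune page, but only if we weren't already on this page
     */
    if (prev_buf != hscan->xs_cbuf)
        heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf, false);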

I was theorizing that if there are concurrent writes to the page we
might be able to trigger the need to re-prune the page in the for
loop in heap_update(), and I tried both regular pgbench and a
custom pgbench script with inserts/deletes/updates (including some
artificial delays).

What I concluded is that this isn't likely to be fruitful: we need
the buffer to be local to our backend (no other pins) to be able to
clean it, but since we've already pruned it on read, another backend
would have had to modify the page (and drop its pin!) between our
read and our write.

If someone believes there's a scenario that would demonstrate
otherwise, I would of course be interested to hear any ideas, but at
this point I think it's probably worth focusing on the first two cases
this patch already addresses.

Regards,
James Coleman

#13Melanie Plageman
melanieplageman@gmail.com
In reply to: James Coleman (#11)
Re: Opportunistically pruning page before update

On Fri, Jan 26, 2024 at 8:33 PM James Coleman <jtc331@gmail.com> wrote:
>
> On Tue, Jan 23, 2024 at 2:46 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Tue, Jan 23, 2024 at 7:18 AM James Coleman <jtc331@gmail.com> wrote:
> > >
> > > On Mon, Jan 22, 2024 at 8:21 PM James Coleman <jtc331@gmail.com> wrote:
> > > >
> > > > See rebased patch attached.
> > >
> > > I just realized I left a change in during the rebase that wasn't necessary.
> > >
> > > v4 attached.
> >
> > I noticed that you perform the opportunistic pruning only after we've
> > decided that the updated tuple cannot fit on the current page, and then
> > the pruning is done on the new target page. Why don't we first prune
> > the existing page the tuple lives on? Or is that already being done
> > before this patch? I couldn't find any such existing pruning, which is
> > why I ask: pruning the old page could convert many non-HOT updates into
> > HOT updates, right?
>
> First off I noticed that I accidentally sent a different version of
> the patch I'd originally worked on. Here's the one from the proper
> branch. It's still similar, but I want to make sure the right one is
> being reviewed.

I finally got back around to looking at this. Sorry for the delay.

I don't feel confident enough to say at a high level whether or not it
is a good idea in the general case to try pruning every block
RelationGetBufferForTuple() considers as a target block.

But, I did have a few thoughts on the implementation:

heap_page_prune_opt() checks PageGetHeapFreeSpace() twice. You will
have already done that in RelationGetBufferForTuple(). And you don't
even need to do it both of those times because you have a lock
(which is why heap_page_prune_opt() does it twice). This all seems a
bit wasteful. And then, you do it again after pruning.

This made me think: vacuum cares how much space heap_page_prune() (now
heap_page_prune_and_freeze()) freed up. Now if we add another caller
who cares how much space pruning freed up, perhaps it is worth
calculating this while pruning and returning it. I know
PageGetHeapFreeSpace() isn't the most expensive function in the world,
but it also seems like a useful, relevant piece of information to
inform the caller of.

You don't have to implement the above, it was just something I was
thinking about.
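
To illustrate the shape I have in mind, a rough sketch (the space_freed
field, where it gets set, and plumbing it back out to the caller are all
hypothetical, not something in the tree today):

typedef struct PruneResult
{
	int			ndeleted;		/* existing: number of tuples deleted */
	int			nnewlpdead;		/* existing: number of newly LP_DEAD items */
	/* ... other existing fields ... */
	Size		space_freed;	/* hypothetical: heap free space after pruning */
} PruneResult;

/* at the end of heap_page_prune(), once the changes have been applied: */
presult->space_freed = PageGetHeapFreeSpace(page);

/*
 * A caller like RelationGetBufferForTuple() could then compare
 * space_freed against targetFreeSpace directly instead of calling
 * PageGetHeapFreeSpace() yet again.
 */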

Looking at the code also made me wonder if it is worth changing
RelationGetBufferForTuple() to call PageGetHeapFreeSpace() before
taking a lock (which won't give a totally accurate result, but that's
probably okay) and then call heap_page_prune_opt() without a lock when
PageGetHeapFreeSpace() says there isn't enough space.
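
Roughly something like this, as a sketch only (the names are the ones
from hio.c, and I'm ignoring the otherBuffer/visibility-map locking
dance for clarity):

buffer = ReadBufferBI(relation, targetBlock, RBM_NORMAL, bistate);

/*
 * Unlocked peek at free space; it can be stale, but if the page already
 * looks too full this is where pruning could help, and
 * heap_page_prune_opt() can take (and release) its own lock here.
 */
if (PageGetHeapFreeSpace(BufferGetPage(buffer)) < targetFreeSpace)
	heap_page_prune_opt(relation, buffer, false);

LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
page = BufferGetPage(buffer);
pageFreeSpace = PageGetHeapFreeSpace(page);

if (targetFreeSpace <= pageFreeSpace)
{
	/* use this page as future insert target, too */
	RelationSetTargetBlock(relation, targetBlock);
	return buffer;
}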

Also, do we want to do GetVisibilityMapPins() before calling
heap_page_prune_opt()? In the current code I don't quite get why we do
that before knowing whether we are actually going to use the target block.

Anyway, I'm not sure I like just adding a parameter to
heap_page_prune_opt() to indicate that the caller already holds an
exclusive lock. The function does a bunch of work that was always done
without a lock, and now you are doing it while holding an exclusive lock.

- Melanie