Emit fewer vacuum records by reaping removable tuples during pruning

Started by Melanie Plageman over 2 years ago · 103 messages
#1 Melanie Plageman
melanieplageman@gmail.com

Hi,

When there are no indexes on the relation, we can set would-be dead
items LP_UNUSED and remove them during pruning. This saves us a vacuum
WAL record, reducing WAL volume (and time spent writing and syncing
WAL).

See this example:

drop table if exists foo;
create table foo(a int) with (autovacuum_enabled=false);
insert into foo select i from generate_series(1,10000000)i;
update foo set a = 10;
\timing on
vacuum foo;

On my machine, the attached patch set provides a 10% speedup for vacuum
for this example -- and a 40% decrease in WAL bytes emitted.

Admittedly, this case is probably unusual in the real world. On-access
pruning would preclude it. Throw a SELECT * FROM foo before the vacuum
and the patch has no performance benefit.

However, it has no downside as far as I can tell. And, IMHO, it is a
code clarity improvement. This change means that lazy_vacuum_heap_page()
is only called when we are actually doing a second pass and reaping dead
items. I found it quite confusing that lazy_vacuum_heap_page() was
called by lazy_scan_heap() to set dead items unused in a block that we
just pruned.
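To make the control-flow change concrete, here is a simplified, standalone model of the idea (this is not PostgreSQL code; the types and function names are invented for illustration): when the table has no indexes, a removable item can go straight to "unused" during pruning, leaving nothing for a second heap pass to reap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified line-pointer states, loosely mirroring the LP_* flags. */
typedef enum { ITEM_NORMAL, ITEM_DEAD, ITEM_UNUSED } ItemState;

/*
 * Prune a "page": each removable item becomes either DEAD (indexes
 * exist, so index entries must be deleted before the item can be
 * reaped in a second pass) or UNUSED right away (no indexes).
 * Returns the number of items left for a second pass.
 */
static size_t
prune_page(ItemState *items, const bool *removable, size_t nitems,
           bool no_indexes)
{
    size_t npending = 0;

    for (size_t i = 0; i < nitems; i++)
    {
        if (!removable[i])
            continue;
        items[i] = no_indexes ? ITEM_UNUSED : ITEM_DEAD;
        if (!no_indexes)
            npending++;     /* would need a vacuum WAL record later */
    }
    return npending;
}
```

With no_indexes set, the pass leaves zero pending items, which is what lets VACUUM skip the separate vacuum WAL record in this scenario.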

I think it also makes it clear that we should update the VM in
lazy_scan_prune(). All callers of lazy_scan_prune() will now consider
updating the VM after returning. And most of the state communicated back
to lazy_scan_heap() from lazy_scan_prune() is to inform it whether or
not to update the VM. I didn't do that in this patch set because I would
need to pass all_visible_according_to_vm to lazy_scan_prune() and that
change didn't seem worth the improvement in code clarity in
lazy_scan_heap().

I am planning to add a VM update into the freeze record, at which point
I will move the VM update code into lazy_scan_prune(). This will then
allow us to consolidate the freespace map update code for the prune and
noprune cases and make lazy_scan_heap() short and sweet.

Note that (on principle) this patch set is on top of the bug fix I
proposed in [1].

- Melanie

[1]: /messages/by-id/CAAKRu_YiL=44GvGnt1dpYouDSSoV7wzxVoXs8m3p311rp-TVQQ@mail.gmail.com

Attachments:

v1-0003-Set-would-be-dead-items-LP_UNUSED-while-pruning.patch (+125 -105)
v1-0001-Release-lock-on-heap-buffer-before-vacuuming-FSM.patch (+12 -13)
v1-0002-Indicate-rel-truncation-unsafe-in-lazy_scan-no-pr.patch (+26 -25)
#2 Peter Geoghegan
pg@bowt.ie
In reply to: Melanie Plageman (#1)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Mon, Nov 13, 2023 at 2:29 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I think it also makes it clear that we should update the VM in
lazy_scan_prune(). All callers of lazy_scan_prune() will now consider
updating the VM after returning. And most of the state communicated back
to lazy_scan_heap() from lazy_scan_prune() is to inform it whether or
not to update the VM.

That makes sense.

I didn't do that in this patch set because I would
need to pass all_visible_according_to_vm to lazy_scan_prune() and that
change didn't seem worth the improvement in code clarity in
lazy_scan_heap().

Have you thought about finding a way to get rid of
all_visible_according_to_vm? (Not necessarily in the scope of the
ongoing work, just in general.)

all_visible_according_to_vm is inherently prone to races -- it tells
us what the VM used to say about the page, back when we looked. It is
very likely that the page isn't visible in this sense, anyway, because
VACUUM is after all choosing to scan the page in the first place when
we end up in lazy_scan_prune. (Granted, this is much less true than it
should be due to the influence of SKIP_PAGES_THRESHOLD, which
implements a weird and inefficient form of prefetching/readahead.)

Why should we necessarily care what all_visible_according_to_vm says
or would say at this point? We're looking at the heap page itself,
which is more authoritative than the VM (theoretically they're equally
authoritative, but not really, not when push comes to shove). The best
answer that I can think of is that all_visible_according_to_vm gives
us a way of noticing and then logging any inconsistencies between VM
and heap that we might end up "repairing" in passing (once we've
rechecked the VM). But maybe that could happen elsewhere.

Perhaps that cross-check could be pushed into visibilitymap.c, when we
go to set a !PageIsAllVisible(page) page all-visible in the VM. If
we're setting it and find that it's already set from earlier on, then
remember and complain/LOG it. No all_visible_according_to_vm
required, plus this seems like it might be more thorough.

Just a thought.

--
Peter Geoghegan

#3 Melanie Plageman
melanieplageman@gmail.com
In reply to: Peter Geoghegan (#2)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Mon, Nov 13, 2023 at 5:59 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Mon, Nov 13, 2023 at 2:29 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I think it also makes it clear that we should update the VM in
lazy_scan_prune(). All callers of lazy_scan_prune() will now consider
updating the VM after returning. And most of the state communicated back
to lazy_scan_heap() from lazy_scan_prune() is to inform it whether or
not to update the VM.

That makes sense.

I didn't do that in this patch set because I would
need to pass all_visible_according_to_vm to lazy_scan_prune() and that
change didn't seem worth the improvement in code clarity in
lazy_scan_heap().

Have you thought about finding a way to get rid of
all_visible_according_to_vm? (Not necessarily in the scope of the
ongoing work, just in general.)

all_visible_according_to_vm is inherently prone to races -- it tells
us what the VM used to say about the page, back when we looked. It is
very likely that the page isn't visible in this sense, anyway, because
VACUUM is after all choosing to scan the page in the first place when
we end up in lazy_scan_prune. (Granted, this is much less true than it
should be due to the influence of SKIP_PAGES_THRESHOLD, which
implements a weird and inefficient form of prefetching/readahead.)

Why should we necessarily care what all_visible_according_to_vm says
or would say at this point? We're looking at the heap page itself,
which is more authoritative than the VM (theoretically they're equally
authoritative, but not really, not when push comes to shove). The best
answer that I can think of is that all_visible_according_to_vm gives
us a way of noticing and then logging any inconsistencies between VM
and heap that we might end up "repairing" in passing (once we've
rechecked the VM). But maybe that could happen elsewhere.

Perhaps that cross-check could be pushed into visibilitymap.c, when we
go to set a !PageIsAllVisible(page) page all-visible in the VM. If
we're setting it and find that it's already set from earlier on, then
remember and complain/LOG it. No all_visible_according_to_vm
required, plus this seems like it might be more thorough.

Setting aside the data corruption cases (the repairing you mentioned), I
think the primary case that all_visible_according_to_vm seeks to protect
us from is if the page was already marked all visible in the VM and
pruning did not change that. So, it avoids a call to visibilitymap_set()
when the page was known to already be set all visible (as long as we
didn't newly set all frozen).

As you say, this does not seem like the common case. Pages we vacuum
will most often not be all visible.

I actually wonder if it is worse to rely on that old value of all
visible and then call visibilitymap_set(), which takes an exclusive lock
on the VM page, than it would be to just call visibilitymap_get_status()
anew. This is what we do with all frozen.

The problem is that it is kind of hard to tell because the whole thing
is a bit of a tangled knot.

I've ripped this code apart and put it back together again six ways from
Sunday trying to get rid of all_visible_according_to_vm and
next_unskippable_allvis/skipping_current_range's bootleg prefetching
without incurring any additional calls to visibilitymap_get_status().

As best I can tell, our best case scenario is Thomas' streaming read API [1]
goes in, we add vacuum as a user, and we can likely remove the skip
range logic.

Then, the question remains about how useful it is to save the visibility
map statuses we got when deciding whether or not to skip a block.

- Melanie

[1]: /messages/by-id/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com

#4 Melanie Plageman
melanieplageman@gmail.com
In reply to: Melanie Plageman (#1)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Mon, Nov 13, 2023 at 5:28 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

When there are no indexes on the relation, we can set would-be dead
items LP_UNUSED and remove them during pruning. This saves us a vacuum
WAL record, reducing WAL volume (and time spent writing and syncing
WAL).

...

Note that (on principle) this patch set is on top of the bug fix I
proposed in [1].

[1] /messages/by-id/CAAKRu_YiL=44GvGnt1dpYouDSSoV7wzxVoXs8m3p311rp-TVQQ@mail.gmail.com

Rebased on top of fix in b2e237afddc56a and registered for the January commitfest
https://commitfest.postgresql.org/46/4665/

- Melanie

Attachments:

v2-0002-Set-would-be-dead-items-LP_UNUSED-while-pruning.patch (+125 -111)
v2-0001-Indicate-rel-truncation-unsafe-in-lazy_scan-no-pr.patch (+26 -25)
#5 Melanie Plageman
melanieplageman@gmail.com
In reply to: Melanie Plageman (#4)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Nov 17, 2023 at 6:12 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Mon, Nov 13, 2023 at 5:28 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

When there are no indexes on the relation, we can set would-be dead
items LP_UNUSED and remove them during pruning. This saves us a vacuum
WAL record, reducing WAL volume (and time spent writing and syncing
WAL).

...

Note that (on principle) this patch set is on top of the bug fix I
proposed in [1].

[1] /messages/by-id/CAAKRu_YiL=44GvGnt1dpYouDSSoV7wzxVoXs8m3p311rp-TVQQ@mail.gmail.com

Rebased on top of fix in b2e237afddc56a and registered for the January commitfest
https://commitfest.postgresql.org/46/4665/

I got an off-list question about whether or not this codepath is
exercised in existing regression tests. It is -- vacuum.sql tests
include those which vacuum a table with no indexes and tuples that can
be deleted.

I also looked through [1] to see if there were any user-facing docs
which happened to mention the exact implementation details of how and
when tuples are deleted by vacuum. I didn't see anything like that, so
I don't think there are user-facing docs which need updating.

- Melanie

[1]: https://www.postgresql.org/docs/devel/routine-vacuuming.html

#6 Michael Paquier
michael@paquier.xyz
In reply to: Melanie Plageman (#3)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Mon, Nov 13, 2023 at 07:06:15PM -0500, Melanie Plageman wrote:

As best I can tell, our best case scenario is Thomas' streaming read API
goes in, we add vacuum as a user, and we can likely remove the skip
range logic.

This does not prevent the work you've been doing in 0001 and 0002
posted upthread, right? Some progress is always better than no
progress, and I can see the appeal behind doing 0001 actually to keep
the updates of the block numbers closer to where we determine if
relation truncation is safe or not rather than use an intermediate
state in LVPagePruneState.

0002 is much, much, much trickier..
--
Michael

#7 Melanie Plageman
melanieplageman@gmail.com
In reply to: Michael Paquier (#6)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Sat, Dec 23, 2023 at 9:14 PM Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Nov 13, 2023 at 07:06:15PM -0500, Melanie Plageman wrote:

As best I can tell, our best case scenario is Thomas' streaming read API
goes in, we add vacuum as a user, and we can likely remove the skip
range logic.

This does not prevent the work you've been doing in 0001 and 0002
posted upthread, right? Some progress is always better than no
progress

Correct. Peter and I were mainly discussing next refactoring steps as
we move toward combining the prune, freeze, and VM records. This
thread's patches stand alone.

I can see the appeal behind doing 0001 actually to keep
the updates of the block numbers closer to where we determine if
relation truncation is safe or not rather than use an intermediate
state in LVPagePruneState.

Exactly.

0002 is much, much, much trickier..

Do you have specific concerns about its correctness? I understand it
is an area where we have to be sure we are correct. But, to be fair,
that is true of all the pruning and vacuuming code.

- Melanie

#8 Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#7)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Wed, Dec 27, 2023 at 11:27 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Do you have specific concerns about its correctness? I understand it
is an area where we have to be sure we are correct. But, to be fair,
that is true of all the pruning and vacuuming code.

I'm kind of concerned that 0002 might be a performance regression. It
pushes more branches down into the heap-pruning code, which I think
can sometimes be quite hot, for the sake of a case that rarely occurs
in practice. I take your point about it improving things when there
are no indexes, but what about when there are? And even if there are
no adverse performance consequences, is it really worth complicating
the logic at such a low level?

Also, I find "pronto_reap" to be a poor choice of name. "pronto" is an
informal word that seems to have no advantage over something like
"immediate" or "now," and I don't think "reap" has a precise,
universally-understood meaning. You could call this "mark_unused_now"
or "immediately_mark_unused" or something and it would be far more
self-documenting, IMHO.

--
Robert Haas
EDB: http://www.enterprisedb.com

#9 Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#8)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

Hi,

On 2024-01-04 12:31:36 -0500, Robert Haas wrote:

On Wed, Dec 27, 2023 at 11:27 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Do you have specific concerns about its correctness? I understand it
is an area where we have to be sure we are correct. But, to be fair,
that is true of all the pruning and vacuuming code.

I'm kind of concerned that 0002 might be a performance regression. It
pushes more branches down into the heap-pruning code, which I think
can sometimes be quite hot, for the sake of a case that rarely occurs
in practice.

I was wondering the same when talking about this with Melanie. But I guess
there are some scenarios that aren't unrealistic, consider e.g. bulk data
loading with COPY with an UPDATE to massage the data afterwards, before
creating the indexes.

Where I could see this becoming more interesting / powerful is if we were able
to do 'pronto reaping' not just from vacuum but also during on-access
pruning. For ETL workloads VACUUM might often not run between modifications of
the same page, but on-access pruning will. Being able to reclaim dead items at
that point seems like it could be pretty sizable improvement?

I take your point about it improving things when there are no indexes, but
what about when there are?

I suspect that branch prediction should take care of the additional
branches. heap_prune_chain() indeed can be very hot, but I think we're
primarily bottlenecked on data cache misses.

For a single VACUUM, whether we'd do pronto reaping would be constant -> very
well predictable. We could add an unlikely() to make sure the
branch-predictor-is-empty case optimizes for the much more common case of
having indexes. Falsely assuming we'd not pronto reap wouldn't be
particularly bad, as the wins for the case are so much bigger.

If we were to use pronto reaping for on-access pruning, it's perhaps a bit
less predictable, as pruning for pages of a relation with indexes could be
interleaved with pruning for relations without. But even there I suspect it'd
not be the primary bottleneck: We call heap_prune_chain() in a loop for
every tuple on a page, the branch predictor should quickly learn whether we're
using pronto reaping. Whereas we're not becoming less cache-miss heavy when
looking at subsequent tuples.
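PostgreSQL's unlikely() macro (in c.h) is a thin wrapper around GCC's __builtin_expect; a minimal standalone illustration of the kind of hint being discussed here (the function name and return codes are invented for the sketch):

```c
#include <assert.h>
#include <stdbool.h>

/* Same shape as PostgreSQL's unlikely() macro. */
#define unlikely(x) __builtin_expect((x) != 0, 0)

/*
 * Toy stand-in for the per-item decision in heap_prune_chain(): the
 * no-index path is hinted as unlikely, so the common indexed case
 * pays as little as possible when the branch predictor is cold.
 */
static int
classify_item(bool no_indexes)
{
    if (unlikely(no_indexes))
        return 2;   /* mark LP_UNUSED immediately */
    return 1;       /* mark LP_DEAD, reap in a second pass */
}
```

Within one VACUUM the flag is constant, so after the first few calls the predictor should resolve this branch essentially for free either way; the hint only shapes the cold-start case.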

And even if there are no adverse performance consequences, is it really
worth complicating the logic at such a low level?

Yes, I think this is the main question here. It's not clear though that the
state after the patch is meaningfully more complicated? It removes nontrivial
code from lazy_scan_heap() and pays for that with a bit more complexity in
heap_prune_chain().

Also, I find "pronto_reap" to be a poor choice of name. "pronto" is an
informal word that seems to have no advantage over something like
"immediate" or "now,"

I didn't like that either :)

and I don't think "reap" has a precise, universally-understood meaning.

Less concerned about that.

You could call this "mark_unused_now" or "immediately_mark_unused" or
something and it would be far more self-documenting, IMHO.

How about 'no_indexes' or such?

Greetings,

Andres Freund

#10 Andres Freund
andres@anarazel.de
In reply to: Melanie Plageman (#4)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

Hi,

On 2023-11-17 18:12:08 -0500, Melanie Plageman wrote:

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 14de8158d49..b578c32eeb6 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8803,8 +8803,13 @@ heap_xlog_prune(XLogReaderState *record)
nunused = (end - nowunused);
Assert(nunused >= 0);
-		/* Update all line pointers per the record, and repair fragmentation */
-		heap_page_prune_execute(buffer,
+		/*
+		 * Update all line pointers per the record, and repair fragmentation.
+		 * We always pass pronto_reap as true, because we don't know whether
+		 * or not this option was used when pruning. This reduces the
+		 * validation done on replay in an assert build.
+		 */

Hm, that seems not great. Both because we lose validation and because it
seems to invite problems down the line, due to pronto_reap falsely being set
to true in heap_page_prune_execute().

@@ -581,7 +589,17 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* function.)
*/
if (ItemIdIsDead(lp))
+		{
+			/*
+			 * If the relation has no indexes, we can set dead line pointers
+			 * LP_UNUSED now. We don't increment ndeleted here since the LP
+			 * was already marked dead.
+			 */
+			if (prstate->pronto_reap)
+				heap_prune_record_unused(prstate, offnum);
+
break;
+		}

I wasn't immediately sure whether this is reachable - but it is, e.g. after
on-access pruning (which currently doesn't yet use pronto reaping), after
pg_upgrade or dropping an index.

Assert(ItemIdIsNormal(lp));
htup = (HeapTupleHeader) PageGetItem(dp, lp);
@@ -715,7 +733,17 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* redirect the root to the correct chain member.
*/
if (i >= nchain)
-			heap_prune_record_dead(prstate, rootoffnum);
+		{
+			/*
+			 * If the relation has no indexes, we can remove dead tuples
+			 * during pruning instead of marking their line pointers dead. Set
+			 * this tuple's line pointer LP_UNUSED.
+			 */
+			if (prstate->pronto_reap)
+				heap_prune_record_unused(prstate, rootoffnum);
+			else
+				heap_prune_record_dead(prstate, rootoffnum);
+		}
else
heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
}
@@ -726,9 +754,12 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* item.  This can happen if the loop in heap_page_prune caused us to
* visit the dead successor of a redirect item before visiting the
* redirect item.  We can clean up by setting the redirect item to
-		 * DEAD state.
+		 * DEAD state. If pronto_reap is true, we can set it LP_UNUSED now.
*/
-		heap_prune_record_dead(prstate, rootoffnum);
+		if (prstate->pronto_reap)
+			heap_prune_record_unused(prstate, rootoffnum);
+		else
+			heap_prune_record_dead(prstate, rootoffnum);
}

return ndeleted;

There's three new calls to heap_prune_record_unused() and the logic got more
nested. Is there a way to get to a nicer end result?

From 608658f2cbc0acde55aac815c0fdb523ec24c452 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 13 Nov 2023 16:47:08 -0500
Subject: [PATCH v2 1/2] Indicate rel truncation unsafe in lazy_scan[no]prune

Both lazy_scan_prune() and lazy_scan_noprune() must determine whether or
not there are tuples on the page making rel truncation unsafe.
LVRelState->nonempty_pages is updated to reflect this. Previously, both
functions set an output parameter or output parameter member, hastup, to
indicate that nonempty_pages should be updated to reflect the latest
non-removable page. There doesn't seem to be any reason to wait until
lazy_scan_[no]prune() returns to update nonempty_pages. Plenty of other
counters in the LVRelState are updated in lazy_scan_[no]prune().
This allows us to get rid of the output parameter hastup.

@@ -972,20 +970,21 @@ lazy_scan_heap(LVRelState *vacrel)
continue;
}

-			/* Collect LP_DEAD items in dead_items array, count tuples */
-			if (lazy_scan_noprune(vacrel, buf, blkno, page, &hastup,
+			/*
+			 * Collect LP_DEAD items in dead_items array, count tuples,
+			 * determine if rel truncation is safe
+			 */
+			if (lazy_scan_noprune(vacrel, buf, blkno, page,
&recordfreespace))
{
Size		freespace = 0;
/*
* Processed page successfully (without cleanup lock) -- just
-				 * need to perform rel truncation and FSM steps, much like the
-				 * lazy_scan_prune case.  Don't bother trying to match its
-				 * visibility map setting steps, though.
+				 * need to update the FSM, much like the lazy_scan_prune case.
+				 * Don't bother trying to match its visibility map setting
+				 * steps, though.
*/
-				if (hastup)
-					vacrel->nonempty_pages = blkno + 1;
if (recordfreespace)
freespace = PageGetHeapFreeSpace(page);
UnlockReleaseBuffer(buf);

The comment continues to say that we "determine if rel truncation is safe" -
but I don't see that? Oh, I see, it's done inside lazy_scan_noprune(). This
doesn't seem like a clear improvement to me. Particularly because it's only
set if lazy_scan_noprune() actually does something.

I don't like the existing code in lazy_scan_heap(). But this kinda seems like
tinkering around the edges, without getting to the heart of the issue. I think
we should

1) Move everything after ReadBufferExtended() and the end of the loop into its
own function

2) All the code in the loop body after the call to lazy_scan_prune() is
specific to the lazy_scan_prune() path, it doesn't make sense that it's at
the same level as the calls to lazy_scan_noprune(),
lazy_scan_new_or_empty() or lazy_scan_prune(). Either it should be in
lazy_scan_prune() or a new wrapper function.

3) It's imo wrong that we have UnlockReleaseBuffer() (there are 6 different
places unlocking if I didn't miscount!) and RecordPageWithFreeSpace() calls
in this many places. I think this is largely a consequence of the previous
points. Once those are addressed, we can have one common place.

But I digress.

Greetings,

Andres Freund

#11 Melanie Plageman
melanieplageman@gmail.com
In reply to: Andres Freund (#10)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

Thanks for the review!

On Thu, Jan 4, 2024 at 3:03 PM Andres Freund <andres@anarazel.de> wrote:

On 2023-11-17 18:12:08 -0500, Melanie Plageman wrote:

Assert(ItemIdIsNormal(lp));
htup = (HeapTupleHeader) PageGetItem(dp, lp);
@@ -715,7 +733,17 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* redirect the root to the correct chain member.
*/
if (i >= nchain)
-                     heap_prune_record_dead(prstate, rootoffnum);
+             {
+                     /*
+                      * If the relation has no indexes, we can remove dead tuples
+                      * during pruning instead of marking their line pointers dead. Set
+                      * this tuple's line pointer LP_UNUSED.
+                      */
+                     if (prstate->pronto_reap)
+                             heap_prune_record_unused(prstate, rootoffnum);
+                     else
+                             heap_prune_record_dead(prstate, rootoffnum);
+             }
else
heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
}
@@ -726,9 +754,12 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* item.  This can happen if the loop in heap_page_prune caused us to
* visit the dead successor of a redirect item before visiting the
* redirect item.  We can clean up by setting the redirect item to
-              * DEAD state.
+              * DEAD state. If pronto_reap is true, we can set it LP_UNUSED now.
*/
-             heap_prune_record_dead(prstate, rootoffnum);
+             if (prstate->pronto_reap)
+                     heap_prune_record_unused(prstate, rootoffnum);
+             else
+                     heap_prune_record_dead(prstate, rootoffnum);
}

return ndeleted;

There's three new calls to heap_prune_record_unused() and the logic got more
nested. Is there a way to get to a nicer end result?

So, I could do another loop through the line pointers in
heap_page_prune() (after the loop calling heap_prune_chain()) and, if
pronto_reap is true, set dead line pointers LP_UNUSED. Then, when
constructing the WAL record, I would just not add the prstate.nowdead
that I saved from heap_prune_chain() to the prune WAL record.

This would eliminate the extra if statements from heap_prune_chain().
It will be more performant than sticking with the original (master)
call to lazy_vacuum_heap_page(). However, I'm not convinced that the
extra loop to set line pointers LP_DEAD -> LP_UNUSED is less confusing
than keeping the if pronto_reap test in heap_prune_chain().
heap_prune_chain() is where line pointers' new values are decided. It
seems weird to pick one new value for a line pointer in
heap_prune_chain() and then pick another, different new value in a
loop after heap_prune_chain(). I don't see any way to eliminate the if
pronto_reap tests without a separate loop setting LP_DEAD->LP_UNUSED,
though.
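The alternative being weighed above could look roughly like this, as a simplified standalone model (invented names, not PostgreSQL code): leave the chain walk marking items DEAD as on master, then make one extra sweep that flips DEAD to UNUSED before WAL-logging.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ITEM_NORMAL, ITEM_DEAD, ITEM_UNUSED } ItemState;

/*
 * Second-pass variant: after the chain walk has marked removable
 * items DEAD, convert them all to UNUSED in one sweep (only valid
 * when the relation has no indexes). Returns the number converted.
 */
static size_t
reap_dead_items(ItemState *items, size_t nitems)
{
    size_t nreaped = 0;

    for (size_t i = 0; i < nitems; i++)
    {
        if (items[i] == ITEM_DEAD)
        {
            items[i] = ITEM_UNUSED;
            nreaped++;
        }
    }
    return nreaped;
}
```

This keeps the chain-walk code branch-free but trades that for a second scan of the line-pointer array, and it splits the "decide the item's final state" logic across two places, which is the tension described above.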

From 608658f2cbc0acde55aac815c0fdb523ec24c452 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 13 Nov 2023 16:47:08 -0500
Subject: [PATCH v2 1/2] Indicate rel truncation unsafe in lazy_scan[no]prune

Both lazy_scan_prune() and lazy_scan_noprune() must determine whether or
not there are tuples on the page making rel truncation unsafe.
LVRelState->nonempty_pages is updated to reflect this. Previously, both
functions set an output parameter or output parameter member, hastup, to
indicate that nonempty_pages should be updated to reflect the latest
non-removable page. There doesn't seem to be any reason to wait until
lazy_scan_[no]prune() returns to update nonempty_pages. Plenty of other
counters in the LVRelState are updated in lazy_scan_[no]prune().
This allows us to get rid of the output parameter hastup.

@@ -972,20 +970,21 @@ lazy_scan_heap(LVRelState *vacrel)
continue;
}

-                     /* Collect LP_DEAD items in dead_items array, count tuples */
-                     if (lazy_scan_noprune(vacrel, buf, blkno, page, &hastup,
+                     /*
+                      * Collect LP_DEAD items in dead_items array, count tuples,
+                      * determine if rel truncation is safe
+                      */
+                     if (lazy_scan_noprune(vacrel, buf, blkno, page,
&recordfreespace))
{
Size            freespace = 0;
/*
* Processed page successfully (without cleanup lock) -- just
-                              * need to perform rel truncation and FSM steps, much like the
-                              * lazy_scan_prune case.  Don't bother trying to match its
-                              * visibility map setting steps, though.
+                              * need to update the FSM, much like the lazy_scan_prune case.
+                              * Don't bother trying to match its visibility map setting
+                              * steps, though.
*/
-                             if (hastup)
-                                     vacrel->nonempty_pages = blkno + 1;
if (recordfreespace)
freespace = PageGetHeapFreeSpace(page);
UnlockReleaseBuffer(buf);

The comment continues to say that we "determine if rel truncation is safe" -
but I don't see that? Oh, I see, it's done inside lazy_scan_noprune(). This
doesn't seem like a clear improvement to me. Particularly because it's only
set if lazy_scan_noprune() actually does something.

I don't get what the last sentence means ("Particularly because...").
The new location of the hastup test in lazy_scan_noprune() is above an
unconditional return true, so it is also only set if
lazy_scan_noprune() actually does something. I think the
lazy_scan[]prune() functions shouldn't try to export the hastup
information to lazy_scan_heap(). It's confusing. We should be moving
all of the page-specific processing into the individual functions
instead of in the body of lazy_scan_heap().

I don't like the existing code in lazy_scan_heap(). But this kinda seems like
tinkering around the edges, without getting to the heart of the issue. I think
we should

1) Move everything after ReadBufferExtended() and the end of the loop into its
own function

2) All the code in the loop body after the call to lazy_scan_prune() is
specific to the lazy_scan_prune() path, it doesn't make sense that it's at
the same level as the calls to lazy_scan_noprune(),
lazy_scan_new_or_empty() or lazy_scan_prune(). Either it should be in
lazy_scan_prune() or a new wrapper function.

3) It's imo wrong that we have UnlockReleaseBuffer() (there are 6 different
places unlocking if I didn't miscount!) and RecordPageWithFreeSpace() calls
in this many places. I think this is largely a consequence of the previous
points. Once those are addressed, we can have one common place.

I have other patches that do versions of all of the above, but they
didn't seem to really fit with this patch set. I am taking a step to
move code out of lazy_scan_heap() that doesn't belong there. That fact
that other code should also be moved from there seems more like a "yes
and" than a "no but". That being said, do you think I should introduce
patches doing further refactoring of lazy_scan_heap() (like what you
suggest above) into this thread?

- Melanie

#12 Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#8)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Thu, Jan 4, 2024 at 12:31 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Dec 27, 2023 at 11:27 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Do you have specific concerns about its correctness? I understand it
is an area where we have to be sure we are correct. But, to be fair,
that is true of all the pruning and vacuuming code.

I'm kind of concerned that 0002 might be a performance regression. It
pushes more branches down into the heap-pruning code, which I think
can sometimes be quite hot, for the sake of a case that rarely occurs
in practice. I take your point about it improving things when there
are no indexes, but what about when there are? And even if there are
no adverse performance consequences, is it really worth complicating
the logic at such a low level?

Regarding the additional code complexity, I think the extra call to
lazy_vacuum_heap_page() in lazy_scan_heap() actually represents a fair
amount of code complexity. It is a special case of page-level
processing that should be handled by heap_page_prune() and not
lazy_scan_heap().

lazy_scan_heap() is responsible for three main things -- loop through
the blocks in a relation and process each page (pruning, freezing,
etc), invoke index vacuuming, invoke functions to loop through
dead_items and vacuum pages. The logic to do the per-page processing
is spread across three places, though.

When a single page is being processed, page pruning happens in
heap_page_prune(). Freezing, dead items recording, and visibility
checks happen in lazy_scan_prune(). Visibility map updates and
freespace map updates happen back in lazy_scan_heap(). Except, if the
table has no indexes, in which case, lazy_scan_heap() also invokes
lazy_vacuum_heap_page() to set dead line pointers unused and do
another separate visibility check and VM update. I maintain that all
page-level processing should be done in the page-level processing
functions (like lazy_scan_prune()). And lazy_scan_heap() shouldn't be
directly responsible for special case page-level processing.

Also, I find "pronto_reap" to be a poor choice of name. "pronto" is an
informal word that seems to have no advantage over something like
"immediate" or "now," and I don't think "reap" has a precise,
universally-understood meaning. You could call this "mark_unused_now"
or "immediately_mark_unused" or something and it would be far more
self-documenting, IMHO.

Yes, I see how pronto is unnecessarily informal. If there are no cases
other than when the table has no indexes that we would consider
immediately marking LPs unused, then perhaps it is better to call it
"no_indexes" (per Andres' suggestion)?

- Melanie

#13Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#12)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Thu, Jan 4, 2024 at 6:03 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

When a single page is being processed, page pruning happens in
heap_page_prune(). Freezing, dead items recording, and visibility
checks happen in lazy_scan_prune(). Visibility map updates and
freespace map updates happen back in lazy_scan_heap(). Except, if the
table has no indexes, in which case, lazy_scan_heap() also invokes
lazy_vacuum_heap_page() to set dead line pointers unused and do
another separate visibility check and VM update. I maintain that all
page-level processing should be done in the page-level processing
functions (like lazy_scan_prune()). And lazy_scan_heap() shouldn't be
directly responsible for special case page-level processing.

But you can just as easily turn this argument on its head, can't you?
In general, except for HOT tuples, line pointers are marked dead by
pruning and unused by vacuum. Here you want to turn it on its head and
make pruning do what would normally be vacuum's responsibility.

I mean, that's not to say that your argument is "wrong" ... but what I
just said really is how I think about it, too.

Also, I find "pronto_reap" to be a poor choice of name. "pronto" is an
informal word that seems to have no advantage over something like
"immediate" or "now," and I don't think "reap" has a precise,
universally-understood meaning. You could call this "mark_unused_now"
or "immediately_mark_unused" or something and it would be far more
self-documenting, IMHO.

Yes, I see how pronto is unnecessarily informal. If there are no cases
other than when the table has no indexes that we would consider
immediately marking LPs unused, then perhaps it is better to call it
"no_indexes" (per Andres' suggestion)?

wfm.

--
Robert Haas
EDB: http://www.enterprisedb.com

#14Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#13)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 5, 2024 at 8:59 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jan 4, 2024 at 6:03 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

When a single page is being processed, page pruning happens in
heap_page_prune(). Freezing, dead items recording, and visibility
checks happen in lazy_scan_prune(). Visibility map updates and
freespace map updates happen back in lazy_scan_heap(). Except, if the
table has no indexes, in which case, lazy_scan_heap() also invokes
lazy_vacuum_heap_page() to set dead line pointers unused and do
another separate visibility check and VM update. I maintain that all
page-level processing should be done in the page-level processing
functions (like lazy_scan_prune()). And lazy_scan_heap() shouldn't be
directly responsible for special case page-level processing.

But you can just as easily turn this argument on its head, can't you?
In general, except for HOT tuples, line pointers are marked dead by
pruning and unused by vacuum. Here you want to turn it on its head and
make pruning do what would normally be vacuum's responsibility.

I actually think we are going to want to stop referring to these steps
as pruning and vacuuming. It is confusing because vacuuming refers to
the whole process of doing garbage collection on the table and also to
the specific step of setting dead line pointers unused. If we called
these steps, say, pruning and reaping, that may be clearer.

Vacuuming consists of three phases -- the first pass, index vacuuming,
and the second pass. I don't think we should dictate what happens in
each pass. That is, we shouldn't expect only pruning to happen in the
first pass and only reaping to happen in the second pass. For example,
I think Andres has previously proposed doing another round of pruning
after index vacuuming. The second pass/third phase is distinguished
primarily by being after index vacuuming.

If we think about it this way, that frees us up to set dead line
pointers unused in the first pass when the table has no indexes. For
clarity, I could add a block comment explaining that doing this is an
optimization and not a logical requirement. One way to make this even
more clear would be to set the dead line pointers unused in a separate
loop after heap_prune_chain() as I proposed upthread.
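
A rough sketch of that separate loop, using an invented simplified stand-in for the real PruneState (the field names approximate prstate.nowdead/nowunused, but this is not the actual declaration):

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned short OffsetNumber;

/* Illustrative stand-in for the real PruneState. */
typedef struct PruneState
{
    bool         pronto_reap;   /* table has no indexes */
    int          ndead;
    int          nunused;
    OffsetNumber nowdead[64];
    OffsetNumber nowunused[64];
} PruneState;

/*
 * Separate pass after the heap_prune_chain() calls: chains are recorded as
 * dead normally, then, if pronto_reap, every would-be-dead item is moved to
 * the unused list, so nothing is emitted in the WAL record's dead array.
 */
static void
reap_dead_items(PruneState *prstate)
{
    if (!prstate->pronto_reap)
        return;
    for (int i = 0; i < prstate->ndead; i++)
        prstate->nowunused[prstate->nunused++] = prstate->nowdead[i];
    prstate->ndead = 0;
}
```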

- Melanie

#15Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#14)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 5, 2024 at 12:59 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I actually think we are going to want to stop referring to these steps
as pruning and vacuuming. It is confusing because vacuuming refers to
the whole process of doing garbage collection on the table and also to
the specific step of setting dead line pointers unused. If we called
these steps, say, pruning and reaping, that may be clearer.

I agree that there's some terminological confusion here, but if we
leave all of the current naming unchanged and start using new naming
conventions in new code, I don't think that's going to clear things
up. Getting rid of the weird naming with pruning, "scanning" (that is
part of vacuum), and "vacuuming" (that is another part of vacuum) is
gonna take some work.

Vacuuming consists of three phases -- the first pass, index vacuuming,
and the second pass. I don't think we should dictate what happens in
each pass. That is, we shouldn't expect only pruning to happen in the
first pass and only reaping to happen in the second pass. For example,
I think Andres has previously proposed doing another round of pruning
after index vacuuming. The second pass/third phase is distinguished
primarily by being after index vacuuming.

If we think about it this way, that frees us up to set dead line
pointers unused in the first pass when the table has no indexes. For
clarity, I could add a block comment explaining that doing this is an
optimization and not a logical requirement. One way to make this even
more clear would be to set the dead line pointers unused in a separate
loop after heap_prune_chain() as I proposed upthread.

I don't really disagree with any of this, but I think the question is
whether the code is cleaner with mark-LP-as-unused pushed down into
pruning or whether it's better the way it is. Andres seems confident
that the change won't suck for performance, which is good, but I'm not
quite convinced that it's the right direction to go with the code, and
he doesn't seem to be either. Perhaps this all turns on this point:

MP> I am planning to add a VM update into the freeze record, at which point
MP> I will move the VM update code into lazy_scan_prune(). This will then
MP> allow us to consolidate the freespace map update code for the prune and
MP> noprune cases and make lazy_scan_heap() short and sweet.

Can we see what that looks like on top of this change?

--
Robert Haas
EDB: http://www.enterprisedb.com

#16Andres Freund
andres@anarazel.de
In reply to: Melanie Plageman (#11)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

Hi,

On 2024-01-04 17:37:27 -0500, Melanie Plageman wrote:

On Thu, Jan 4, 2024 at 3:03 PM Andres Freund <andres@anarazel.de> wrote:

On 2023-11-17 18:12:08 -0500, Melanie Plageman wrote:

Assert(ItemIdIsNormal(lp));
htup = (HeapTupleHeader) PageGetItem(dp, lp);
@@ -715,7 +733,17 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* redirect the root to the correct chain member.
*/
if (i >= nchain)
-                     heap_prune_record_dead(prstate, rootoffnum);
+             {
+                     /*
+                      * If the relation has no indexes, we can remove dead tuples
+                      * during pruning instead of marking their line pointers dead. Set
+                      * this tuple's line pointer LP_UNUSED.
+                      */
+                     if (prstate->pronto_reap)
+                             heap_prune_record_unused(prstate, rootoffnum);
+                     else
+                             heap_prune_record_dead(prstate, rootoffnum);
+             }
else
heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
}
@@ -726,9 +754,12 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* item.  This can happen if the loop in heap_page_prune caused us to
* visit the dead successor of a redirect item before visiting the
* redirect item.  We can clean up by setting the redirect item to
-              * DEAD state.
+              * DEAD state. If pronto_reap is true, we can set it LP_UNUSED now.
*/
-             heap_prune_record_dead(prstate, rootoffnum);
+             if (prstate->pronto_reap)
+                     heap_prune_record_unused(prstate, rootoffnum);
+             else
+                     heap_prune_record_dead(prstate, rootoffnum);
}

return ndeleted;

There's three new calls to heap_prune_record_unused() and the logic got more
nested. Is there a way to get to a nicer end result?

So, I could do another loop through the line pointers in
heap_page_prune() (after the loop calling heap_prune_chain()) and, if
pronto_reap is true, set dead line pointers LP_UNUSED. Then, when
constructing the WAL record, I would just not add the prstate.nowdead
that I saved from heap_prune_chain() to the prune WAL record.

Hm, that seems a bit sad as well. I am wondering if we could move the
pronto_reap handling into heap_prune_record_dead() or a wrapper of it. I am
more concerned about the human reader than the CPU here...
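
One shape that wrapper could take, sketched with an invented simplified PruneState (the wrapper name heap_prune_record_dead_or_unused and the struct fields here are illustrative, not the actual code):

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned short OffsetNumber;

/* Illustrative stand-in for the real PruneState. */
typedef struct PruneState
{
    bool         pronto_reap;   /* no indexes: mark items unused immediately */
    int          ndead;
    int          nunused;
    OffsetNumber nowdead[32];
    OffsetNumber nowunused[32];
} PruneState;

static void
heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
{
    prstate->nowdead[prstate->ndead++] = offnum;
}

static void
heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum)
{
    prstate->nowunused[prstate->nunused++] = offnum;
}

/*
 * Wrapper: every call site that would mark an item dead goes through here,
 * so the pronto_reap branch lives in one place instead of being repeated
 * three times inside heap_prune_chain().
 */
static void
heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum)
{
    if (prstate->pronto_reap)
        heap_prune_record_unused(prstate, offnum);
    else
        heap_prune_record_dead(prstate, offnum);
}
```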

@@ -972,20 +970,21 @@ lazy_scan_heap(LVRelState *vacrel)
continue;
}

-                     /* Collect LP_DEAD items in dead_items array, count tuples */
-                     if (lazy_scan_noprune(vacrel, buf, blkno, page, &hastup,
+                     /*
+                      * Collect LP_DEAD items in dead_items array, count tuples,
+                      * determine if rel truncation is safe
+                      */
+                     if (lazy_scan_noprune(vacrel, buf, blkno, page,
&recordfreespace))
{
Size            freespace = 0;
/*
* Processed page successfully (without cleanup lock) -- just
-                              * need to perform rel truncation and FSM steps, much like the
-                              * lazy_scan_prune case.  Don't bother trying to match its
-                              * visibility map setting steps, though.
+                              * need to update the FSM, much like the lazy_scan_prune case.
+                              * Don't bother trying to match its visibility map setting
+                              * steps, though.
*/
-                             if (hastup)
-                                     vacrel->nonempty_pages = blkno + 1;
if (recordfreespace)
freespace = PageGetHeapFreeSpace(page);
UnlockReleaseBuffer(buf);

The comment continues to say that we "determine if rel truncation is safe" -
but I don't see that? Oh, I see, it's done inside lazy_scan_noprune(). This
doesn't seem like a clear improvement to me. Particularly because it's only
set if lazy_scan_noprune() actually does something.

I don't get what the last sentence means ("Particularly because...").

Took me a second to understand myself again too, oops. What I think I meant is
that it seems error-prone that it's only set in some paths inside
lazy_scan_noprune(). Previously it was at least a bit clearer in
lazy_scan_heap() that it would be set for the different possible paths.

I don't like the existing code in lazy_scan_heap(). But this kinda seems like
tinkering around the edges, without getting to the heart of the issue. I think
we should

1) Move everything after ReadBufferExtended() and the end of the loop into its
own function

2) All the code in the loop body after the call to lazy_scan_prune() is
specific to the lazy_scan_prune() path; it doesn't make sense that it's at
the same level as the calls to lazy_scan_noprune(),
lazy_scan_new_or_empty() or lazy_scan_prune(). Either it should be in
lazy_scan_prune() or a new wrapper function.

3) It's imo wrong that we have UnlockReleaseBuffer() (there are 6 different
places unlocking if I didn't miscount!) and RecordPageWithFreeSpace() calls
in this many places. I think this is largely a consequence of the previous
points. Once those are addressed, we can have one common place.

I have other patches that do versions of all of the above, but they
didn't seem to really fit with this patch set. I am taking a step to
move code out of lazy_scan_heap() that doesn't belong there. The fact
that other code should also be moved from there seems more like a "yes
and" than a "no but". That being said, do you think I should introduce
patches doing further refactoring of lazy_scan_heap() (like what you
suggest above) into this thread?

It probably should not be part of this patchset. I probably shouldn't have
written the above here, but after concluding that I didn't think your small
refactoring patch was quite right, I couldn't stop myself from thinking about
what would be right.

Greetings,

Andres Freund

#17Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#13)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

Hi,

On 2024-01-05 08:59:41 -0500, Robert Haas wrote:

On Thu, Jan 4, 2024 at 6:03 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

When a single page is being processed, page pruning happens in
heap_page_prune(). Freezing, dead items recording, and visibility
checks happen in lazy_scan_prune(). Visibility map updates and
freespace map updates happen back in lazy_scan_heap(). Except, if the
table has no indexes, in which case, lazy_scan_heap() also invokes
lazy_vacuum_heap_page() to set dead line pointers unused and do
another separate visibility check and VM update. I maintain that all
page-level processing should be done in the page-level processing
functions (like lazy_scan_prune()). And lazy_scan_heap() shouldn't be
directly responsible for special case page-level processing.

But you can just as easily turn this argument on its head, can't you?
In general, except for HOT tuples, line pointers are marked dead by
pruning and unused by vacuum. Here you want to turn it on its head and
make pruning do what would normally be vacuum's responsibility.

OTOH, the pruning logic, including its WAL record, already supports marking
items unused, all we need to do is to tell it to do so in a few more cases. If
we didn't already need to have support for this, I'd have a much harder time
arguing for doing this.

One important part of the larger project is to combine the WAL records for
pruning, freezing and setting the all-visible/all-frozen bit into one WAL
record. We can't set all-frozen before we have removed the dead items. So
either we need to combine pruning and setting items unused for no-index tables
or we end up considerably less efficient in the no-indexes case.

An aside:

As I think we chatted about before, I eventually would like the option to
remove index entries for a tuple during on-access pruning, for OLTP
workloads. I.e. before removing the tuple, construct the corresponding index
tuple, use it to look up index entries pointing to the tuple. If all the index
entries were found (they might not be, if they already were marked dead during
a lookup, or if an expression wasn't actually immutable), we can prune without
the full index scan. Obviously this would only be suitable for some
workloads, but it could be quite beneficial when you have huge indexes. The
reason I mention this is that then we'd have another source of marking items
unused during pruning.

Greetings,

Andres Freund

#18Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#17)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 5, 2024 at 3:05 PM Andres Freund <andres@anarazel.de> wrote:

OTOH, the pruning logic, including its WAL record, already supports marking
items unused, all we need to do is to tell it to do so in a few more cases. If
we didn't already need to have support for this, I'd have a much harder time
arguing for doing this.

One important part of the larger project is to combine the WAL records for
pruning, freezing and setting the all-visible/all-frozen bit into one WAL
record. We can't set all-frozen before we have removed the dead items. So
either we need to combine pruning and setting items unused for no-index tables
or we end up considerably less efficient in the no-indexes case.

Those are fair arguments.

An aside:

As I think we chatted about before, I eventually would like the option to
remove index entries for a tuple during on-access pruning, for OLTP
workloads. I.e. before removing the tuple, construct the corresponding index
tuple, use it to look up index entries pointing to the tuple. If all the index
entries were found (they might not be, if they already were marked dead during
a lookup, or if an expression wasn't actually immutable), we can prune without
the full index scan. Obviously this would only be suitable for some
workloads, but it could be quite beneficial when you have huge indexes. The
reason I mention this is that then we'd have another source of marking items
unused during pruning.

I will be astonished if you can make this work well enough to avoid
huge regressions in plausible cases. There are plenty of cases where
we do a very thorough job opportunistically removing index tuples.

--
Robert Haas
EDB: http://www.enterprisedb.com

#19Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#15)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 5, 2024 at 1:47 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Jan 5, 2024 at 12:59 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
MP> I am planning to add a VM update into the freeze record, at which point
MP> I will move the VM update code into lazy_scan_prune(). This will then
MP> allow us to consolidate the freespace map update code for the prune and
MP> noprune cases and make lazy_scan_heap() short and sweet.

Can we see what that looks like on top of this change?

Yes, attached is a patch set which does this. My previous patchset
already reduced the number of places we unlock the buffer and update
the freespace map in lazy_scan_heap(). This patchset combines the
lazy_scan_prune() and lazy_scan_noprune() FSM update cases. I also
have a version which moves the freespace map updates into
lazy_scan_prune() and lazy_scan_noprune() -- eliminating all of these
from lazy_scan_heap(). This is arguably more clear. But Andres
mentioned he wanted fewer places unlocking the buffer and updating the
FSM.

- Melanie

Attachments:

v3-0001-Indicate-rel-truncation-unsafe-in-lazy_scan-no-pr.patch (+26/-25)
v3-0002-Set-would-be-dead-items-LP_UNUSED-while-pruning.patch (+125/-111)
v3-0003-Update-VM-in-lazy_scan_prune.patch (+115/-111)
v3-0004-Inline-LVPagePruneState-members-in-lazy_scan_prun.patch (+55/-50)
v3-0005-Remove-unneeded-struct-LVPagePruneState.patch (+7/-20)
v3-0006-Conslidate-FSM-update-code-in-lazy_scan_heap.patch (+23/-32)
#20Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#18)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

Hi,

On 2024-01-05 15:23:12 -0500, Robert Haas wrote:

On Fri, Jan 5, 2024 at 3:05 PM Andres Freund <andres@anarazel.de> wrote:

An aside:

As I think we chatted about before, I eventually would like the option to
remove index entries for a tuple during on-access pruning, for OLTP
workloads. I.e. before removing the tuple, construct the corresponding index
tuple, use it to look up index entries pointing to the tuple. If all the index
entries were found (they might not be, if they already were marked dead during
a lookup, or if an expression wasn't actually immutable), we can prune without
the full index scan. Obviously this would only be suitable for some
workloads, but it could be quite beneficial when you have huge indexes. The
reason I mention this is that then we'd have another source of marking items
unused during pruning.

I will be astonished if you can make this work well enough to avoid
huge regressions in plausible cases. There are plenty of cases where
we do a very thorough job opportunistically removing index tuples.

These days the AM is often involved with that, via
table_index_delete_tuples()/heap_index_delete_tuples(). That IIRC has to
happen before physically removing the already-marked-killed index entries. We
can't rely on being able to actually prune the heap page at that point, there
might be other backends pinning it, but often we will be able to. If we were
to prune below heap_index_delete_tuples(), we wouldn't need to recheck that
index again during "individual tuple pruning", if the to-be-marked-unused heap
tuple is one of the tuples passed to heap_index_delete_tuples(). Which
presumably will be very commonly the case.

At least for nbtree, we are much more aggressive about marking index entries
as killed, than about actually removing the index entries. "individual tuple
pruning" would have to look for killed-but-still-present index entries, not
just for "live" entries.

Greetings,

Andres Freund
