Eliminate redundant tuple visibility check in vacuum
While working on a set of patches to combine the freeze and visibility
map WAL records into the prune record, I wrote the attached patches
reusing the tuple visibility information collected in heap_page_prune()
back in lazy_scan_prune().
heap_page_prune() collects the HTSV_Result for every tuple on a page
and saves it in an array used by heap_prune_chain(). If we make that
array available to lazy_scan_prune(), it can use it when collecting
stats for vacuum and determining whether or not to freeze tuples.
This avoids calling HeapTupleSatisfiesVacuum() again on every tuple in
the page.
It also gets rid of the retry loop in lazy_scan_prune().
- Melanie
Attachments:
v1-0002-Reuse-heap_page_prune-tuple-visibility-statuses.patchapplication/x-patch; name=v1-0002-Reuse-heap_page_prune-tuple-visibility-statuses.patchDownload+68-81
v1-0001-Rebrand-LVPagePruneState-as-PruneResult.patchapplication/x-patch; name=v1-0001-Rebrand-LVPagePruneState-as-PruneResult.patchDownload+68-68
Hi Melanie,
On 8/29/23 01:49, Melanie Plageman wrote:
While working on a set of patches to combine the freeze and visibility
map WAL records into the prune record, I wrote the attached patches
reusing the tuple visibility information collected in heap_page_prune()
back in lazy_scan_prune().heap_page_prune() collects the HTSV_Result for every tuple on a page
and saves it in an array used by heap_prune_chain(). If we make that
array available to lazy_scan_prune(), it can use it when collecting
stats for vacuum and determining whether or not to freeze tuples.
This avoids calling HeapTupleSatisfiesVacuum() again on every tuple in
the page.It also gets rid of the retry loop in lazy_scan_prune().
How did you test this change?
Could you measure any performance difference?
If so could you provide your test case?
--
David Geier
(ServiceNow)
Hi David,
Thanks for taking a look!
On Tue, Aug 29, 2023 at 5:07 AM David Geier <geidav.pg@gmail.com> wrote:
Hi Melanie,
On 8/29/23 01:49, Melanie Plageman wrote:
While working on a set of patches to combine the freeze and visibility
map WAL records into the prune record, I wrote the attached patches
reusing the tuple visibility information collected in heap_page_prune()
back in lazy_scan_prune().heap_page_prune() collects the HTSV_Result for every tuple on a page
and saves it in an array used by heap_prune_chain(). If we make that
array available to lazy_scan_prune(), it can use it when collecting
stats for vacuum and determining whether or not to freeze tuples.
This avoids calling HeapTupleSatisfiesVacuum() again on every tuple in
the page.It also gets rid of the retry loop in lazy_scan_prune().
How did you test this change?
I didn't add a new test, but you can confirm some existing test
coverage if you, for example, set every HTSV_Result in the array to
HEAPTUPLE_LIVE and run the regression tests. Tests relying on vacuum
removing the right tuples may fail. For example, select count(*) from
gin_test_tbl where i @> array[1, 999]; in src/test/regress/sql/gin.sql
fails for me since it sees a tuple it shouldn't.
Could you measure any performance difference?
If so could you provide your test case?
I created a large table and then updated a tuple on every page in the
relation and vacuumed it. I saw a consistent slight improvement in
vacuum execution time. I profiled a bit with perf stat as well. The
difference is relatively small for this kind of example, but I can
work on a more compelling, realistic example. I think eliminating the
retry loop is also useful, as I have heard that users have had to
cancel vacuums which were in this retry loop for too long.
- Melanie
On Tue, Aug 29, 2023 at 5:07 AM David Geier <geidav.pg@gmail.com> wrote:
Could you measure any performance difference?
If so could you provide your test case?
I created a large table and then updated a tuple on every page in the
relation and vacuumed it. I saw a consistent slight improvement in
vacuum execution time. I profiled a bit with perf stat as well. The
difference is relatively small for this kind of example, but I can
work on a more compelling, realistic example. I think eliminating the
retry loop is also useful, as I have heard that users have had to
cancel vacuums which were in this retry loop for too long.
Just to provide a specific test case, if you create a small table like this
create table foo (a int, b int, c int) with(autovacuum_enabled=false);
insert into foo select i, i, i from generate_series(1, 10000000);
And then vacuum it. I find that with my patch applied I see a
consistent ~9% speedup (averaged across multiple runs).
master: ~533ms
patch: ~487ms
And in the profile, with my patch applied, you notice less time spent
in HeapTupleSatisfiesVacuumHorizon()
master:
11.83% postgres postgres [.] heap_page_prune
11.59% postgres postgres [.] heap_prepare_freeze_tuple
8.77% postgres postgres [.] lazy_scan_prune
8.01% postgres postgres [.] HeapTupleSatisfiesVacuumHorizon
7.79% postgres postgres [.] heap_tuple_should_freeze
patch:
13.41% postgres postgres [.] heap_prepare_freeze_tuple
9.88% postgres postgres [.] heap_page_prune
8.61% postgres postgres [.] lazy_scan_prune
7.00% postgres postgres [.] heap_tuple_should_freeze
6.43% postgres postgres [.] HeapTupleSatisfiesVacuumHorizon
- Melanie
Hi Melanie,
On 8/31/23 02:59, Melanie Plageman wrote:
I created a large table and then updated a tuple on every page in the
relation and vacuumed it. I saw a consistent slight improvement in
vacuum execution time. I profiled a bit with perf stat as well. The
difference is relatively small for this kind of example, but I can
work on a more compelling, realistic example. I think eliminating the
retry loop is also useful, as I have heard that users have had to
cancel vacuums which were in this retry loop for too long.Just to provide a specific test case, if you create a small table like this
create table foo (a int, b int, c int) with(autovacuum_enabled=false);
insert into foo select i, i, i from generate_series(1, 10000000);And then vacuum it. I find that with my patch applied I see a
consistent ~9% speedup (averaged across multiple runs).master: ~533ms
patch: ~487msAnd in the profile, with my patch applied, you notice less time spent
in HeapTupleSatisfiesVacuumHorizon()master:
11.83% postgres postgres [.] heap_page_prune
11.59% postgres postgres [.] heap_prepare_freeze_tuple
8.77% postgres postgres [.] lazy_scan_prune
8.01% postgres postgres [.] HeapTupleSatisfiesVacuumHorizon
7.79% postgres postgres [.] heap_tuple_should_freezepatch:
13.41% postgres postgres [.] heap_prepare_freeze_tuple
9.88% postgres postgres [.] heap_page_prune
8.61% postgres postgres [.] lazy_scan_prune
7.00% postgres postgres [.] heap_tuple_should_freeze
6.43% postgres postgres [.] HeapTupleSatisfiesVacuumHorizon
Thanks a lot for providing additional information and the test case.
I tried it on a release build and I also see a 10% speed-up. I reset the
visibility map between VACUUM runs, see:
CREATE EXTENSION pg_visibility; CREATE TABLE foo (a INT, b INT, c INT)
WITH(autovacuum_enabled=FALSE); INSERT INTO foo SELECT i, i, i from
generate_series(1, 10000000) i; VACUUM foo; SELECT
pg_truncate_visibility_map('foo'); VACUUM foo; SELECT
pg_truncate_visibility_map('foo'); VACUUM foo; ...
The first patch, which refactors the code so we can pass the result of
the visibility checks to the caller, looks good to me.
Regarding the 2nd patch (disclaimer: I'm not too familiar with that area
of the code): I don't completely understand why the retry loop is not
needed anymore and how you now detect/handle the possible race
condition? It can still happen that an aborting transaction changes the
state of a row after heap_page_prune() looked at that row. Would that
case now not be ignored?
--
David Geier
(ServiceNow)
On Mon, Aug 28, 2023 at 7:49 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
While working on a set of patches to combine the freeze and visibility
map WAL records into the prune record, I wrote the attached patches
reusing the tuple visibility information collected in heap_page_prune()
back in lazy_scan_prune().heap_page_prune() collects the HTSV_Result for every tuple on a page
and saves it in an array used by heap_prune_chain(). If we make that
array available to lazy_scan_prune(), it can use it when collecting
stats for vacuum and determining whether or not to freeze tuples.
This avoids calling HeapTupleSatisfiesVacuum() again on every tuple in
the page.It also gets rid of the retry loop in lazy_scan_prune().
In general, I like these patches. I think it's a clever approach, and
I don't really see any down side. It should just be straight-up better
than what we have now, and if it's not better, it still shouldn't be
any worse.
I have a few suggestions:
- Rather than removing the rather large comment block at the top of
lazy_scan_prune() I'd like to see it rewritten to explain how we now
deal with the problem. I'd suggest leaving the first paragraph ("Prior
to...") just as it is and replace all the words following "The
approach we take now is" with a description of the approach that this
patch takes to the problem.
- I'm not a huge fan of the caller of heap_page_prune() having to know
how to initialize the PruneResult. Can heap_page_prune() itself do
that work, so that presult is an out parameter rather than an in-out
parameter? Or, if not, can it be moved to a new helper function, like
heap_page_prune_init(), rather than having that logic in 2+ places?
- int ndeleted,
- nnewlpdead;
-
- ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin,
- limited_ts, &nnewlpdead, NULL);
+ int ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin,
+ limited_ts, &presult, NULL);
- I don't particularly like merging the declaration with the
assignment unless the call is narrow enough that we don't need a line
break in there, which is not the case here.
- I haven't thoroughly investigated yet, but I wonder if there are any
other places where comments need updating. As a general note, I find
it desirable for a function's header comment to mention whether any
pointer parameters are in parameters, out parameters, or in-out
parameters, and what the contract between caller and callee actually
is.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Aug 31, 2023 at 2:03 PM Robert Haas <robertmhaas@gmail.com> wrote:
I have a few suggestions:
- Rather than removing the rather large comment block at the top of
lazy_scan_prune() I'd like to see it rewritten to explain how we now
deal with the problem. I'd suggest leaving the first paragraph ("Prior
to...") just as it is and replace all the words following "The
approach we take now is" with a description of the approach that this
patch takes to the problem.
Good idea. I've updated the comment. I also explain why this new
approach works in the commit message and reference the commit which
added the previous approach.
- I'm not a huge fan of the caller of heap_page_prune() having to know
how to initialize the PruneResult. Can heap_page_prune() itself do
that work, so that presult is an out parameter rather than an in-out
parameter? Or, if not, can it be moved to a new helper function, like
heap_page_prune_init(), rather than having that logic in 2+ places?
Ah, yes. Now that it has two callers, and since it is exclusively an
output parameter, it is quite ugly to initialize it in both callers.
Fixed in the attached.
- int ndeleted, - nnewlpdead; - - ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin, - limited_ts, &nnewlpdead, NULL); + int ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin, + limited_ts, &presult, NULL);- I don't particularly like merging the declaration with the
assignment unless the call is narrow enough that we don't need a line
break in there, which is not the case here.
I have changed this.
- I haven't thoroughly investigated yet, but I wonder if there are any
other places where comments need updating. As a general note, I find
it desirable for a function's header comment to mention whether any
pointer parameters are in parameters, out parameters, or in-out
parameters, and what the contract between caller and callee actually
is.
I've investigated vacuumlazy.c and pruneheap.c and looked at the
commit that added the retry loop (8523492d4e349) to see everywhere it
added comments and don't see anywhere else that needs updating.
I have updated lazy_scan_prune()'s function header comment to describe
the nature of the in-out and output parameters and the contract.
- Melanie
Attachments:
v2-0001-Rebrand-LVPagePruneState-as-PruneResult.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Rebrand-LVPagePruneState-as-PruneResult.patchDownload+76-68
v2-0002-Reuse-heap_page_prune-tuple-visibility-statuses.patchtext/x-patch; charset=US-ASCII; name=v2-0002-Reuse-heap_page_prune-tuple-visibility-statuses.patchDownload+64-71
On Thu, Aug 31, 2023 at 5:39 AM David Geier <geidav.pg@gmail.com> wrote:
Regarding the 2nd patch (disclaimer: I'm not too familiar with that area
of the code): I don't completely understand why the retry loop is not
needed anymore and how you now detect/handle the possible race
condition? It can still happen that an aborting transaction changes the
state of a row after heap_page_prune() looked at that row. Would that
case now not be ignored?
Thanks for asking. I've updated the comment in the code and the commit
message about this, as it seems important to be clear.
Any inserting transaction which aborts after heap_page_prune()'s
visibility check will now be of no concern to lazy_scan_prune(). Since
we don't do the visibility check again, we won't find the tuple
HEAPTUPLE_DEAD and thus won't have the problem of adding the tuple to
the array of tuples for vacuum to reap. This does mean that we won't
reap and remove tuples whose inserting transactions abort right after
heap_page_prune()'s visibility check. But, this doesn't seem like an
issue. They may not be removed until the next vacuum, but ISTM it is
actually worse to pay the cost of re-pruning the whole page just to
clean up that one tuple. Maybe if that renders the page all visible
and we can mark it as such in the visibility map -- but that seems
like a relatively narrow use case.
- Melanie
On Thu, Aug 31, 2023 at 3:35 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
Any inserting transaction which aborts after heap_page_prune()'s
visibility check will now be of no concern to lazy_scan_prune(). Since
we don't do the visibility check again, we won't find the tuple
HEAPTUPLE_DEAD and thus won't have the problem of adding the tuple to
the array of tuples for vacuum to reap. This does mean that we won't
reap and remove tuples whose inserting transactions abort right after
heap_page_prune()'s visibility check. But, this doesn't seem like an
issue.
It's definitely not an issue.
The loop is essentially a hacky way of getting a consistent picture of
which tuples should be treated as HEAPTUPLE_DEAD, and which tuples
need to be left behind (consistent at the level of each page and each
HOT chain, at least). Making that explicit seems strictly better.
They may not be removed until the next vacuum, but ISTM it is
actually worse to pay the cost of re-pruning the whole page just to
clean up that one tuple. Maybe if that renders the page all visible
and we can mark it as such in the visibility map -- but that seems
like a relatively narrow use case.
The chances of actually hitting the retry are microscopic anyway. It
has nothing to do with making sure that dead tuples from aborted
tuples get removed for its own sake, or anything. Rather, the retry is
all about making sure that all TIDs that get removed from indexes can
only point to LP_DEAD stubs. Prior to Postgres 14, HEAPTUPLE_DEAD
tuples with storage would very occasionally be left behind, which made
life difficult in a bunch of other places -- for no good reason.
--
Peter Geoghegan
Hi,
On 9/1/23 03:25, Peter Geoghegan wrote:
On Thu, Aug 31, 2023 at 3:35 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:Any inserting transaction which aborts after heap_page_prune()'s
visibility check will now be of no concern to lazy_scan_prune(). Since
we don't do the visibility check again, we won't find the tuple
HEAPTUPLE_DEAD and thus won't have the problem of adding the tuple to
the array of tuples for vacuum to reap. This does mean that we won't
reap and remove tuples whose inserting transactions abort right after
heap_page_prune()'s visibility check. But, this doesn't seem like an
issue.It's definitely not an issue.
The loop is essentially a hacky way of getting a consistent picture of
which tuples should be treated as HEAPTUPLE_DEAD, and which tuples
need to be left behind (consistent at the level of each page and each
HOT chain, at least). Making that explicit seems strictly better.They may not be removed until the next vacuum, but ISTM it is
actually worse to pay the cost of re-pruning the whole page just to
clean up that one tuple. Maybe if that renders the page all visible
and we can mark it as such in the visibility map -- but that seems
like a relatively narrow use case.The chances of actually hitting the retry are microscopic anyway. It
has nothing to do with making sure that dead tuples from aborted
tuples get removed for its own sake, or anything. Rather, the retry is
all about making sure that all TIDs that get removed from indexes can
only point to LP_DEAD stubs. Prior to Postgres 14, HEAPTUPLE_DEAD
tuples with storage would very occasionally be left behind, which made
life difficult in a bunch of other places -- for no good reason.
That makes sense and seems like a much better compromise. Thanks for the
explanations. Please update the comment to document the corner case and
how we handle it.
--
David Geier
(ServiceNow)
On Thu, Aug 31, 2023 at 6:29 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
I have changed this.
I spent a bunch of time today looking at this, thinking maybe I could
commit it. But then I got cold feet.
With these patches applied, PruneResult ends up being declared in
heapam.h, with a comment that says /* State returned from pruning */.
But that comment isn't accurate. The two new members that get added to
the structure by 0002, namely nnewlpdead and htsv, are in fact state
that is returned from pruning. But the other 5 members aren't. They're
just initialized to constant values by pruning and then filled in for
real by the vacuum logic. That's extremely weird. It would be fine if
heap_page_prune() just grew a new output argument that only returned
the HTSV results, or perhaps it could make sense to bundle any
existing out parameters together into a struct and then add new things
to that struct instead of adding even more parameters to the function
itself. But there doesn't seem to be any good reason to muddle
together the new output parameters for heap_page_prune() with a bunch
of state that is currently internal to vacuumlazy.c.
I realize that the shape of the patches probably stems from the fact
that they started out life as part of a bigger patch set. But to be
committed independently, they need to be shaped in a way that makes
sense independently, and I don't think this qualifies. On the plus
side, it seems to me that it's probably not that much work to fix this
issue and that the result would likely be a smaller patch than what
you have now, which is something.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Sep 6, 2023 at 1:04 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Aug 31, 2023 at 6:29 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:I have changed this.
I spent a bunch of time today looking at this, thinking maybe I could
commit it. But then I got cold feet.With these patches applied, PruneResult ends up being declared in
heapam.h, with a comment that says /* State returned from pruning */.
But that comment isn't accurate. The two new members that get added to
the structure by 0002, namely nnewlpdead and htsv, are in fact state
that is returned from pruning. But the other 5 members aren't. They're
just initialized to constant values by pruning and then filled in for
real by the vacuum logic. That's extremely weird. It would be fine if
heap_page_prune() just grew a new output argument that only returned
the HTSV results, or perhaps it could make sense to bundle any
existing out parameters together into a struct and then add new things
to that struct instead of adding even more parameters to the function
itself. But there doesn't seem to be any good reason to muddle
together the new output parameters for heap_page_prune() with a bunch
of state that is currently internal to vacuumlazy.c.I realize that the shape of the patches probably stems from the fact
that they started out life as part of a bigger patch set. But to be
committed independently, they need to be shaped in a way that makes
sense independently, and I don't think this qualifies. On the plus
side, it seems to me that it's probably not that much work to fix this
issue and that the result would likely be a smaller patch than what
you have now, which is something.
Yeah, I think this is a fair concern. I have addressed it in the
attached patches.
I thought a lot about whether or not adding a PruneResult which
contains only the output parameters and result of heap_page_prune() is
annoying since we have so many other *Prune* data structures. I
decided it's not annoying. In some cases, four outputs don't merit a
new structure. In this case, I think it declutters the code a bit --
independent of any other patches I may be writing :)
- Melanie
Attachments:
v3-0001-Move-heap_page_prune-output-parameters-into-struc.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Move-heap_page_prune-output-parameters-into-struc.patchDownload+45-43
v3-0002-Reuse-heap_page_prune-tuple-visibility-statuses.patchtext/x-patch; charset=US-ASCII; name=v3-0002-Reuse-heap_page_prune-tuple-visibility-statuses.patchDownload+41-50
On Wed, Sep 6, 2023 at 5:21 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
Yeah, I think this is a fair concern. I have addressed it in the
attached patches.I thought a lot about whether or not adding a PruneResult which
contains only the output parameters and result of heap_page_prune() is
annoying since we have so many other *Prune* data structures. I
decided it's not annoying. In some cases, four outputs don't merit a
new structure. In this case, I think it declutters the code a bit --
independent of any other patches I may be writing :)
I took a look at 0001 and I think that it's incorrect. In the existing
code, *off_loc gets updated before each call to
heap_prune_satisfies_vacuum(). This means that if an error occurs in
heap_prune_satisfies_vacuum(), *off_loc will as of that moment contain
the relevant offset number. In your version, the relevant offset
number will only be stored in some local structure to which the caller
doesn't yet have access. The difference is meaningful. lazy_scan_prune
passes off_loc as vacrel->offnum, which means that if an error
happens, vacrel->offnum will have the right value, and so when the
error context callback installed by heap_vacuum_rel() fires, namely
vacuum_error_callback(), it can look at vacrel->offnum and know where
the error happened. With your patch, I think that would no longer
work.
I haven't run the regression suite with 0001 applied. I'm guessing
that you did, and that they passed, which if true means that we don't
have a test for this, which is a shame, although writing such a test
might be a bit tricky. If there is a test case for this and you didn't
run it, woops. This is also why I think it's *extremely* important for
the header comment of a function that takes pointer parameters to
document the semantics of those pointers. Normally they are in
parameters or out parameters or in-out parameters, but here it's
something even more complicated. The existing header comment says
"off_loc is the offset location required by the caller to use in error
callback," which I didn't really understand until I actually looked at
what the code is doing, so I consider that somebody could've done a
better job writing this comment, but in theory you could've also
noticed that, at least AFAICS, there's no way for the function to
return with *off_loc set to anything other than InvalidOffsetNumber.
That means that the statements which set *off_loc to other values must
have some other purpose.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Sep 7, 2023 at 1:37 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Sep 6, 2023 at 5:21 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:Yeah, I think this is a fair concern. I have addressed it in the
attached patches.I thought a lot about whether or not adding a PruneResult which
contains only the output parameters and result of heap_page_prune() is
annoying since we have so many other *Prune* data structures. I
decided it's not annoying. In some cases, four outputs don't merit a
new structure. In this case, I think it declutters the code a bit --
independent of any other patches I may be writing :)I took a look at 0001 and I think that it's incorrect. In the existing
code, *off_loc gets updated before each call to
heap_prune_satisfies_vacuum(). This means that if an error occurs in
heap_prune_satisfies_vacuum(), *off_loc will as of that moment contain
the relevant offset number. In your version, the relevant offset
number will only be stored in some local structure to which the caller
doesn't yet have access. The difference is meaningful. lazy_scan_prune
passes off_loc as vacrel->offnum, which means that if an error
happens, vacrel->offnum will have the right value, and so when the
error context callback installed by heap_vacuum_rel() fires, namely
vacuum_error_callback(), it can look at vacrel->offnum and know where
the error happened. With your patch, I think that would no longer
work.
You are right. That is a major problem. Thank you for finding that. I
was able to confirm the breakage by patching in an error to
heap_page_prune() after we have set off_loc and confirming that the
error context has the offset in master and is missing it with my patch
applied.
I can fix it by changing the type of PruneResult->off_loc to be an
OffsetNumber pointer. This does mean that PruneResult will be
initialized partially by heap_page_prune() callers. I wonder if you
think that undermines the case for making a new struct.
I still want to eliminate the NULL check of off_loc in
heap_page_prune() by making it a required parameter. Even though
on-access pruning does not have an error callback mechanism which uses
the offset, it seems better to have a pointless local variable in
heap_page_prune_opt() than to do a NULL check for every tuple pruned.
I haven't run the regression suite with 0001 applied. I'm guessing
that you did, and that they passed, which if true means that we don't
have a test for this, which is a shame, although writing such a test
might be a bit tricky. If there is a test case for this and you didn't
run it, woops.
There is no test coverage for the vacuum error callback context
currently (tests passed for me). I looked into how we might add such a
test. First, I investigated what kind of errors might occur during
heap_prune_satisfies_vacuum(). Some of the multixact functions called
by HeapTupleSatisfiesVacuumHorizon() could error out -- for example
GetMultiXactIdMembers(). It seems difficult to trigger the errors in
GetMultiXactIdMembers(), as we would have to cause wraparound. It
would be even more difficult to ensure that we hit those specific
errors from a call stack containing heap_prune_satisfies_vacuum(). As
such, I'm not sure I can think of a way to protect future developers
from repeating my mistake--apart from improving the comment like you
mentioned.
- Melanie
On Thu, Sep 7, 2023 at 3:10 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
I can fix it by changing the type of PruneResult->off_loc to be an
OffsetNumber pointer. This does mean that PruneResult will be
initialized partially by heap_page_prune() callers. I wonder if you
think that undermines the case for making a new struct.
I think that it undermines the case for including that particular
argument in the struct. It's not really a Prune*Result* if the caller
initializes it in part. It seems fairly reasonable to still have a
PruneResult struct for the other stuff, though, at least to me. How do
you see it?
(One could also argue that this is a somewhat more byzantine way of
doing error reporting than would be desirable, but fixing that problem
doesn't seem too straightforward so perhaps it's prudent to leave it
well enough alone.)
I still want to eliminate the NULL check of off_loc in
heap_page_prune() by making it a required parameter. Even though
on-access pruning does not have an error callback mechanism which uses
the offset, it seems better to have a pointless local variable in
heap_page_prune_opt() than to do a NULL check for every tuple pruned.
It doesn't seem important to me unless it improves performance. If
it's just stylistic, I don't object, but I also don't really see a
reason to care.
I haven't run the regression suite with 0001 applied. I'm guessing
that you did, and that they passed, which if true means that we don't
have a test for this, which is a shame, although writing such a test
might be a bit tricky. If there is a test case for this and you didn't
run it, woops.There is no test coverage for the vacuum error callback context
currently (tests passed for me). I looked into how we might add such a
test. First, I investigated what kind of errors might occur during
heap_prune_satisfies_vacuum(). Some of the multixact functions called
by HeapTupleSatisfiesVacuumHorizon() could error out -- for example
GetMultiXactIdMembers(). It seems difficult to trigger the errors in
GetMultiXactIdMembers(), as we would have to cause wraparound. It
would be even more difficult to ensure that we hit those specific
errors from a call stack containing heap_prune_satisfies_vacuum(). As
such, I'm not sure I can think of a way to protect future developers
from repeating my mistake--apart from improving the comment like you
mentioned.
004_verify_heapam.pl has some tests that intentionally corrupt pages
and then use pg_amcheck to detect the corruption. Such an approach
could probably also be used here. But it's a pain to get such tests
right, because any change to the page format due to endianness,
different block size, or whatever can make carefully-written tests go
boom.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Sep 7, 2023 at 3:30 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Sep 7, 2023 at 3:10 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:I can fix it by changing the type of PruneResult->off_loc to be an
OffsetNumber pointer. This does mean that PruneResult will be
initialized partially by heap_page_prune() callers. I wonder if you
think that undermines the case for making a new struct.I think that it undermines the case for including that particular
argument in the struct. It's not really a Prune*Result* if the caller
initializes it in part. It seems fairly reasonable to still have a
PruneResult struct for the other stuff, though, at least to me. How do
you see it?
Yes, I think off_loc probably didn't belong in PruneResult to begin with.
It is inherently not a result of pruning since it will only be used in
the event that pruning doesn't complete (it errors out).
In the attached v4 patch set, I have both PruneResult and off_loc as
parameters to heap_page_prune(). I have also added a separate commit
which adds comments both above heap_page_prune()'s call site in
lazy_scan_prune() and in the heap_page_prune() function header
clarifying the various points we discussed.
I still want to eliminate the NULL check of off_loc in
heap_page_prune() by making it a required parameter. Even though
on-access pruning does not have an error callback mechanism which uses
the offset, it seems better to have a pointless local variable in
heap_page_prune_opt() than to do a NULL check for every tuple pruned.It doesn't seem important to me unless it improves performance. If
it's just stylistic, I don't object, but I also don't really see a
reason to care.
I did some performance testing but, as expected, I couldn't concoct a
scenario where the overhead was noticeable in a profile. So much else
is happening in that code, the NULL check basically doesn't matter
(even though it isn't optimized out).
I mostly wanted to remove the NULL checks because I found them
distracting (so, a stylistic complaint). However, upon further
reflection, I actually think it is better if heap_page_prune_opt()
passes NULL. heap_page_prune() has no error callback mechanism that
could use it, and passing a valid value implies otherwise. Also, as
you said, off_loc will always be InvalidOffsetNumber when
heap_page_prune() returns normally, so having heap_page_prune_opt()
pass a dummy value might actually be more confusing for future
programmers.
I haven't run the regression suite with 0001 applied. I'm guessing
that you did, and that they passed, which if true means that we don't
have a test for this, which is a shame, although writing such a test
might be a bit tricky. If there is a test case for this and you didn't
run it, woops.There is no test coverage for the vacuum error callback context
currently (tests passed for me). I looked into how we might add such a
test. First, I investigated what kind of errors might occur during
heap_prune_satisfies_vacuum(). Some of the multixact functions called
by HeapTupleSatisfiesVacuumHorizon() could error out -- for example
GetMultiXactIdMembers(). It seems difficult to trigger the errors in
GetMultiXactIdMembers(), as we would have to cause wraparound. It
would be even more difficult to ensure that we hit those specific
errors from a call stack containing heap_prune_satisfies_vacuum(). As
such, I'm not sure I can think of a way to protect future developers
from repeating my mistake--apart from improving the comment like you
mentioned.004_verify_heapam.pl has some tests that intentionally corrupt pages
and then use pg_amcheck to detect the corruption. Such an approach
could probably also be used here. But it's a pain to get such tests
right, because any change to the page format due to endianness,
different block size, or whatever can make carefully-written tests go
boom.
Cool! I hadn't examined how these tests work until now. I took a stab
at writing a test in the existing 0004_verify_heapam.pl. The simplest
thing would be if we could just vacuum the corrupted table ("test")
after running pg_amcheck and compare the error context to our
expectation. I found that this didn't work, though. In an assert
build, vacuum trips an assert before it hits an error while vacuuming
a corrupted tuple in the "test" table. There might be a way of
modifying the existing test code to avoid this, but I tried the next
easiest thing -- corrupting a tuple in the other existing table in the
file, "junk". This is possible to do, but we have to repeat a lot of
the setup code done for the "test" table to get the line pointer array
and loop through and corrupt a tuple. In order to do this well, I
would want to refactor some of the boilerplate into a function. There
are other fiddly bits as well that I needed to consider. I think a
test like this could be useful coverage of the some of the possible
errors that could happen in heap_prune_satisfies_vacuum(), but it
definitely isn't coverage of pg_amcheck (and thus shouldn't go in that
file) and a whole new test which spins up a Postgres to cover
vacuum_error_callback() seemed like a bit much.
- Melanie
Attachments:
v4-0002-Move-heap_page_prune-output-parameters-into-struc.patchtext/x-patch; charset=US-ASCII; name=v4-0002-Move-heap_page_prune-output-parameters-into-struc.patchDownload+33-32
v4-0003-Reuse-heap_page_prune-tuple-visibility-statuses.patchtext/x-patch; charset=US-ASCII; name=v4-0003-Reuse-heap_page_prune-tuple-visibility-statuses.patchDownload+41-50
v4-0001-Clarify-usage-of-heap_page_prune-parameter-off_lo.patchtext/x-patch; charset=US-ASCII; name=v4-0001-Clarify-usage-of-heap_page_prune-parameter-off_lo.patchDownload+9-3
On Thu, Sep 7, 2023 at 6:23 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
I mostly wanted to remove the NULL checks because I found them
distracting (so, a stylistic complaint). However, upon further
reflection, I actually think it is better if heap_page_prune_opt()
passes NULL. heap_page_prune() has no error callback mechanism that
could use it, and passing a valid value implies otherwise. Also, as
you said, off_loc will always be InvalidOffsetNumber when
heap_page_prune() returns normally, so having heap_page_prune_opt()
pass a dummy value might actually be more confusing for future
programmers.
I'll look at the new patches more next week, but I wanted to comment
on this point. I think this is kind of six of one, half a dozen of the
other. It's not that hard to spot a variable that's only used in a
function call and never initialized beforehand or used afterward, and
if someone really feels the need to hammer home the point, they could
always name it dummy or dummy_loc or whatever. So this point doesn't
really carry a lot of weight with me. I actually think that the
proposed change is probably better, but it seems like such a minor
improvement that I get slightly reluctant to make it, only because
churning the source code for arguable points sometimes annoys other
developers.
But I also had the thought that maybe it wouldn't be such a terrible
idea if heap_page_prune_opt() actually used off_loc for some error
reporting goodness. I mean, if HOT pruning fails, and we don't get the
detail as to which tuple caused the failure, we can always run VACUUM
and it will give us that information, assuming of course that the same
failure happens again. But is there any reason why HOT pruning
shouldn't include that error detail? If it did, then off_loc would be
passed by all callers, at which point we surely would want to get rid
of the branches.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Sep 8, 2023 at 11:06 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Sep 7, 2023 at 6:23 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:I mostly wanted to remove the NULL checks because I found them
distracting (so, a stylistic complaint). However, upon further
reflection, I actually think it is better if heap_page_prune_opt()
passes NULL. heap_page_prune() has no error callback mechanism that
could use it, and passing a valid value implies otherwise. Also, as
you said, off_loc will always be InvalidOffsetNumber when
heap_page_prune() returns normally, so having heap_page_prune_opt()
pass a dummy value might actually be more confusing for future
programmers.I'll look at the new patches more next week, but I wanted to comment
on this point. I think this is kind of six of one, half a dozen of the
other. It's not that hard to spot a variable that's only used in a
function call and never initialized beforehand or used afterward, and
if someone really feels the need to hammer home the point, they could
always name it dummy or dummy_loc or whatever. So this point doesn't
really carry a lot of weight with me. I actually think that the
proposed change is probably better, but it seems like such a minor
improvement that I get slightly reluctant to make it, only because
churning the source code for arguable points sometimes annoys other
developers.But I also had the thought that maybe it wouldn't be such a terrible
idea if heap_page_prune_opt() actually used off_loc for some error
reporting goodness. I mean, if HOT pruning fails, and we don't get the
detail as to which tuple caused the failure, we can always run VACUUM
and it will give us that information, assuming of course that the same
failure happens again. But is there any reason why HOT pruning
shouldn't include that error detail? If it did, then off_loc would be
passed by all callers, at which point we surely would want to get rid
of the branches.
This is a good idea. I will work on a separate patch set to add an
error context callback for on-access HOT pruning.
- Melanie
Hi,
On 2023-09-07 18:23:22 -0400, Melanie Plageman wrote:
From e986940e546171d1f1d06f62a101d695a8481e7a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 6 Sep 2023 14:57:20 -0400
Subject: [PATCH v4 2/3] Move heap_page_prune output parameters into structAdd PruneResult, a structure containing the output parameters and result
of heap_page_prune(). Reorganizing the results of heap_page_prune() into
a struct simplifies the function signature and provides a location for
future commits to store additional output parameters.Discussion: /messages/by-id/CAAKRu_br124qsGJieuYA0nGjywEukhK1dKBfRdby_4yY3E9SXA@mail.gmail.com
---
src/backend/access/heap/pruneheap.c | 33 +++++++++++++---------------
src/backend/access/heap/vacuumlazy.c | 17 +++++---------
src/include/access/heapam.h | 13 +++++++++--
src/tools/pgindent/typedefs.list | 1 +
4 files changed, 33 insertions(+), 31 deletions(-)diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c index 392b54f093..5ac286e152 100644 --- a/src/backend/access/heap/pruneheap.c +++ b/src/backend/access/heap/pruneheap.c @@ -155,15 +155,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer) */ if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree) { - int ndeleted, - nnewlpdead; + PruneResult presult;- ndeleted = heap_page_prune(relation, buffer, vistest, - &nnewlpdead, NULL); + heap_page_prune(relation, buffer, vistest, &presult, NULL);/* * Report the number of tuples reclaimed to pgstats. This is - * ndeleted minus the number of newly-LP_DEAD-set items. + * presult.ndeleted minus the number of newly-LP_DEAD-set items. * * We derive the number of dead tuples like this to avoid totally * forgetting about items that were set to LP_DEAD, since they @@ -175,9 +173,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer) * tracks ndeleted, since it will set the same LP_DEAD items to * LP_UNUSED separately. */ - if (ndeleted > nnewlpdead) + if (presult.ndeleted > presult.nnewlpdead) pgstat_update_heap_dead_tuples(relation, - ndeleted - nnewlpdead); + presult.ndeleted - presult.nnewlpdead); }/* And release buffer lock */ @@ -204,24 +202,22 @@ heap_page_prune_opt(Relation relation, Buffer buffer) * (see heap_prune_satisfies_vacuum and * HeapTupleSatisfiesVacuum). * - * Sets *nnewlpdead for caller, indicating the number of items that were - * newly set LP_DEAD during prune operation. + * presult contains output parameters needed by callers such as the number of + * tuples removed and the number of line pointers newly marked LP_DEAD. + * heap_page_prune() is responsible for initializing it. * * off_loc is the current offset into the line pointer array while pruning. * This is used by vacuum to populate the error context message. On-access * pruning has no such callback mechanism for populating the error context, so * it passes NULL. When provided by the caller, off_loc is set exclusively by * heap_page_prune(). - * - * Returns the number of tuples deleted from the page during this call. */ -int +void heap_page_prune(Relation relation, Buffer buffer, GlobalVisState *vistest, - int *nnewlpdead, + PruneResult *presult, OffsetNumber *off_loc) { - int ndeleted = 0; Page page = BufferGetPage(buffer); BlockNumber blockno = BufferGetBlockNumber(buffer); OffsetNumber offnum, @@ -247,6 +243,9 @@ heap_page_prune(Relation relation, Buffer buffer, prstate.nredirected = prstate.ndead = prstate.nunused = 0; memset(prstate.marked, 0, sizeof(prstate.marked));+ presult->ndeleted = 0; + presult->nnewlpdead = 0; + maxoff = PageGetMaxOffsetNumber(page); tup.t_tableOid = RelationGetRelid(prstate.rel);@@ -321,7 +320,7 @@ heap_page_prune(Relation relation, Buffer buffer,
continue;/* Process this item or chain of items */ - ndeleted += heap_prune_chain(buffer, offnum, &prstate); + presult->ndeleted += heap_prune_chain(buffer, offnum, &prstate); }/* Clear the offset information once we have processed the given page. */
@@ -422,9 +421,7 @@ heap_page_prune(Relation relation, Buffer buffer,
END_CRIT_SECTION();/* Record number of newly-set-LP_DEAD items for caller */ - *nnewlpdead = prstate.ndead; - - return ndeleted; + presult->nnewlpdead = prstate.ndead; }diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index 102cc97358..7ead9cfe9d 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c @@ -1544,12 +1544,11 @@ lazy_scan_prune(LVRelState *vacrel, ItemId itemid; HeapTupleData tuple; HTSV_Result res; - int tuples_deleted, - tuples_frozen, + PruneResult presult; + int tuples_frozen, lpdead_items, live_tuples, recently_dead_tuples; - int nnewlpdead; HeapPageFreeze pagefrz; int64 fpi_before = pgWalUsage.wal_fpi; OffsetNumber deadoffsets[MaxHeapTuplesPerPage]; @@ -1572,7 +1571,6 @@ retry: pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid; pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid; pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid; - tuples_deleted = 0; tuples_frozen = 0; lpdead_items = 0; live_tuples = 0; @@ -1581,9 +1579,8 @@ retry: /* * Prune all HOT-update chains in this page. * - * We count tuples removed by the pruning step as tuples_deleted. Its - * final value can be thought of as the number of tuples that have been - * deleted from the table. It should not be confused with lpdead_items; + * We count the number of tuples removed from the page by the pruning step + * in presult.ndeleted. It should not be confused with lpdead_items; * lpdead_items's final value can be thought of as the number of tuples * that were deleted from indexes. * @@ -1591,9 +1588,7 @@ retry: * current offset when populating the error context message, so it is * imperative that we pass its location to heap_page_prune. */ - tuples_deleted = heap_page_prune(rel, buf, vacrel->vistest, - &nnewlpdead, - &vacrel->offnum); + heap_page_prune(rel, buf, vacrel->vistest, &presult, &vacrel->offnum);/*
* Now scan the page to collect LP_DEAD items and check for tuples
@@ -1933,7 +1928,7 @@ retry:
}/* Finally, add page-local counts to whole-VACUUM counts */ - vacrel->tuples_deleted += tuples_deleted; + vacrel->tuples_deleted += presult.ndeleted; vacrel->tuples_frozen += tuples_frozen; vacrel->lpdead_items += lpdead_items; vacrel->live_tuples += live_tuples; diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h index 6598c4d7d8..2d3f149e4f 100644 --- a/src/include/access/heapam.h +++ b/src/include/access/heapam.h @@ -191,6 +191,15 @@ typedef struct HeapPageFreeze} HeapPageFreeze;
+/* + * Per-page state returned from pruning + */ +typedef struct PruneResult +{ + int ndeleted; /* Number of tuples deleted from the page */ + int nnewlpdead; /* Number of newly LP_DEAD items */ +} PruneResult;
I think it might be worth making the names a bit less ambiguous than they
were. It's a bit odd that one has "new" in the name, the other doesn't,
despite both being about newly marked things. And "deleted" seems somewhat
ambiguous, it could also be understood as marking things LP_DEAD. Maybe
nnewunused?
static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
+ int8 *htsv,
PruneState *prstate);
Hm, do we really want to pass this explicitly to a bunch of functions? Seems
like it might be better to either pass the PruneResult around or to have a
pointer in PruneState?
/* * The criteria for counting a tuple as live in this block need to @@ -1682,7 +1664,7 @@ retry: * (Cases where we bypass index vacuuming will violate this optimistic * assumption, but the overall impact of that should be negligible.) */ - switch (res) + switch ((HTSV_Result) presult.htsv[offnum]) { case HEAPTUPLE_LIVE:
I think we should assert that we have a valid HTSV_Result here, i.e. not
-1. You could wrap the cast and Assert into an inline funciton as well.
Greetings,
Andres Freund
[ sorry for the delay getting back to this ]
On Wed, Sep 13, 2023 at 1:29 PM Andres Freund <andres@anarazel.de> wrote:
I think it might be worth making the names a bit less ambiguous than they
were. It's a bit odd that one has "new" in the name, the other doesn't,
despite both being about newly marked things. And "deleted" seems somewhat
ambiguous, it could also be understood as marking things LP_DEAD. Maybe
nnewunused?
I like it the better the way Melanie did it. The current name may not
be for the best, but that can be changed some other time, in a
separate patch, if someone likes. For now, changing the name seems
like a can of worms we don't need to open; the existing names have
precedent on their side if nothing else.
static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
+ int8 *htsv,
PruneState *prstate);Hm, do we really want to pass this explicitly to a bunch of functions? Seems
like it might be better to either pass the PruneResult around or to have a
pointer in PruneState?
As far as I can see, 0002 adds it to one function (heap_page_pune) and
0003 adds it to one more (heap_prune_chain). That's not much of a
bunch.
/* * The criteria for counting a tuple as live in this block need to @@ -1682,7 +1664,7 @@ retry: * (Cases where we bypass index vacuuming will violate this optimistic * assumption, but the overall impact of that should be negligible.) */ - switch (res) + switch ((HTSV_Result) presult.htsv[offnum]) { case HEAPTUPLE_LIVE:I think we should assert that we have a valid HTSV_Result here, i.e. not
-1. You could wrap the cast and Assert into an inline funciton as well.
This isn't a bad idea, although I don't find it completely necessary either.
--
Robert Haas
EDB: http://www.enterprisedb.com