Eagerly scan all-visible pages to amortize aggressive vacuum
Hi,
An aggressive vacuum of a relation is triggered when its relfrozenxid
is older than vacuum_freeze_table_age XIDs. Aggressive vacuums require
examining every unfrozen tuple in the relation. Normal vacuums can
skip all-visible pages. So a relation with a large number of
all-visible but not all-frozen pages may suddenly have to vacuum an
order of magnitude more pages than the previous vacuum.
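For illustration only, the age check behind that trigger can be sketched as a
wraparound-safe unsigned comparison. This is a deliberate simplification of
PostgreSQL's 32-bit XID arithmetic (the function name and shape are mine, not
the actual TransactionIdPrecedes() code):

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Sketch: an aggressive vacuum is forced once the table's relfrozenxid is
 * more than vacuum_freeze_table_age transactions old. XIDs are 32-bit and
 * wrap around, so age is computed as an unsigned distance.
 */
static bool
needs_aggressive_vacuum(TransactionId relfrozenxid,
                        TransactionId next_xid,
                        uint32_t vacuum_freeze_table_age)
{
    uint32_t age = next_xid - relfrozenxid; /* wraparound-safe */

    return age > vacuum_freeze_table_age;
}
```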
In many cases, these all-visible not all-frozen pages are not part of
the working set and must be read in. All of the pages with newly
frozen tuples have to be written out and all of the WAL associated
with freezing and setting the page all-frozen in the VM must be
emitted. This extra I/O can affect performance of the foreground
workload substantially.
The best solution would be to freeze the pages instead of just setting
them all-visible. But we don't want to do this if the page will be
modified again because freezing costs extra I/O.
Last year, I worked on a vacuum patch to try to predict which pages
should be eagerly frozen [1] by building a distribution of page
modification intervals and estimating the probability that a given
page would stay frozen long enough to merit freezing.
While working on it I encountered a problem. Pages set all-visible but
not all-frozen by vacuum and not modified again do not have a
modification interval. As such, the distribution would not account for
an outstanding debt of old, unfrozen pages.
As I thought about the problem more, I realized that even if we could
find a way to include those pages in our model and then predict and
effectively eagerly freeze pages, there will always be pages we miss
that will have to be picked up later by an aggressive vacuum.
While it would be best to freeze these pages the first time they are
vacuumed and set all-visible, the write amplification is already being
incurred. This patch proposes to spread it out across multiple
"semi-aggressive" vacuums.
I believe eager scanning is actually a required step toward more
intelligent eager freezing. It is worth noting that eager scanning
should also allow us to lift the restriction on setting pages
all-visible in the VM during on-access pruning. This could enable
index-only scans in more cases.
The approach I take in the attached patch set is built on suggestions
and feedback from both Robert and Andres as well as my own ideas and
research.
It implements a new "semi-aggressive" vacuum. Semi-aggressive vacuums
eagerly scan some number of all-visible but not all-frozen pages in
hopes of freezing them. All-visible pages that are eagerly scanned and
set all-frozen in the visibility map are considered successful eager
scans and those not frozen are considered failed eager scans.
Because we want to amortize our eager scanning across a few vacuums,
we cap the maximum number of successful eager scans to a percentage of
the total number of all-visible but not all-frozen pages in the table
(currently 20%).
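Concretely, the cap works out to something like the following sketch (the 20%
ratio and the visibility-map counts come from the patch; the function name is
mine):

```c
#include <stdint.h>

typedef uint32_t BlockNumber;

/* 20% cap from the patch, applied to all-visible but not all-frozen pages */
#define EAGER_SCAN_SUCCESS_RATE 0.2

/*
 * Sketch of initializing the budget of successful eager scans from the
 * visibility map counts taken at the beginning of the vacuum.
 */
static BlockNumber
eager_success_cap(BlockNumber all_visible, BlockNumber all_frozen)
{
    return (BlockNumber) (EAGER_SCAN_SUCCESS_RATE * (all_visible - all_frozen));
}
```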
We also want to cap the maximum number of failures. We assume that
different areas or "regions" of the relation are likely to contain
similarly aged data. So, if too many blocks are eagerly scanned and
not frozen in a given region of the table, eager scanning is
temporarily suspended.
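To make the region mechanics concrete, here is a sketch of the bookkeeping.
The constants and the quarter-of-table fallback for small tables follow the
patch; the helper names are mine, and degenerate tables smaller than four
blocks are ignored:

```c
#include <stdint.h>

typedef uint32_t BlockNumber;

#define MAX_SUCCESSIVE_EAGER_SCAN_FAILS 1024 /* per-region failure cap */
#define RELSEG_SIZE 131072                   /* 1 GB worth of 8 kB blocks */

/* Region size: one segment, or a quarter of a small table. */
static BlockNumber
eager_region_size(BlockNumber rel_pages)
{
    BlockNumber quarter = rel_pages / 4;

    return quarter < RELSEG_SIZE ? quarter : RELSEG_SIZE;
}

/*
 * Once the per-region failure cap is hit, eager scanning is suspended until
 * this block. The region is counted from the first failure, so the failures
 * observed so far are subtracted out as an offset.
 */
static BlockNumber
eager_resume_block(BlockNumber current_block, BlockNumber rel_pages,
                   BlockNumber failed_in_region)
{
    BlockNumber region_size = eager_region_size(rel_pages);
    BlockNumber offset = failed_in_region % region_size;

    return current_block + (region_size - offset);
}
```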
Since I refer to vacuums that eagerly scan a set number of pages as
"semi-aggressive vacuums," I’ve begun calling those that scan every
unfrozen tuple "fully aggressive vacuums" and those with no eager
scanning, or with permanently disabled eager scanning, "unaggressive
vacuums."
v1 of this feature is attached. The first eight patches in the set are
preliminary.
I've proposed 0001-0003 in this thread [2] -- they boil down to
counting pages set all-frozen in the VM.
0004-0007 are a bit of refactoring to make the code a better shape for
the eager scanning feature.
0008 is a WIP patch to add a more general description of heap
vacuuming to the top of vacuumlazy.c.
0009 is the actual eager scanning feature.
To demonstrate the results, I ran an append-only workload for over
3 hours on master and with my patch applied. The patched version of
Postgres amortized the work of freezing the all-visible but not
all-frozen pages nicely. The first aggressive vacuum with the patch
was 44 seconds and on master it was 1201 seconds.
patch
LOG: automatic aggressive vacuum of table "history": index scans: 0
vacuum duration: 44 seconds (msecs: 44661).
pages: 0 removed, 27425085 remain, 1104095 scanned (4.03% of
total), 709889 eagerly scanned
frozen: 316544 pages from table (1.15% of total) had 17409920
tuples frozen. 316544 pages set all-frozen in the VM
I/O timings: read: 1160.599 ms, write: 2461.205 ms. approx time
spent in vacuum delay: 16230 ms.
buffer usage: 1105630 hits, 1111898 reads, 646229 newly dirtied,
1750426 dirtied.
WAL usage: 1027099 records, 316566 full page images, 276209780 bytes.
master
LOG: automatic aggressive vacuum of table "history": index scans: 0
vacuum duration: 1201 seconds (msecs: 1201487).
pages: 0 removed, 27515348 remain, 15800948 scanned (57.43% of
total), 15098257 eagerly scanned
frozen: 15096384 pages from table (54.87% of total) had 830247896
tuples frozen. 15096384 pages set all-frozen in the VM
I/O timings: read: 246537.348 ms, write: 73000.498 ms. approx time
spent in vacuum delay: 349166 ms.
buffer usage: 15798343 hits, 15813524 reads, 15097063 newly
dirtied, 31274333 dirtied.
WAL usage: 30733564 records, 15097073 full page images, 11789257631 bytes.
This is because, with the patch, the freezing work is being off-loaded
to earlier vacuums.
In the attached chart.png, you can see the vm_page_freezes climbing
steadily with the patch, whereas on master, there are sudden spikes
aligned with the aggressive vacuums. You can also see that the number
of pages that are all-visible but not all-frozen grows steadily on
master until the aggressive vacuum. This is vacuum's "backlog" of
freezing work.
In this benchmark, the TPS is rate-limited, but using pgbench
per-statement reports, I calculated that the P99 latency for this
benchmark is 16 ms on master and 1 ms with the patch. Vacuuming pages
sooner decreases vacuum reads, and spreading the freezing work over
more vacuums improves P99 transaction latency.
Below is the comparative WAL volume, checkpointer and background
writer writes, reads and writes done by all other backend types, time
spent vacuuming in milliseconds, and p99 latency. Notice that overall
vacuum IO time is substantially lower with the patch.
version  wal     cptr_bgwriter_w  other_rw   vac_io_time  p99_lat
patch    770 GB  5903264          235073744  513722       1
master   767 GB  5908523          216887764  1003654      16
(Note that the benchmarks were run on Postgres with a few extra
patches applied to both master and the patch to trigger vacuums more
frequently. I've proposed those here [3].)
I've also run the built-in tpcb-like pgbench workload and confirmed
that it improves the vacuuming behavior on pgbench_history but has
little impact on vacuuming of heavy-update tables like
pgbench_accounts -- depending on how aggressively the eager scanning
is tuned. Which brings me to the TODOs.
I need to do further benchmarking and investigation to determine
optimal failure and success caps -- ones that will work well for all
workloads. Perhaps the failure cap per region should be configurable.
I also need to try other scenarios -- like those in which old data is
deleted -- and determine if the region boundaries should change from
run to run to avoid eager scanning and failing to freeze the same
pages each time.
Also, all my benchmarking so far has been done on compressed
timelines. I tuned Postgres to exhibit the behavior it might normally
exhibit over days or a week in a few hours. As such, I need to start
running long benchmarks to observe the behavior in a more natural
environment.
- Melanie
[1]: /messages/by-id/CAAKRu_b3tpbdRPUPh1Q5h35gXhY=spH2ssNsEsJ9sDfw6=PEAg@mail.gmail.com
[2]: /messages/by-id/CAAKRu_aJM+0Gwoq_+-sozMX8QEax4QeDhMvySxRt2ayteXJNCg@mail.gmail.com
[3]: /messages/by-id/CAAKRu_aj-P7YyBz_cPNwztz6ohP+vWis=iz3YcomkB3NpYA--w@mail.gmail.com
Attachments:
v1-0001-Rename-LVRelState-frozen_pages.patch
v1-0002-Make-visibilitymap_set-return-previous-state-of-v.patch
v1-0003-Count-pages-set-all-visible-and-all-frozen-in-VM-.patch
v1-0004-Replace-uses-of-blkno-local-variable-in-lazy_scan.patch
v1-0005-Move-vacuum-VM-buffer-release.patch
v1-0006-Remove-superfluous-next_block-local-variable-in-v.patch
v1-0007-Make-heap_vac_scan_next_block-return-BlockNumber.patch
v1-0008-WIP-Add-more-general-summary-to-vacuumlazy.c.patch
v1-0009-Eagerly-scan-all-visible-pages-to-amortize-aggres.patch
chart.png
Hi,
On 2024-11-01 19:35:22 -0400, Melanie Plageman wrote:
It implements a new "semi-aggressive" vacuum. Semi-aggressive vacuums
eagerly scan some number of all-visible but not all-frozen pages in
hopes of freezing them. All-visible pages that are eagerly scanned and
set all-frozen in the visibility map are considered successful eager
scans and those not frozen are considered failed eager scans.
I think it's worth mentioning that this would be useful to eventually be able
to mark pages as all-visible during on-access-pruning, to allow for
index-only-scans (and also faster seqscans). Today that'd have the danger of
increasing the "anti-wraparound vacuum debt" unconscionably, but with this
patch that'd be addressed.
Because we want to amortize our eager scanning across a few vacuums,
we cap the maximum number of successful eager scans to a percentage of
the total number of all-visible but not all-frozen pages in the table
(currently 20%).
One thing worth mentioning around here seems that we currently can't
partially-aggressively freeze tuples that are "too young" and how that
interacts with everything else.
To demonstrate the results, I ran an append-only workload run for over
3 hours on master and with my patch applied. The patched version of
Postgres amortized the work of freezing the all-visible but not
all-frozen pages nicely. The first aggressive vacuum with the patch
was 44 seconds and on master it was 1201 seconds.
Oops.
patch
LOG: automatic aggressive vacuum of table "history": index scans: 0
vacuum duration: 44 seconds (msecs: 44661).
pages: 0 removed, 27425085 remain, 1104095 scanned (4.03% of
total), 709889 eagerly scanned
frozen: 316544 pages from table (1.15% of total) had 17409920
tuples frozen. 316544 pages set all-frozen in the VM
I/O timings: read: 1160.599 ms, write: 2461.205 ms. approx time
spent in vacuum delay: 16230 ms.
buffer usage: 1105630 hits, 1111898 reads, 646229 newly dirtied,
1750426 dirtied.
WAL usage: 1027099 records, 316566 full page images, 276209780 bytes.

master
LOG: automatic aggressive vacuum of table "history": index scans: 0
vacuum duration: 1201 seconds (msecs: 1201487).
pages: 0 removed, 27515348 remain, 15800948 scanned (57.43% of
total), 15098257 eagerly scanned
frozen: 15096384 pages from table (54.87% of total) had 830247896
tuples frozen. 15096384 pages set all-frozen in the VM
I/O timings: read: 246537.348 ms, write: 73000.498 ms. approx time
spent in vacuum delay: 349166 ms.
buffer usage: 15798343 hits, 15813524 reads, 15097063 newly
dirtied, 31274333 dirtied.
WAL usage: 30733564 records, 15097073 full page images, 11789257631 bytes.

This is because, with the patch, the freezing work is being off-loaded
to earlier vacuums.

In the attached chart.png, you can see the vm_page_freezes climbing
steadily with the patch, whereas on master, there are sudden spikes
aligned with the aggressive vacuums. You can also see that the number
of pages that are all-visible but not all-frozen grows steadily on
master until the aggressive vacuum. This is vacuum's "backlog" of
freezing work.
What's the reason for all-visible-but-not-all-frozen to increase to a higher
value initially than where it later settles?
In this benchmark, the TPS is rate-limited, but using pgbench
per-statement reports, I calculated that the P99 latency for this
benchmark is 16 ms on master and 1 ms with the patch. Vacuuming pages
sooner decreases vacuum reads and doing the freezing work spread over
more vacuums improves P99 transaction latency.
Nice!
Below is the comparative WAL volume, checkpointer and background
writer writes, reads and writes done by all other backend types, time
spent vacuuming in milliseconds, and p99 latency. Notice that overall
vacuum IO time is substantially lower with the patch.

version  wal     cptr_bgwriter_w  other_rw   vac_io_time  p99_lat
patch    770 GB  5903264          235073744  513722       1
master   767 GB  5908523          216887764  1003654      16
Hm. It's not clear to me why other_rw is higher with the patch? After all,
given the workload, there's no chance of unnecessarily freezing tuples? Is
that just because at the end of the benchmark there's leftover work?
From 69b517f34caf39ad814691d6412c68d54e852990 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 28 Oct 2024 10:53:37 -0400
Subject: [PATCH v1 1/9] Rename LVRelState->frozen_pages

Rename frozen_pages to tuple_freeze_pages in LVRelState, the struct used
for tracking state during vacuuming of a heap relation. frozen_pages
sounds like it includes every all-frozen page. That is a misnomer. It
does not include pages with already frozen tuples. It also includes
pages that are not actually all-frozen.
Hm. Is tuple_freeze_pages that much clearer, it could also just indicate pages
that already were frozen. How about "newly_frozen_pages"?
Subject: [PATCH v1 2/9] Make visibilitymap_set() return previous state of
LGTM.
From c317a272713bb833f7f2761a5be1924f8e1bdb4d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 31 Oct 2024 18:19:18 -0400
Subject: [PATCH v1 3/9] Count pages set all-visible and all-frozen in VM
during vacuum

Vacuum already counts and logs pages with newly frozen tuples. Count and
log pages set all-frozen in the VM too. This includes pages that are
empty before or after vacuuming.

While we are at it, count and log the number of pages vacuum set
all-visible. Pages that are all-visible but not all-frozen are debt for
future aggressive vacuums. The newly all-visible and all-frozen counts
give us visibility into the rate at which this debt is being accrued and
paid down.
Hm. Why is it interesting to log VM changes separately from the state changes
of underlying pages? Wouldn't it e.g. be more interesting to log the number of
empty pages that vacuum [re-]discovered? I have a hard time seeing how a
user could take advantage of this.
From 67781cc2511bb7d62ccc9461f1787272820abcc4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 28 Oct 2024 11:07:50 -0400
Subject: [PATCH v1 4/9] Replace uses of blkno local variable in
lazy_scan_heap()
Largely LGTM, but I'm not sure that it's worth having as a separate commit.
@@ -1114,7 +1117,6 @@ heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
ReleaseBuffer(vacrel->next_unskippable_vmbuffer);
vacrel->next_unskippable_vmbuffer = InvalidBuffer;
}
- *blkno = vacrel->rel_pages;
return false;
}
I'd set *blkno to InvalidBlockNumber or such, that'd make misuse more
apparent than having it set to some random older block.
From 67b5565ad57d3b196695f85811dde2044ba79f3e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 28 Oct 2024 11:14:24 -0400
Subject: [PATCH v1 5/9] Move vacuum VM buffer release

The VM buffer for the next unskippable block can be released after the
main loop in lazy_scan_heap(). Doing so de-clutters
heap_vac_scan_next_block() and opens up more refactoring options.
That's vague...
From 8485dc400b3d4e9f895170af4f5fb1bb959b8495 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 28 Oct 2024 11:36:58 -0400
Subject: [PATCH v1 6/9] Remove superfluous next_block local variable in vacuum
code

Reduce the number of block related variables in lazy_scan_heap() and its
helpers by removing the next_block local variable from
heap_vac_scan_next_block().
I don't mind this change, but I also don't get how it's related to anything
else here or why it's really better than the status quo.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4b1eadea1f2..52c9d49f2b1 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1112,19 +1112,17 @@ static bool
 heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
                          bool *all_visible_according_to_vm)
 {
-    BlockNumber next_block;
-
-    /* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
-    next_block = vacrel->current_block + 1;
+    vacrel->current_block++;
I realize this isn't introduced in this commit, but darn, that's ugly.
From 78ad9e022b95e024ff5bfa96af78e9e44730c970 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 28 Oct 2024 11:42:10 -0400
Subject: [PATCH v1 7/9] Make heap_vac_scan_next_block() return BlockNumber
@@ -857,7 +857,8 @@ lazy_scan_heap(LVRelState *vacrel)
vacrel->next_unskippable_allvis = false;
 vacrel->next_unskippable_vmbuffer = InvalidBuffer;

- while (heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm))
+ while (BlockNumberIsValid(blkno = heap_vac_scan_next_block(vacrel,
+                                                            &all_visible_according_to_vm)))
Personally I'd write this as
while (true)
{
BlockNumber blkno;
blkno = heap_vac_scan_next_block(vacrel, ...);
if (!BlockNumberIsValid(blkno))
break;
Mostly because it's good to use more minimal scopes when possible,
particularly when previously the scope intentionally was larger. But also
partially because I don't love variable assignments inside a macro call,
inside a while().
From 818d1c3b068c6705611256cfc3eb1f10bdc0b684 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 1 Nov 2024 18:25:05 -0400
Subject: [PATCH v1 8/9] WIP: Add more general summary to vacuumlazy.c

Currently the summary at the top of vacuumlazy.c provides some specific
details related to the new dead TID storage in 17. I plan to add a
summary and maybe some sub-sections to contextualize it.
I like this idea. It's hard to understand vacuumlazy.c without already
understanding vacuumlazy.c, which isn't a good situation.
---
 src/backend/access/heap/vacuumlazy.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 7ce69953ba0..15a04c6b10b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3,6 +3,17 @@
  * vacuumlazy.c
  *   Concurrent ("lazy") vacuuming.
  *
+ * Heap relations are vacuumed in three main phases. In the first phase,
+ * vacuum scans relation pages, pruning and freezing tuples and saving dead
+ * tuples' TIDs in a TID store. If that TID store fills up or vacuum finishes
+ * scanning the relation, it progresses to the second phase: index vacuuming.
+ * After index vacuuming is complete, vacuum scans the blocks of the relation
+ * indicated by the TIDs in the TID store and reaps the dead tuples, freeing
+ * that space for future tuples. Finally, vacuum may truncate the relation if
+ * it has emptied pages at the end. XXX: this summary needs work.
Yea, at least we ought to mention that the phasing can be different when there
are no indexes and that the later phases can heuristically be omitted when
there aren't enough dead items.
From f21f0bab1dbe675be4b4dddcb2eea486d8a69d36 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 28 Oct 2024 12:15:08 -0400
Subject: [PATCH v1 9/9] Eagerly scan all-visible pages to amortize aggressive
vacuum

Introduce semi-aggressive vacuums, which scan some of the all-visible
but not all-frozen pages in the relation to amortize the cost of an
aggressive vacuum.
I wonder if "aggressive" is really the right terminology going
forward... Somehow it doesn't seem particularly descriptive anymore if, in
many workloads, almost all vacuums are going to be aggressive-ish.
Because the goal is to freeze these all-visible pages, all-visible pages
that are eagerly scanned and set all-frozen in the visibility map are
considered successful eager scans and those not frozen are considered
failed eager scans.

If too many eager scans fail in a row, eager scanning is temporarily
suspended until a later portion of the relation. Because the goal is to
amortize aggressive vacuums, we cap the number of successes as well.
Once we reach the maximum number of blocks successfully eager scanned
and frozen, the semi-aggressive vacuum is downgraded to an unaggressive
vacuum.
---
src/backend/access/heap/vacuumlazy.c | 327 +++++++++++++++++++++++----
src/backend/commands/vacuum.c | 20 +-
src/include/commands/vacuum.h | 27 ++-
src/tools/pgindent/typedefs.list | 1 +
 4 files changed, 326 insertions(+), 49 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 15a04c6b10b..adabb5ff5f1 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -12,6 +12,40 @@
  * that space for future tuples. Finally, vacuum may truncate the relation if
  * it has emptied pages at the end. XXX: this summary needs work.
  *
+ * Relation Scanning:
+ *
+ * Vacuum scans the heap relation, starting at the beginning and progressing
+ * to the end, skipping pages as permitted by their visibility status, vacuum
+ * options, and the aggressiveness level of the vacuum.
+ *
+ * When page skipping is enabled, unaggressive vacuums may skip scanning pages
+ * that are marked all-visible in the visibility map. We may choose not to
+ * skip pages if the range of skippable pages is below SKIP_PAGES_THRESHOLD.
+ *
+ * Semi-aggressive vacuums will scan skippable pages in an effort to freeze
+ * them and decrease the backlog of all-visible but not all-frozen pages that
+ * have to be processed to advance relfrozenxid and avoid transaction ID
+ * wraparound.
+ *
+ * We count it as a success when we are able to set an eagerly scanned page
+ * all-frozen in the VM and a failure when we are not able to set the page
+ * all-frozen.
+ *
+ * Because we want to amortize the overhead of freezing pages over multiple
+ * vacuums, we cap the number of successful eager scans to
+ * EAGER_SCAN_SUCCESS_RATE of the number of all-visible but not all-frozen
+ * pages at the beginning of the vacuum.
+ *
+ * On the assumption that different regions of the table are likely to contain
+ * similarly aged data, we use a localized failure cap instead of a global cap
+ * for the whole relation. The failure count is reset on each region of the
+ * table, comprised of RELSEG_SIZE blocks (or 1/4 of the table size for a
+ * small table). In each region, we tolerate MAX_SUCCESSIVE_EAGER_SCAN_FAILS
+ * before suspending eager scanning until the end of the region.
I'm a bit surprised to see such large regions. Why not something finer, in the
range of a few megabytes? The FSM steers new tuples quite aggressively to the
start of the table, which means that in many workloads there will be old and
new data interspersed at the start of the table. Using RELSEG_SIZE sized
regions for semi-aggressive vacuuming will mean that we'll often not do any
semi-aggressive processing beyond the start of the relation, as we'll reach
the failure rate quickly.
I also find it layering-wise a bit weird to use RELSEG_SIZE, that's really imo
is just an md.c concept.
+/*
+ * Semi-aggressive vacuums eagerly scan some all-visible but not all-frozen
+ * pages. Since our goal is to freeze these pages, an eager scan that fails to
+ * set the page all-frozen in the VM is considered to have "failed".
+ *
+ * On the assumption that different regions of the table tend to have
+ * similarly aged data, once we fail to freeze MAX_SUCCESSIVE_EAGER_SCAN_FAILS
+ * blocks, we suspend eager scanning until vacuum has progressed to another
+ * region of the table with potentially older data.
+ */
+#define MAX_SUCCESSIVE_EAGER_SCAN_FAILS 1024
Can this really be a constant, given that the semi-aggressive regions are
shrunk for small tables?
+/*
+ * An eager scan of a page that is set all-frozen in the VM is considered
+ * "successful". To spread out eager scanning across multiple semi-aggressive
+ * vacuums, we limit the number of successful eager scans (as well as the
+ * number of failures). The maximum number of successful eager scans is
+ * calculated as a ratio of the all-visible but not all-frozen pages at the
+ * beginning of the vacuum.
+ */
+#define EAGER_SCAN_SUCCESS_RATE 0.2

 typedef struct LVRelState
There imo should be newlines between the define and LVRelState.
{
/* Target heap relation and its indexes */
@@ -153,8 +208,22 @@ typedef struct LVRelState
BufferAccessStrategy bstrategy;
 ParallelVacuumState *pvs;

-    /* Aggressive VACUUM? (must set relfrozenxid >= FreezeLimit) */
-    bool        aggressive;
+    /*
+     * Whether or not this is an aggressive, semi-aggressive, or unaggressive
+     * VACUUM. A fully aggressive vacuum must set relfrozenxid >= FreezeLimit
+     * and therefore must scan every unfrozen tuple. A semi-aggressive vacuum
+     * will scan a certain number of all-visible pages until it is downgraded
+     * to an unaggressive vacuum.
+     */
+    VacAggressive aggressive;
Hm. A few comments:
- why is VacAggressive defined in vacuum.h? Isn't this fairly tightly coupled
to heapam?
- Kinda feels like the type should be named VacAggressiveness or such?
+    /*
+     * A semi-aggressive vacuum that has failed to freeze too many eagerly
+     * scanned blocks in a row suspends eager scanning. unaggressive_to is the
+     * block number of the first block eligible for resumed eager scanning.
+     */
+    BlockNumber unaggressive_to;
What's it set to otherwise? What is it set to in aggressive vacuums?
@@ -227,6 +296,26 @@ typedef struct LVRelState
     BlockNumber next_unskippable_block; /* next unskippable block */
     bool        next_unskippable_allvis;    /* its visibility status */
     Buffer      next_unskippable_vmbuffer;  /* buffer containing its VM bit */
+
+    /*
+     * Count of skippable blocks eagerly scanned as part of a semi-aggressive
+     * vacuum (for logging only).
+     */
+    BlockNumber eager_scanned;
I'd add _pages or such.
+    /*
+     * The number of eagerly scanned blocks a semi-aggressive vacuum failed to
+     * freeze (due to age) in the current eager scan region. It is reset each
+     * time we hit MAX_SUCCESSIVE_EAGER_SCAN_FAILS.
+     */
+    BlockNumber eager_scanned_failed_frozen;
+
+    /*
+     * The remaining number of blocks a semi-aggressive vacuum will consider
+     * eager scanning. This is initialized to EAGER_SCAN_SUCCESS_RATE of the
+     * total number of all-visible but not all-frozen pages.
+     */
+    BlockNumber remaining_eager_scan_successes;
I think it might look better if you just bundled these into a struct like
struct
{
BlockNumber scanned;
BlockNumber failed_frozen;
BlockNumber remaining_successes;
} eager_pages;
@@ -471,24 +569,49 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
* Force aggressive mode, and disable skipping blocks using the
* visibility map (even those set all-frozen)
*/
- vacrel->aggressive = true;
+ vacrel->aggressive = VAC_AGGRESSIVE;
skipwithvm = false;
 }

 vacrel->skipwithvm = skipwithvm;
+ vacrel->eager_scanned = 0;
+ vacrel->eager_scanned_failed_frozen = 0;
+
+ /*
+  * Even if we successfully freeze them, we want to cap the number of
+  * eagerly scanned blocks so that we spread out the overhead across
+  * multiple vacuums. remaining_eager_scan_successes is only used by
+  * semi-aggressive vacuums.
+  */
Somehow the "them" feels like a dangling pointer that's initialized too late ;)
+ visibilitymap_count(rel, &orig_rel_allvisible, &orig_rel_allfrozen);
+ vacrel->remaining_eager_scan_successes =
+     (BlockNumber) (EAGER_SCAN_SUCCESS_RATE * (orig_rel_allvisible - orig_rel_allfrozen));

  if (verbose)
  {
-     if (vacrel->aggressive)
-         ereport(INFO,
-                 (errmsg("aggressively vacuuming \"%s.%s.%s\"",
-                         vacrel->dbname, vacrel->relnamespace,
-                         vacrel->relname)));
-     else
-         ereport(INFO,
-                 (errmsg("vacuuming \"%s.%s.%s\"",
-                         vacrel->dbname, vacrel->relnamespace,
-                         vacrel->relname)));
+     switch (vacrel->aggressive)
+     {
+         case VAC_UNAGGRESSIVE:
+             ereport(INFO,
+                     (errmsg("vacuuming \"%s.%s.%s\"",
+                             vacrel->dbname, vacrel->relnamespace,
+                             vacrel->relname)));
+             break;
+
+         case VAC_AGGRESSIVE:
+             ereport(INFO,
+                     (errmsg("aggressively vacuuming \"%s.%s.%s\"",
+                             vacrel->dbname, vacrel->relnamespace,
+                             vacrel->relname)));
+             break;
+
+         case VAC_SEMIAGGRESSIVE:
+             ereport(INFO,
+                     (errmsg("semiaggressively vacuuming \"%s.%s.%s\"",
+                             vacrel->dbname, vacrel->relnamespace,
+                             vacrel->relname)));
+             break;
+     }
Wonder if we should have a function that returns the aggressiveness of a
vacuum as an already translated string. There are other places where we emit
the aggressiveness as part of a message, and it's pretty silly to duplicate
most of the message.
@@ -545,11 +668,13 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
      * Non-aggressive VACUUMs may advance them by any amount, or not at all.
      */
     Assert(vacrel->NewRelfrozenXid == vacrel->cutoffs.OldestXmin ||
-           TransactionIdPrecedesOrEquals(vacrel->aggressive ? vacrel->cutoffs.FreezeLimit :
+           TransactionIdPrecedesOrEquals(vacrel->aggressive == VAC_AGGRESSIVE ?
+                                         vacrel->cutoffs.FreezeLimit :
                                          vacrel->cutoffs.relfrozenxid,
                                          vacrel->NewRelfrozenXid));
     Assert(vacrel->NewRelminMxid == vacrel->cutoffs.OldestMxact ||
-           MultiXactIdPrecedesOrEquals(vacrel->aggressive ? vacrel->cutoffs.MultiXactCutoff :
+           MultiXactIdPrecedesOrEquals(vacrel->aggressive == VAC_AGGRESSIVE ?
+                                       vacrel->cutoffs.MultiXactCutoff :
                                        vacrel->cutoffs.relminmxid,
                                        vacrel->NewRelminMxid));

     if (vacrel->skippedallvis)
These are starting to feel somewhat complicated. Wonder if it'd be easier to
read if they were written as normal ifs.
+/*
+ * Helper to decrement a block number to 0 without wrapping around.
+ */
+static void
+decrement_blkno(BlockNumber *block)
+{
+    if ((*block) > 0)
+        (*block)--;
+}
Huh.
@@ -956,11 +1094,23 @@ lazy_scan_heap(LVRelState *vacrel)
if (!got_cleanup_lock)
     LockBuffer(buf, BUFFER_LOCK_SHARE);

+ page_freezes = vacrel->vm_page_freezes;
+
  /* Check for new or empty pages before lazy_scan_[no]prune call */
  if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, !got_cleanup_lock,
                             vmbuffer))
  {
      /* Processed as new/empty page (lock and pin released) */
+
+     /* count an eagerly scanned page as a failure or a success */
+     if (was_eager_scanned)
+     {
+         if (vacrel->vm_page_freezes > page_freezes)
+             decrement_blkno(&vacrel->remaining_eager_scan_successes);
+         else
+             vacrel->eager_scanned_failed_frozen++;
+     }
+
      continue;
  }
Maybe I'm confused, but ISTM that remaining_eager_scan_successes shouldn't
actually be a BlockNumber, given it doesn't actually indicates a specific
block...
I don't understand why we would sometimes want to treat empty pages as a
failure? They can't fail to be frozen, no?
I'm not sure it makes sense to count them as progress towards the success
limit either - afaict we just rediscovered free space in the table. That's imo
separate from semi-aggressive freezing.
Storing page_freezes as a copy of vacrel->vm_page_freezes and then checking if
that increased feels like a somewhat ugly way of tracking if freezing
happened. There's no more direct way?
Why is decrement_blkno() needed? How can we ever get into negative territory?
Shouldn't eager scanning have been disabled when
remaining_eager_scan_successes reaches zero and thus prevent
remaining_eager_scan_successes from ever going below zero?
@@ -1144,7 +1310,65 @@ heap_vac_scan_next_block(LVRelState *vacrel,
*/
        bool        skipsallvis;

-       find_next_unskippable_block(vacrel, &skipsallvis);
+       /*
+        * Figure out if we should disable eager scan going forward or
+        * downgrade to an unaggressive vacuum altogether.
+        */
+       if (vacrel->aggressive == VAC_SEMIAGGRESSIVE)
+       {
+           /*
+            * If we hit our success limit, there is no need to eagerly scan
+            * any additional pages. Downgrade the vacuum to unaggressive.
+            */
+           if (vacrel->remaining_eager_scan_successes == 0)
+               vacrel->aggressive = VAC_UNAGGRESSIVE;
+
+           /*
+            * If we hit the max number of failed eager scans for this region
+            * of the table, figure out where the next eager scan region
+            * should start. Eager scanning is effectively disabled until we
+            * scan a block in that new region.
+            */
+           else if (vacrel->eager_scanned_failed_frozen >=
+                    MAX_SUCCESSIVE_EAGER_SCAN_FAILS)
+           {
+               BlockNumber region_size,
+                           offset;
+
Why are we doing this logic here, rather than after incrementing
eager_scanned_failed_frozen? Seems that'd limit the amount of times we need to
run through this logic substantially?
+               /*
+                * On the assumption that different regions of the table are
+                * likely to have similarly aged data, we will retry eager
+                * scanning again later. For a small table, we'll retry eager
+                * scanning every quarter of the table. For a larger table,
+                * we'll consider eager scanning again after processing
+                * another region's worth of data.
+                *
+                * We consider the region to start from the first failure, so
+                * calculate the block to restart eager scanning from there.
+                */
+               region_size = Min(RELSEG_SIZE, (vacrel->rel_pages / 4));
+
+               offset = vacrel->eager_scanned_failed_frozen % region_size;
+
+               Assert(vacrel->eager_scanned > 0);
+
+               vacrel->unaggressive_to = vacrel->current_block + (region_size - offset);
+               vacrel->eager_scanned_failed_frozen = 0;
+           }
+       }
I'd put the logic for this in a separate function, feels complicated and
independent enough for it to be worth not having it in heap_vac_scan_next_block().
@@ -1199,6 +1423,11 @@ heap_vac_scan_next_block(LVRelState *vacrel,
  * The next unskippable block and its visibility information is updated in
  * vacrel.
  *
+ * consider_eager_scan indicates whether or not we should consider scanning
+ * all-visible but not all-frozen blocks. was_eager_scanned is set to true if
+ * we decided to eager scan a block. In this case, next_unskippable_block is
+ * set to that block number.
+ *
  * Note: our opinion of which blocks can be skipped can go stale immediately.
  * It's okay if caller "misses" a page whose all-visible or all-frozen marking
  * was concurrently cleared, though. All that matters is that caller scan all
@@ -1208,7 +1437,11 @@ heap_vac_scan_next_block(LVRelState *vacrel,
  * to skip such a range is actually made, making everything safe.)
  */
 static void
-find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
+find_next_unskippable_block(
+   LVRelState *vacrel,
+   bool consider_eager_scan,
+   bool *was_eager_scanned,
+   bool *skipsallvis)
I don't think we ever have function definitions without a parameter on the
line with the function name.
Hm - isn't consider_eager_scan potentially outdated after
find_next_unskippable_block() iterated over a bunch of blocks? If
consider_eager_scan is false, this could very well go into the next
"semi-aggressive region", where consider_eager_scan should have been
re-enabled, no?
Greetings,
Andres Freund
On Thu, Nov 7, 2024 at 10:42 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
Thanks for the review!
Attached v2 should address your feedback and also fixes a few bugs with v1.
I've still yet to run very long-running benchmarks. I did start running more
varied benchmark scenarios -- but all still under two hours. So far, the
behavior is as expected.
On 2024-11-01 19:35:22 -0400, Melanie Plageman wrote:
Because we want to amortize our eager scanning across a few vacuums,
we cap the maximum number of successful eager scans to a percentage of
the total number of all-visible but not all-frozen pages in the table
(currently 20%).

One thing worth mentioning around here seems that we currently can't
partially-aggressively freeze tuples that are "too young" and how that
interacts with everything else.
I'm not sure I know what you mean. Are you talking about how we don't freeze
tuples that are visible to everyone but younger than the freeze limit?
In the attached chart.png, you can see the vm_page_freezes climbing
steadily with the patch, whereas on master, there are sudden spikes
aligned with the aggressive vacuums. You can also see that the number
of pages that are all-visible but not all-frozen grows steadily on
master until the aggressive vacuum. This is vacuum's "backlog" of
freezing work.

What's the reason for all-visible-but-not-all-frozen to increase to a higher
value initially than where it later settles?
My guess is that it has to do with shorter, more frequent vacuums at the
beginning of the benchmark when the relation is smaller (and we haven't
exceeded shared buffers or memory yet). They are setting pages all-visible, but
we haven't used up enough xids yet to qualify for an eager vacuum.
The peak of AVnAF pages aligns with the start of the first eager vacuum. We
don't do any eager scanning until we are sure there is some data requiring
freeze (see this criteria):
if (TransactionIdIsNormal(vacrel->cutoffs.relfrozenxid) &&
TransactionIdPrecedesOrEquals(vacrel->cutoffs.relfrozenxid,
vacrel->cutoffs.FreezeLimit))
Once we have used up enough xids to qualify for the first eager vacuum, the
number of AVnAF pages starts to go down.
It would follow from this theory that we would see a build-up like this after
each relfrozenxid advancement (so after the next aggressive vacuum).
But I think we don't see this because the vacuums are longer by the time
aggressive vacuums have started, so we end up using up enough XIDs between
vacuums to qualify for eager vacuums on vacuums after the aggressive vacuum.
That is just my theory though.
Below is the comparative WAL volume, checkpointer and background
writer writes, reads and writes done by all other backend types, time
spent vacuuming in milliseconds, and p99 latency. Notice that overall
vacuum IO time is substantially lower with the patch.

version  wal     cptr_bgwriter_w  other_rw   vac_io_time  p99_lat
patch    770 GB  5903264          235073744  513722       1
master   767 GB  5908523          216887764  1003654      16

Hm. It's not clear to me why other_rw is higher with the patch? After all,
given the workload, there's no chance of unnecessarily freezing tuples? Is
that just because at the end of the benchmark there's leftover work?
So other_rw is mostly client backend and autovacuum reads and writes. It is
higher with the patch because there are actually more vacuum reads and writes
with the patch than on master. However the autovacuum worker read and write
time is much lower. Those blocks are more often in shared buffers, I would
guess.
From 67781cc2511bb7d62ccc9461f1787272820abcc4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 28 Oct 2024 11:07:50 -0400
Subject: [PATCH v1 4/9] Replace uses of blkno local variable in
lazy_scan_heap()

Largely LGTM, but I'm not sure that it's worth having as a separate commit.
I've squashed it into the commit that makes heap_vac_scan_next_block() return
the next block number.
From 67b5565ad57d3b196695f85811dde2044ba79f3e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 28 Oct 2024 11:14:24 -0400
Subject: [PATCH v1 5/9] Move vacuum VM buffer release

The VM buffer for the next unskippable block can be released after the
main loop in lazy_scan_heap(). Doing so de-clutters
heap_vac_scan_next_block() and opens up more refactoring options.

That's vague...
I've changed the commit message justification to the fact that all the other
vmbuffer releases in vacuum code are in the body of lazy_scan_heap() too (not
in helpers).
From 8485dc400b3d4e9f895170af4f5fb1bb959b8495 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 28 Oct 2024 11:36:58 -0400
Subject: [PATCH v1 6/9] Remove superfluous next_block local variable in vacuum
code

Reduce the number of block related variables in lazy_scan_heap() and its
helpers by removing the next_block local variable from
heap_vac_scan_next_block().

I don't mind this change, but I also don't get how it's related to anything
else here or why it's really better than the status quo.
So because this feature adds more complexity to the already complex vacuum code
selecting what blocks to scan, I thought it was important to reduce the number
of variables.
I think the patches in this set that seek to streamline
heap_vac_scan_next_block() help overall clarity.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4b1eadea1f2..52c9d49f2b1 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1112,19 +1112,17 @@ static bool
 heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
                          bool *all_visible_according_to_vm)
 {
-   BlockNumber next_block;
-
-   /* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
-   next_block = vacrel->current_block + 1;
+   vacrel->current_block++;

I realize this isn't introduced in this commit, but darn, that's ugly.
I didn't like having special cases for block 0 in heap_vac_scan_next_block()
and personally prefer it this way. I thought the special case made the code
more error-prone and harder to understand.
From 78ad9e022b95e024ff5bfa96af78e9e44730c970 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 28 Oct 2024 11:42:10 -0400
Subject: [PATCH v1 7/9] Make heap_vac_scan_next_block() return BlockNumber

@@ -857,7 +857,8 @@ lazy_scan_heap(LVRelState *vacrel)
    vacrel->next_unskippable_allvis = false;
    vacrel->next_unskippable_vmbuffer = InvalidBuffer;

-   while (heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm))
+   while (BlockNumberIsValid(blkno = heap_vac_scan_next_block(vacrel,
+                                                              &all_visible_according_to_vm)))

Personally I'd write this as
while (true)
{
    BlockNumber blkno;

    blkno = heap_vac_scan_next_block(vacrel, ...);
    if (!BlockNumberIsValid(blkno))
        break;

Mostly because it's good to use more minimal scopes when possible,
particularly when previously the scope intentionally was larger. But also
partially because I don't love variable assignments inside a macro call,
inside a while().
I changed it to be as you suggest. I will concede that variable assignments
inside a macro call inside a while() are a bit much.
From 818d1c3b068c6705611256cfc3eb1f10bdc0b684 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 1 Nov 2024 18:25:05 -0400
Subject: [PATCH v1 8/9] WIP: Add more general summary to vacuumlazy.c

Currently the summary at the top of vacuumlazy.c provides some specific
details related to the new dead TID storage in 17. I plan to add a
summary and maybe some sub-sections to contextualize it.

I like this idea. It's hard to understand vacuumlazy.c without already
understanding vacuumlazy.c, which isn't a good situation.
I've added a bit more to it in this version, but I likely could use some more
text on index vacuuming. I'm thinking I'll commit something minimal but correct
and let people elaborate more later.
---
src/backend/access/heap/vacuumlazy.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 7ce69953ba0..15a04c6b10b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3,6 +3,17 @@
  * vacuumlazy.c
  *   Concurrent ("lazy") vacuuming.
  *
+ * Heap relations are vacuumed in three main phases. In the first phase,
+ * vacuum scans relation pages, pruning and freezing tuples and saving dead
+ * tuples' TIDs in a TID store. If that TID store fills up or vacuum finishes
+ * scanning the relation, it progresses to the second phase: index vacuuming.
+ * After index vacuuming is complete, vacuum scans the blocks of the relation
+ * indicated by the TIDs in the TID store and reaps the dead tuples, freeing
+ * that space for future tuples. Finally, vacuum may truncate the relation if
+ * it has emptied pages at the end. XXX: this summary needs work.

Yea, at least we ought to mention that the phasing can be different when there
are no indexes and that the later phases can heuristically be omitted when
there aren't enough dead items.
I've done this.
From f21f0bab1dbe675be4b4dddcb2eea486d8a69d36 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 28 Oct 2024 12:15:08 -0400
Subject: [PATCH v1 9/9] Eagerly scan all-visible pages to amortize aggressive
vacuum

Introduce semi-aggressive vacuums, which scan some of the all-visible
but not all-frozen pages in the relation to amortize the cost of an
aggressive vacuum.

I wonder if "aggressive" is really the right terminology going
forward... Somehow it doesn't seem particularly descriptive anymore if, in
many workloads, almost all vacuums are going to be aggressive-ish.
I've changed it to normal, eager, and aggressive
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 15a04c6b10b..adabb5ff5f1 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
+ *
+ * On the assumption that different regions of the table are likely to contain
+ * similarly aged data, we use a localized failure cap instead of a global cap
+ * for the whole relation. The failure count is reset on each region of the
+ * table, comprised of RELSEG_SIZE blocks (or 1/4 of the table size for a
+ * small table). In each region, we tolerate MAX_SUCCESSIVE_EAGER_SCAN_FAILS
+ * before suspending eager scanning until the end of the region.

I'm a bit surprised to see such large regions. Why not something finer, in the
range of a few megabytes? The FSM steers new tuples quite aggressively to the
start of the table, which means that in many workloads there will be old and
new data interspersed at the start of the table. Using RELSEG_SIZE sized
regions for semi-aggressive vacuuming will mean that we'll often not do any
semi-aggressive processing beyond the start of the relation, as we'll reach
the failure rate quickly.
I've changed the region size to 32 MB but I also decreased the allowed failures
to 128 blocks per region (to avoid eager scanning too many blocks if we are
failing to freeze them).
This doesn't completely address your concern about missing freezing
opportunities.
However, this version does randomize the eager scan start block selection in
the first region. The first eager scan block will be somewhere in the first
region to avoid re-scanning unfreezable blocks across multiple vacuums. I will
note that this problem is unlikely to persist across multiple vacuums. If the
page is being modified frequently, it won't be all-visible. You would have to
have this pattern for it to be an issue: modify the page, vacuum, vacuum,
modify, vacuum, vacuum (since the first vacuum after the modification will set
the page all-visible).
I also find it layering-wise a bit weird to use RELSEG_SIZE, that's really imo
is just an md.c concept.
Makes sense. New version has a dedicated macro.
+/*
+ * Semi-aggressive vacuums eagerly scan some all-visible but not all-frozen
+ * pages. Since our goal is to freeze these pages, an eager scan that fails to
+ * set the page all-frozen in the VM is considered to have "failed".
+ *
+ * On the assumption that different regions of the table tend to have
+ * similarly aged data, once we fail to freeze MAX_SUCCESSIVE_EAGER_SCAN_FAILS
+ * blocks, we suspend eager scanning until vacuum has progressed to another
+ * region of the table with potentially older data.
+ */
+#define MAX_SUCCESSIVE_EAGER_SCAN_FAILS 1024

Can this really be a constant, given that the semi-aggressive regions are
shrunk for small tables?
Good point. This version actually disables eager scans for relations smaller
than a single region.
 {
    /* Target heap relation and its indexes */
@@ -153,8 +208,22 @@ typedef struct LVRelState

+   /*
+    * Whether or not this is an aggressive, semi-aggressive, or unaggressive
+    * VACUUM. A fully aggressive vacuum must set relfrozenxid >= FreezeLimit
+    * and therefore must scan every unfrozen tuple. A semi-aggressive vacuum
+    * will scan a certain number of all-visible pages until it is downgraded
+    * to an unaggressive vacuum.
+    */
+   VacAggressive aggressive;

- why is VacAggressive defined in vacuum.h? Isn't this fairly tightly coupled
to heapam?
It was because I had vacuum_get_cutoffs() return the aggressiveness. I've
changed this so that the enum can be defined in vacuumlazy.c.
- Kinda feels like the type should be named VacAggressivness or such?
I changed it to VacEagerness.
+   /*
+    * A semi-aggressive vacuum that has failed to freeze too many eagerly
+    * scanned blocks in a row suspends eager scanning. unaggressive_to is the
+    * block number of the first block eligible for resumed eager scanning.
+    */
+   BlockNumber unaggressive_to;

What's it set to otherwise? What is it set to in aggressive vacuums?
The idea was to set it to 0 for aggressive vacuum and never advance it.
However, for eager vacuum, there was actually a problem with this version of
the patch set that meant that we weren't actually enabling and disabling eager
scanning per region. Instead we were waiting until we hit the fail limit and
then disabling eager scanning for region-size # of blocks. This was effectively
a cooling off period as opposed to a region-based approach.
I've changed this in the current version. Now, for eager vacuum, we save the
block number of the next eager scan region in next_eager_scan_region_start.
Then when we cross over into the next region, we advance it. Eager scanning is
enabled as long as eager_pages.remaining_fails is > 0. When we cross into a new
region, we reset it to re-enable eager scanning if it was disabled.
For normal and aggressive vacuum, I set next_eager_scan_region_start to
InvalidBlockNumber to ensure we never trigger region calculations. However, for
aggressive vacuums, I do keep track of all-visible pages scanned using the same
counter in LVRelState that counts eager pages scanned. I'm not sure if it is
confusing to use some of this accounting labeled as eager scan accounting for
aggressive vacuum. In fact, in the logs, I print out all-visible pages scanned
-- which will be > 0 for both aggressive vacuums and eager vacuums.
There are tradeoffs between using the eager scan counters for all vacuum types
and initializing them to different values based on the vacuum eagerness level
and guarding all reference to them by vacuum type (and not initializing them to
valid but special values).
Let me know what you think about using the counters for aggressive vacuum too.
On the topic of the region-based method vs the cool-off method, with the region
method, if all of the failures are concentrated at the end of the region, we
will start eager scanning again as soon as we start the next region. With the
cool-off method we would wait a consistent number of blocks. But I think the
region method is still better. The region cutoff may be arbitrary but it
produces a consistent amount of extra scanning. What do you think?
+   /*
+    * The number of eagerly scanned blocks a semi-aggressive vacuum failed to
+    * freeze (due to age) in the current eager scan region. It is reset each
+    * time we hit MAX_SUCCESSIVE_EAGER_SCAN_FAILS.
+    */
+   BlockNumber eager_scanned_failed_frozen;
+
+   /*
+    * The remaining number of blocks a semi-aggressive vacuum will consider
+    * eager scanning. This is initialized to EAGER_SCAN_SUCCESS_RATE of the
+    * total number of all-visible but not all-frozen pages.
+    */
+   BlockNumber remaining_eager_scan_successes;

I think it might look better if you just bundled these into a struct like
struct
{
BlockNumber scanned;
BlockNumber failed_frozen;
BlockNumber remaining_successes;
} eager_pages;
Done
+   visibilitymap_count(rel, &orig_rel_allvisible, &orig_rel_allfrozen);
+   vacrel->remaining_eager_scan_successes =
+       (BlockNumber) (EAGER_SCAN_SUCCESS_RATE * (orig_rel_allvisible - orig_rel_allfrozen));

    if (verbose)
    {
-       if (vacrel->aggressive)
-           ereport(INFO,
-                   (errmsg("aggressively vacuuming \"%s.%s.%s\"",
-                           vacrel->dbname, vacrel->relnamespace,
-                           vacrel->relname)));
-       else
-           ereport(INFO,
-                   (errmsg("vacuuming \"%s.%s.%s\"",
-                           vacrel->dbname, vacrel->relnamespace,
-                           vacrel->relname)));
+       switch (vacrel->aggressive)
+       {
+           case VAC_UNAGGRESSIVE:
+               ereport(INFO,
+                       (errmsg("vacuuming \"%s.%s.%s\"",
+                               vacrel->dbname, vacrel->relnamespace,
+                               vacrel->relname)));
+               break;
+
+           case VAC_AGGRESSIVE:
+               ereport(INFO,
+                       (errmsg("aggressively vacuuming \"%s.%s.%s\"",
+                               vacrel->dbname, vacrel->relnamespace,
+                               vacrel->relname)));
+               break;
+
+           case VAC_SEMIAGGRESSIVE:
+               ereport(INFO,
+                       (errmsg("semiaggressively vacuuming \"%s.%s.%s\"",
+                               vacrel->dbname, vacrel->relnamespace,
+                               vacrel->relname)));
+               break;
+       }

Wonder if we should have a function that returns the aggressiveness of a
vacuum as an already translated string. There are other places where we emit
the aggressiveness as part of a message, and it's pretty silly to duplicate
most of the message.
I've added a helper to return text with the vacuum eagerness level. I used
gettext_noop() to mark it for translation later because I think the autovacuum
logging uses _() and ereports are translated. But I'm not sure this is
completely right.
@@ -545,11 +668,13 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
     * Non-aggressive VACUUMs may advance them by any amount, or not at all.
     */
    Assert(vacrel->NewRelfrozenXid == vacrel->cutoffs.OldestXmin ||
-          TransactionIdPrecedesOrEquals(vacrel->aggressive ? vacrel->cutoffs.FreezeLimit :
+          TransactionIdPrecedesOrEquals(vacrel->aggressive == VAC_AGGRESSIVE ?
+                                        vacrel->cutoffs.FreezeLimit :
                                         vacrel->cutoffs.relfrozenxid,
                                         vacrel->NewRelfrozenXid));
    Assert(vacrel->NewRelminMxid == vacrel->cutoffs.OldestMxact ||
-          MultiXactIdPrecedesOrEquals(vacrel->aggressive ? vacrel->cutoffs.MultiXactCutoff :
+          MultiXactIdPrecedesOrEquals(vacrel->aggressive == VAC_AGGRESSIVE ?
+                                      vacrel->cutoffs.MultiXactCutoff :
                                      vacrel->cutoffs.relminmxid,
                                      vacrel->NewRelminMxid));

    if (vacrel->skippedallvis)

These are starting to feel somewhat complicated. Wonder if it'd be easier to
read if they were written as normal ifs.
Did this.
+/*
+ * Helper to decrement a block number to 0 without wrapping around.
+ */
+static void
+decrement_blkno(BlockNumber *block)
+{
+   if ((*block) > 0)
+       (*block)--;
+}
I've removed it.
@@ -956,11 +1094,23 @@ lazy_scan_heap(LVRelState *vacrel)
if (!got_cleanup_lock)
        LockBuffer(buf, BUFFER_LOCK_SHARE);

+       page_freezes = vacrel->vm_page_freezes;
+
        /* Check for new or empty pages before lazy_scan_[no]prune call */
        if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, !got_cleanup_lock,
                                   vmbuffer))
        {
            /* Processed as new/empty page (lock and pin released) */
+
+           /* count an eagerly scanned page as a failure or a success */
+           if (was_eager_scanned)
+           {
+               if (vacrel->vm_page_freezes > page_freezes)
+                   decrement_blkno(&vacrel->remaining_eager_scan_successes);
+               else
+                   vacrel->eager_scanned_failed_frozen++;
+           }
+
            continue;
        }

Maybe I'm confused, but ISTM that remaining_eager_scan_successes shouldn't
actually be a BlockNumber, given it doesn't actually indicates a specific
block...
All of the other counters like this in LVRelState are BlockNumbers (see
lpdead_item_pages, missed_dead_pages, etc). I'm fine with not using a
BlockNumber for this, but I would want to have a justification for why it is
different than the other cumulative counters for numbers of blocks.
I don't understand why we would sometimes want to treat empty pages as a
failure? They can't fail to be frozen, no?

I'm not sure it makes sense to count them as progress towards the success
limit either - afaict we just rediscovered free space in the table. That's imo
separate from semi-aggressive freezing.
That's a great point. In fact, I don't think we could ever have exercised this
code anyway. Since we will always freeze it, there shouldn't be any all-visible
but not all-frozen empty pages. I've removed this code.
Storing page_freezes as a copy of vacrel->vm_page_freezes and then checking if
that increased feels like a somewhat ugly way of tracking if freezing
happened. There's no more direct way?
This version uses an output parameter to lazy_scan_prune().
Why is decrement_blkno() needed? How can we ever get into negative territory?
Shouldn't eager scanning have been disabled when
remaining_eager_scan_successes reaches zero and thus prevent
remaining_eager_scan_successes from ever going below zero?
Right I've removed this.
@@ -1144,7 +1310,65 @@ heap_vac_scan_next_block(LVRelState *vacrel,
*/
        bool        skipsallvis;

-       find_next_unskippable_block(vacrel, &skipsallvis);
+       /*
+        * Figure out if we should disable eager scan going forward or
+        * downgrade to an unaggressive vacuum altogether.
+        */
+       if (vacrel->aggressive == VAC_SEMIAGGRESSIVE)
+       {
+           /*
+            * If we hit our success limit, there is no need to eagerly scan
+            * any additional pages. Downgrade the vacuum to unaggressive.
+            */
+           if (vacrel->remaining_eager_scan_successes == 0)
+               vacrel->aggressive = VAC_UNAGGRESSIVE;
+
+           /*
+            * If we hit the max number of failed eager scans for this region
+            * of the table, figure out where the next eager scan region
+            * should start. Eager scanning is effectively disabled until we
+            * scan a block in that new region.
+            */
+           else if (vacrel->eager_scanned_failed_frozen >=
+                    MAX_SUCCESSIVE_EAGER_SCAN_FAILS)
+           {
+               BlockNumber region_size,
+                           offset;
+

Why are we doing this logic here, rather than after incrementing
eager_scanned_failed_frozen? Seems that'd limit the amount of times we need to
run through this logic substantially?
I've moved it there. Actually, all of the logic has been moved around to make
the region method work.
Hm - isn't consider_eager_scan potentially outdated after
find_next_unskippable_block() iterated over a bunch of blocks? If
consider_eager_scan is false, this could very well go into the next
"semi-aggressive region", where consider_eager_scan should have been
re-enabled, no?
Yep. Great catch. I believe I've fixed this by removing consider_eager_scan and
instead checking if we have remaining failures in this region and resetting
remaining failures whenever we enter a new region.
Also turns out was_eager_scanned had some issues that I believe I've now fixed
as well.
- Melanie
Attachments:
v2-0001-Rename-LVRelState-frozen_pages.patch
v2-0005-Move-vacuum-VM-buffer-release.patch
v2-0002-Make-visibilitymap_set-return-previous-state-of-v.patch
v2-0004-Remove-leftover-mentions-of-XLOG_HEAP2_FREEZE_PAG.patch
v2-0003-Count-pages-set-all-visible-and-all-frozen-in-VM-.patch
v2-0006-Remove-superfluous-next_block-local-variable-in-v.patch
v2-0007-Make-heap_vac_scan_next_block-return-BlockNumber.patch
v2-0009-Add-more-general-summary-to-vacuumlazy.c.patch
v2-0008-Refactor-vacuum-assert-into-multiple-if-statement.patch
v2-0010-Eagerly-scan-all-visible-pages-to-amortize-aggres.patch
On Fri, Dec 13, 2024 at 5:53 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
On Thu, Nov 7, 2024 at 10:42 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
Thanks for the review!
Attached v2 should address your feedback and also fixes a few bugs with v1.

I've still yet to run very long-running benchmarks. I did start running more
varied benchmark scenarios -- but all still under two hours. So far, the
behavior is as expected.

On 2024-11-01 19:35:22 -0400, Melanie Plageman wrote:
Because we want to amortize our eager scanning across a few vacuums,
we cap the maximum number of successful eager scans to a percentage of
the total number of all-visible but not all-frozen pages in the table
(currently 20%).

One thing worth mentioning around here seems that we currently can't
partially-aggressively freeze tuples that are "too young" and how that
interacts with everything else.

I'm not sure I know what you mean. Are you talking about how we don't freeze
tuples that are visible to everyone but younger than the freeze limit?
FWIW that was my interpretation of his statement, though I had a
clarifying question around this topic myself, which is, from a user
perspective when would we expect to see these eager vacuums? ISTM we
would be doing 'normal vacuums' prior to vacuum_freeze_min_age, and
'aggressive vacuums' after (autovacuum_freeze_max_age -
vacuum_freeze_min_age), so the expectation is that 'eager vacuums'
would fall into the ~50 million transaction window between those two
points (assuming defaults, which admittedly I don't use). Does that
sound right?
In the attached chart.png, you can see the vm_page_freezes climbing
steadily with the patch, whereas on master, there are sudden spikes
aligned with the aggressive vacuums. You can also see that the number
of pages that are all-visible but not all-frozen grows steadily on
master until the aggressive vacuum. This is vacuum's "backlog" of
freezing work.

What's the reason for all-visible-but-not-all-frozen to increase to a higher
value initially than where it later settles?

My guess is that it has to do with shorter, more frequent vacuums at the
beginning of the benchmark when the relation is smaller (and we haven't
exceeded shared buffers or memory yet). They are setting pages all-visible, but
we haven't used up enough xids yet to qualify for an eager vacuum.

The peak of AVnAF pages aligns with the start of the first eager vacuum. We
don't do any eager scanning until we are sure there is some data requiring
freeze (see this criteria):

if (TransactionIdIsNormal(vacrel->cutoffs.relfrozenxid) &&
    TransactionIdPrecedesOrEquals(vacrel->cutoffs.relfrozenxid,
                                  vacrel->cutoffs.FreezeLimit))

Once we have used up enough xids to qualify for the first eager vacuum, the
number of AVnAF pages starts to go down.

It would follow from this theory that we would see a build-up like this after
each relfrozenxid advancement (so after the next aggressive vacuum).

But I think we don't see this because the vacuums are longer by the time
aggressive vacuums have started, so we end up using up enough XIDs between
vacuums to qualify for eager vacuums on vacuums after the aggressive vacuum.

That is just my theory though.
I like your theory but it's a little too counterintuitive for me :-)
I would expect we'd see a change in the vacuum time & rate after the
first aggressive scan, which incidentally your graph does show for
master, but it looks a bit too smooth on your original patchset. I
guess there could be a sweet spot where the rate of changes fit
perfectly with regards to the line between lazy / eager vacuums, but
hard to imagine you were that lucky.
Below is the comparative WAL volume, checkpointer and background
writer writes, reads and writes done by all other backend types, time
spent vacuuming in milliseconds, and p99 latency. Notice that overall
vacuum IO time is substantially lower with the patch.

version  wal     cptr_bgwriter_w  other_rw   vac_io_time  p99_lat
patch    770 GB  5903264          235073744  513722       1
master   767 GB  5908523          216887764  1003654      16

Hm. It's not clear to me why other_rw is higher with the patch? After all,
given the workload, there's no chance of unnecessarily freezing tuples? Is
that just because at the end of the benchmark there's leftover work?

So other_rw is mostly client backend and autovacuum reads and writes. It is
higher with the patch because there are actually more vacuum reads and writes
with the patch than on master. However the autovacuum worker read and write
time is much lower. Those blocks are more often in shared buffers, I would
guess.
<snip>
From e36b4fac345be44954410c4f0e61467dc0f49a72 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 12 Dec 2024 16:44:37 -0500
Subject: [PATCH v2 10/10] Eagerly scan all-visible pages to amortize
aggressive vacuum

@@ -27,11 +27,37 @@
  * to the end, skipping pages as permitted by their visibility status, vacuum
  * options, and the eagerness level of the vacuum.
  *
+ * There are three vacuum eagerness levels: normal vacuum, eager vacuum, and
+ * aggressive vacuum.
+ *
  * When page skipping is enabled, non-aggressive vacuums may skip scanning
- * pages that are marked all-visible in the visibility map. We may choose not
+ * pages that are marked all-visible in the visibility map. It may choose not
  * to skip pages if the range of skippable pages is below
  * SKIP_PAGES_THRESHOLD.
  *
I find the above confusing since page skipping is the regular activity
but referred to in the negative, and because you use the term
"non-aggressive vacuums" which in prior releases only mapped to
"normal" vacuums, but now would map to both "normal" and "eager"
vacuums, and it isn't clear that is desired (in my head anyway). Does
the following still convey what you meant (and hopefully work better
with the paragraphs that follow)?
When page skipping is not disabled, a normal vacuum may skip scanning
pages that are marked all-visible in the visibility map if the range
of skippable pages is below SKIP_PAGES_THRESHOLD.
+ * Eager vacuums will scan skippable pages in an effort to freeze them and
+ * decrease the backlog of all-visible but not all-frozen pages that have to
+ * be processed to advance relfrozenxid and avoid transaction ID wraparound.
+ *
@@ -170,6 +197,51 @@ typedef enum
VACUUM_ERRCB_PHASE_TRUNCATE,
} VacErrPhase;

+/*
+ * Eager vacuums scan some all-visible but not all-frozen pages. Since our
+ * goal is to freeze these pages, an eager scan that fails to set the page
+ * all-frozen in the VM is considered to have "failed".
+ *
+ * On the assumption that different regions of the table tend to have
+ * similarly aged data, once we fail to freeze EAGER_SCAN_MAX_FAILS_PER_REGION
+ * blocks in a region of size EAGER_SCAN_REGION_SIZE, we suspend eager
+ * scanning until vacuum has progressed to another region of the table with
+ * potentially older data.
+ */
+#define EAGER_SCAN_REGION_SIZE 4096
+#define EAGER_SCAN_MAX_FAILS_PER_REGION 128
+
+/*
+ * An eager scan of a page that is set all-frozen in the VM is considered
+ * "successful". To spread out eager scanning across multiple eager vacuums,
+ * we limit the number of successful eager page scans. The maximum number of
+ * successful eager page scans is calculated as a ratio of the all-visible but
+ * not all-frozen pages at the beginning of the vacuum.
+ */
+#define EAGER_SCAN_SUCCESS_RATE 0.2
+
+/*
+ * The eagerness level of a vacuum determines how many all-visible but
+ * not all-frozen pages it eagerly scans.
+ *
+ * A normal vacuum (eagerness VAC_NORMAL) scans no all-visible pages (with the
+ * exception of those scanned due to SKIP_PAGES_THRESHOLD).
+ *
+ * An eager vacuum (eagerness VAC_EAGER) scans a number of pages up to a limit
+ * based on whether or not it is succeeding or failing. An eager vacuum is
+ * downgraded to a normal vacuum when it hits its success quota. An aggressive
+ * vacuum cannot be downgraded. No eagerness level is ever upgraded.
+ *
At the risk of being overly nit-picky... eager vacuums scan their
subset of all-visible pages "up to a limit" based solely on the
success ratio. In the case of (excessive) failures, there is no limit
to the number of pages scanned, only a pause in the pages scanned
until the next region.
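To make the region mechanics concrete, here is a hedged sketch in C of how a per-region failure budget behaves: failures only pause eager scanning for the remainder of the current region, while crossing a region boundary replenishes the budget. The struct and function names are illustrative, not the patch's actual code; only the two constants come from the quoted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constants from the quoted patch. */
#define EAGER_SCAN_REGION_SIZE 4096
#define EAGER_SCAN_MAX_FAILS_PER_REGION 128

typedef struct EagerScanState
{
    uint32_t next_region_start;  /* first block of the next region */
    uint32_t remaining_fails;    /* failure budget left in this region */
} EagerScanState;

/*
 * Decide whether block 'blkno' may be eagerly scanned. Exhausting the
 * failure budget only suspends eager scanning until the next region;
 * it never ends it for the whole vacuum.
 */
static bool
eager_scan_allowed(EagerScanState *st, uint32_t blkno)
{
    if (blkno >= st->next_region_start)
    {
        /* Entering a new region: reset the failure budget. */
        st->next_region_start += EAGER_SCAN_REGION_SIZE;
        st->remaining_fails = EAGER_SCAN_MAX_FAILS_PER_REGION;
    }
    return st->remaining_fails > 0;
}

/* Record one eagerly scanned block that could not be set all-frozen. */
static void
eager_scan_report_failure(EagerScanState *st)
{
    assert(st->remaining_fails > 0);
    st->remaining_fails--;
}
```

This matches the observation above: there is no global failure limit, only a per-region pause.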
+ * An aggressive vacuum (eagerness EAGER_FULL) must scan all all-visible but
+ * not all-frozen pages.
+ */
I think the above should be VAC_AGGRESSIVE vs EAGER_FULL, no?
+typedef enum VacEagerness
+{
+	VAC_NORMAL,
+	VAC_EAGER,
+	VAC_AGGRESSIVE,
+} VacEagerness;
+
@@ -516,25 +772,20 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
* Force aggressive mode, and disable skipping blocks using the
* visibility map (even those set all-frozen)
*/
- vacrel->aggressive = true;
+ aggressive = true;
skipwithvm = false;
	}

	vacrel->skipwithvm = skipwithvm;

+	heap_vacuum_set_up_eagerness(vacrel, aggressive);
+
	if (verbose)
-	{
-		if (vacrel->aggressive)
-			ereport(INFO,
-					(errmsg("aggressively vacuuming \"%s.%s.%s\"",
-							vacrel->dbname, vacrel->relnamespace,
-							vacrel->relname)));
-		else
-			ereport(INFO,
-					(errmsg("vacuuming \"%s.%s.%s\"",
-							vacrel->dbname, vacrel->relnamespace,
-							vacrel->relname)));
-	}
+		ereport(INFO,
+				(errmsg("%s of \"%s.%s.%s\"",
+						vac_eagerness_description(vacrel->eagerness),
+						vacrel->dbname, vacrel->relnamespace,
+						vacrel->relname)));

	/*
* Allocate dead_items memory using dead_items_alloc. This handles
One thing I am wondering about is that since we actually modify
vacrel->eagerness during the "success downgrade" cycle, a single
vacuum run could potentially produce messages with both eager vacuum
and normal vacuum language. I don't think that'd be a problem in the
above spot, but wondering if it might be elsewhere (maybe in
pg_stat_activity?).
Robert Treat
https://xzilla.net
Hi,
Thank you for working on this!
On Sat, 14 Dec 2024 at 01:53, Melanie Plageman
<melanieplageman@gmail.com> wrote:
On Thu, Nov 7, 2024 at 10:42 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
Thanks for the review!
Attached v2 should address your feedback and also fixes a few bugs with v1.
Here are a couple of code comments:
=== [PATCH v2 07/10] ===
It took me a while to understand that heap_vac_scan_next_block() loops
until rel_pages. What do you think about adding
Assert(vacrel->current_block == rel_pages) before the
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED,
rel_pages) and having a comment on main loop should process blocks
until rel_pages?
=== [PATCH v2 09/10] ===
+ * If there are no indexes or index scanning is disabled, phase II may be
+ * skipped. If phase I identified very few dead index entries, vacuum may skip
+ * phases II and III. Index vacuuming deletes the dead index entries from the
+ * TID store.
phase III is not mentioned in the previous comments. It could be
better to first explain what phase III is before mentioning it.
+ * After index vacuuming is complete, vacuum scans the blocks of the relation
+ * indicated by the TIDs in the TID store and reaps the dead tuples, freeing
+ * that space for future tuples. Finally, vacuum may truncate the relation if
+ * it has emptied pages at the end.
+ *
+ * After finishing all phases of work, vacuum updates relation statistics in
+ * pg_class and the cumulative statistics subsystem.
There is no clear definition of phase III here as well. I can not
understand what phase III is and which parts the vacuum may skip.
=== [PATCH v2 10/10] ===
+ /*
+ * The number of eagerly scanned blocks an eager vacuum failed to
+ * freeze (due to age) in the current eager scan region. Eager vacuums
+ * reset it to EAGER_SCAN_MAX_FAILS_PER_REGION each time they enter a
+ * new region of the relation. Aggressive vacuums also decrement this
+ * coutner but it is initialized to # blocks in the relation.
+ */
s/coutner/counter
/* non-export function prototypes */
+
static void lazy_scan_heap(LVRelState *vacrel);
Extra blank line.
+ if (TransactionIdIsNormal(vacrel->cutoffs.relfrozenxid) &&
+ TransactionIdPrecedesOrEquals(vacrel->cutoffs.relfrozenxid,
+ vacrel->cutoffs.FreezeLimit))
+ oldest_unfrozen_requires_freeze = true;
+
+ if (MultiXactIdIsValid(vacrel->cutoffs.relminmxid) &&
+ MultiXactIdPrecedesOrEquals(vacrel->cutoffs.relminmxid,
+ vacrel->cutoffs.MultiXactCutoff))
+ oldest_unfrozen_requires_freeze = true;
You may want to short-circuit the second if condition with
!oldest_unfrozen_requires_freeze but it may increase the complexity,
up to you.
+ vacrel->eager_pages.remaining_fails = EAGER_SCAN_MAX_FAILS_PER_REGION *
+ (1 - vacrel->next_eager_scan_region_start / EAGER_SCAN_REGION_SIZE);
This will always return EAGER_SCAN_MAX_FAILS_PER_REGION or 0 because
of the integer dividing.
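The truncation Nazir describes is easy to demonstrate in isolation. A hedged sketch (function names are illustrative; only the expression and the two constants come from the quoted patch, and the floating-point version is just one possible fix, not necessarily what the next patch revision does):

```c
#define EAGER_SCAN_REGION_SIZE 4096
#define EAGER_SCAN_MAX_FAILS_PER_REGION 128

/*
 * The quoted expression as written: for any start offset inside the
 * first region, integer division truncates start / 4096 to 0, so the
 * factor (1 - 0) is always 1 and the budget is never prorated.
 */
static int
remaining_fails_buggy(int start)
{
    return EAGER_SCAN_MAX_FAILS_PER_REGION *
        (1 - start / EAGER_SCAN_REGION_SIZE);
}

/* One possible fix: carry out the proration in floating point. */
static int
remaining_fails_prorated(int start)
{
    return (int) (EAGER_SCAN_MAX_FAILS_PER_REGION *
                  (1.0 - (double) start / EAGER_SCAN_REGION_SIZE));
}
```

With a random start offset 25% into the region, the intended budget would be 75% of the maximum (96 of 128), but the integer-division form always yields the full 128.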
+ if (aggressive)
+ {
+ vacrel->eagerness = VAC_AGGRESSIVE;
+
+ /*
+ * An aggressive vacuum must scan every all-visible page to safely
+ * advance the relfrozenxid and/or relminmxid. As such, there is no
+ * cap to the number of allowed successes or failures.
+ */
+ vacrel->eager_pages.remaining_fails = vacrel->rel_pages + 1;
+ vacrel->eager_pages.remaining_successes = vacrel->rel_pages + 1;
+ return;
+ }
...
...
+ if (was_eager_scanned)
+ {
+ if (vm_page_frozen)
+ {
+ Assert(vacrel->eager_pages.remaining_successes > 0);
+ vacrel->eager_pages.remaining_successes--;
+
+ if (vacrel->eager_pages.remaining_successes == 0)
+ {
+ Assert(vacrel->eagerness == VAC_EAGER);
My initial thought was that since *was_eager_scanned* is true,
Assert(vacrel->eagerness == VAC_EAGER) should be under the top if
condition (I assumed that was_eager_scanned is only true for eager
vacuums, not for aggressive vacuums too) but I see your point here.
Since you set vacrel->eager_pages.remaining_successes to
vacrel->rel_pages + 1, vacrel->eager_pages.remaining_successes can not
reach 0 although all pages are processed as successful. I think
comment is needed in both places to explain why
vacrel->eager_pages.[remaining_fails | remaining_successes] is set to
vacrel->rel_pages + 1 and why vacrel->eagerness should be VAC_EAGER
when was_eager_scanned is true and
vacrel->eager_pages.remaining_successes is 0.
--
Regards,
Nazir Bilal Yavuz
Microsoft
Thanks for taking a look!
I've rebased and attached an updated v3 which also addresses review feedback.
On Sun, Dec 15, 2024 at 1:05 AM Robert Treat <rob@xzilla.net> wrote:
On Fri, Dec 13, 2024 at 5:53 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Thu, Nov 7, 2024 at 10:42 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
One thing worth mentioning around here seems that we currently can't
partially-aggressively freeze tuples that are "too young" and how that
interacts with everything else.

I'm not sure I know what you mean. Are you talking about how we don't freeze
tuples that are visible to everyone but younger than the freeze limit?

FWIW that was my interpretation of his statement, though I had a
clarifying question around this topic myself, which is, from a user
perspective when would we expect to see these eager vacuums? ISTM we
would be doing 'normal vacuums' prior to vacuum_freeze_min_age, and
'aggressive vacuums' after (autovacuum_freeze_max_age -
vacuum_freeze_min_age), so the expectation is that 'eager vacuums'
would fall into the ~50 million transaction window between those two
points (assuming defaults, which admittedly I don't use). Does that
sound right?
Basically, yes (but 100 mill instead of 50)
By default, vacuum_freeze_min_age default is 50 million and
vacuum_freeze_table_age is 150 million (autovacuum_freeze_max_age
[default 200 mill] determines whether or not we kick off an anti-wrap
vac while vacuum_freeze_table_age determines whether or not a given
vacuum is an aggressive vacuum, which means that all anti-wrap vacs
are aggressive). Anyway, that means the eager vacuums will be
triggered somewhere in that 100 million xid window.
This may be an admittedly small window. However, for users tuning
these parameters with the intent of freezing, I would hope they are
decreasing vacuum_freeze_min_age and increasing
autovacuum_freeze_max_age.
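The window arithmetic above can be sanity-checked with the stock defaults. A small sketch (the GUC names are real PostgreSQL settings; treating them as plain constants here is purely for illustration):

```c
#include <assert.h>

/* Stock postgresql.conf defaults discussed above, in transaction IDs. */
#define VACUUM_FREEZE_MIN_AGE      50000000L  /* tuples become freezable */
#define VACUUM_FREEZE_TABLE_AGE   150000000L  /* vacuums become aggressive */
#define AUTOVACUUM_FREEZE_MAX_AGE 200000000L  /* anti-wraparound trigger */

/*
 * Eager vacuums can occur once relfrozenxid is older than the freeze
 * limit but before vacuum_freeze_table_age forces an aggressive vacuum.
 */
static long
eager_window(void)
{
    return VACUUM_FREEZE_TABLE_AGE - VACUUM_FREEZE_MIN_AGE;
}
```

So with defaults the eager window is 100 million XIDs, and since all anti-wraparound vacuums trip vacuum_freeze_table_age first, they are all aggressive.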
In the attached chart.png, you can see the vm_page_freezes climbing
steadily with the patch, whereas on master, there are sudden spikes
aligned with the aggressive vacuums. You can also see that the number
of pages that are all-visible but not all-frozen grows steadily on
master until the aggressive vacuum. This is vacuum's "backlog" of
freezing work.

What's the reason for all-visible-but-not-all-frozen to increase to a higher
value initially than where it later settles?

My guess is that it has to do with shorter, more frequent vacuums at the
beginning of the benchmark when the relation is smaller (and we haven't
exceeded shared buffers or memory yet). They are setting pages all-visible, but
we haven't used up enough xids yet to qualify for an eager vacuum.

The peak of AVnAF pages aligns with the start of the first eager vacuum. We
don't do any eager scanning until we are sure there is some data requiring
freeze (see this criteria):

    if (TransactionIdIsNormal(vacrel->cutoffs.relfrozenxid) &&
        TransactionIdPrecedesOrEquals(vacrel->cutoffs.relfrozenxid,
                                      vacrel->cutoffs.FreezeLimit))

Once we have used up enough xids to qualify for the first eager vacuum, the
number of AVnAF pages starts to go down.

It would follow from this theory that we would see a build-up like this after
each relfrozenxid advancement (so after the next aggressive vacuum).

But I think we don't see this because the vacuums are longer by the time
aggressive vacuums have started, so we end up using up enough XIDs between
vacuums to qualify for eager vacuums on vacuums after the aggressive vacuum.

That is just my theory though.
I like your theory but it's a little too counterintuitive for me :-)
I would expect we'd see a change in the vacuum time & rate after the
first aggressive scan, which incidentally your graph does show for
master, but it looks a bit too smooth on your original patchset. I
guess there could be a sweet spot where the rate of changes fit
perfectly with regards to the line between lazy / eager vacuums, but
hard to imagine you were that lucky.
I think it's harder to see differences on the chart for the patched
version in terms of how long the vacuums are taking. I went and took a
closer look at the logs and at the beginning of the benchmark vacuums
are taking ~10 seconds and from the end of one vacuum to the next,
about 1.5 million xids are used. By the time the aggressive vacuum
happens, vacuums are taking around 30-40 seconds (30 seconds is where
the vacuum duration stabilizes until the end of the benchmark) and ~3
million xids are being used between the end of one vacuum and the
next.
So, all vacuums are eager after the first eager vacuum (in this
workload). It still isn't a completely satisfactory explanation, but I
will admit that the version of the patch set on which I ran the
benchmarks and generated the chart from had a few bugs. Attached are
the same charts generated from a benchmark run I did on the patch set
you reviewed -- and it seems like this effect is basically gone.
From e36b4fac345be44954410c4f0e61467dc0f49a72 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 12 Dec 2024 16:44:37 -0500
Subject: [PATCH v2 10/10] Eagerly scan all-visible pages to amortize
aggressive vacuum

@@ -27,11 +27,37 @@
  * to the end, skipping pages as permitted by their visibility status, vacuum
  * options, and the eagerness level of the vacuum.
  *
+ * There are three vacuum eagerness levels: normal vacuum, eager vacuum, and
+ * aggressive vacuum.
+ *
  * When page skipping is enabled, non-aggressive vacuums may skip scanning
- * pages that are marked all-visible in the visibility map. We may choose not
+ * pages that are marked all-visible in the visibility map. It may choose not
  * to skip pages if the range of skippable pages is below
  * SKIP_PAGES_THRESHOLD.
  *

I find the above confusing since page skipping is the regular activity
but referred to in the negative, and because you use the term
"non-aggressive vacuums" which in prior releases only mapped to
"normal" vacuums, but now would map to both "normal" and "eager"
vacuums, and it isn't clear that is desired (in my head anyway). Does
the following still convey what you meant (and hopefully work better
with the paragraphs that follow)?

When page skipping is not disabled, a normal vacuum may skip scanning
pages that are marked all-visible in the visibility map if the range
of skippable pages is below SKIP_PAGES_THRESHOLD.
Yes, that is much better. Thank you! I've incorporated this into the
new version.
+ * The eagerness level of a vacuum determines how many all-visible but
+ * not all-frozen pages it eagerly scans.
+ *
+ * A normal vacuum (eagerness VAC_NORMAL) scans no all-visible pages (with the
+ * exception of those scanned due to SKIP_PAGES_THRESHOLD).
+ *
+ * An eager vacuum (eagerness VAC_EAGER) scans a number of pages up to a limit
+ * based on whether or not it is succeeding or failing. An eager vacuum is
+ * downgraded to a normal vacuum when it hits its success quota. An aggressive
+ * vacuum cannot be downgraded. No eagerness level is ever upgraded.
+ *

At the risk of being overly nit-picky... eager vacuums scan their
subset of all-visible pages "up to a limit" based solely on the
success ratio. In the case of (excessive) failures, there is no limit
to the number of pages scanned, only a pause in the pages scanned
until the next region.
Yes, this is a good point. I've changed it to specifically indicate
the limit is based on successes. However, I feel like I should either
not mention the success limit here or mention the regional limiting
based on failures -- otherwise it is confusing. Can you think of a
wording that would be good for this comment?
+ * An aggressive vacuum (eagerness EAGER_FULL) must scan all all-visible but
+ * not all-frozen pages.
+ */

I think the above should be VAC_AGGRESSIVE vs EAGER_FULL, no?
Yep, thanks! Fixed.
vacrel->skipwithvm = skipwithvm;
+	heap_vacuum_set_up_eagerness(vacrel, aggressive);
+
	if (verbose)
-	{
-		if (vacrel->aggressive)
-			ereport(INFO,
-					(errmsg("aggressively vacuuming \"%s.%s.%s\"",
-							vacrel->dbname, vacrel->relnamespace,
-							vacrel->relname)));
-		else
-			ereport(INFO,
-					(errmsg("vacuuming \"%s.%s.%s\"",
-							vacrel->dbname, vacrel->relnamespace,
-							vacrel->relname)));
-	}
+		ereport(INFO,
+				(errmsg("%s of \"%s.%s.%s\"",
+						vac_eagerness_description(vacrel->eagerness),
+						vacrel->dbname, vacrel->relnamespace,
+						vacrel->relname)));

	/*
	 * Allocate dead_items memory using dead_items_alloc. This handles

One thing I am wondering about is that since we actually modify
vacrel->eagerness during the "success downgrade" cycle, a single
vacuum run could potentially produce messages with both eager vacuum
and normal vacuum language. I don't think that'd be a problem in the
above spot, but wondering if it might be elsewhere (maybe in
pg_stat_activity?).
Great point. It's actually already an issue in the vacuum log output.
In the new patch, I save the eagerness level at the beginning and use
it in the logging output. Any future users of the eagerness level will
have to figure out if they care about the original eagerness level or
the current one.
- Melanie
Attachments:
v3-0004-Make-heap_vac_scan_next_block-return-BlockNumber.patch (+28, -23)
v3-0003-Remove-superfluous-next_block-local-variable-in-v.patch (+10, -12)
v3-0001-Remove-leftover-mentions-of-XLOG_HEAP2_FREEZE_PAG.patch (+2, -3)
v3-0002-Move-vacuum-VM-buffer-release.patch (+8, -6)
v3-0005-Refactor-vacuum-assert-into-multiple-if-statement.patch (+37, -9)
v3-0006-Add-more-general-summary-to-vacuumlazy.c.patch (+37, -1)
v3-0007-Eagerly-scan-all-visible-pages-to-amortize-aggres.patch (+392, -58)
Thanks for the review!
On Tue, Dec 17, 2024 at 10:57 AM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
Here are a couple of code comments:
=== [PATCH v2 07/10] ===
It took me a while to understand that heap_vac_scan_next_block() loops
until rel_pages. What do you think about adding
Assert(vacrel->current_block == rel_pages) before the
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED,
rel_pages) and having a comment on main loop should process blocks
until rel_pages?
I added these to the v3 I sent in [1]/messages/by-id/CAAKRu_Zni_idCUyKTBteRM-G5X1qiB9mf75rZGtHpt+nk1z4Gg@mail.gmail.com
=== [PATCH v2 09/10] ===
+ * If there are no indexes or index scanning is disabled, phase II may be
+ * skipped. If phase I identified very few dead index entries, vacuum may skip
+ * phases II and III. Index vacuuming deletes the dead index entries from the
+ * TID store.

phase III is not mentioned in the previous comments. It could be
better to first explain what phase III is before mentioning it.

+ * After index vacuuming is complete, vacuum scans the blocks of the relation
+ * indicated by the TIDs in the TID store and reaps the dead tuples, freeing
+ * that space for future tuples. Finally, vacuum may truncate the relation if
+ * it has emptied pages at the end.
+ *
+ * After finishing all phases of work, vacuum updates relation statistics in
+ * pg_class and the cumulative statistics subsystem.

There is no clear definition of phase III here as well. I can not
understand what phase III is and which parts the vacuum may skip.
I've attempted to address this in v3.
=== [PATCH v2 10/10] ===
+	/*
+	 * The number of eagerly scanned blocks an eager vacuum failed to
+	 * freeze (due to age) in the current eager scan region. Eager vacuums
+	 * reset it to EAGER_SCAN_MAX_FAILS_PER_REGION each time they enter a
+	 * new region of the relation. Aggressive vacuums also decrement this
+	 * coutner but it is initialized to # blocks in the relation.
+	 */

s/coutner/counter
Fixed
/* non-export function prototypes */
+
static void lazy_scan_heap(LVRelState *vacrel);

Extra blank line.
Fixed
+	if (TransactionIdIsNormal(vacrel->cutoffs.relfrozenxid) &&
+		TransactionIdPrecedesOrEquals(vacrel->cutoffs.relfrozenxid,
+									  vacrel->cutoffs.FreezeLimit))
+		oldest_unfrozen_requires_freeze = true;
+
+	if (MultiXactIdIsValid(vacrel->cutoffs.relminmxid) &&
+		MultiXactIdPrecedesOrEquals(vacrel->cutoffs.relminmxid,
+									vacrel->cutoffs.MultiXactCutoff))
+		oldest_unfrozen_requires_freeze = true;

You may want to short-circuit the second if condition with
!oldest_unfrozen_requires_freeze but it may increase the complexity,
up to you.
Good point. Done.
+	vacrel->eager_pages.remaining_fails = EAGER_SCAN_MAX_FAILS_PER_REGION *
+		(1 - vacrel->next_eager_scan_region_start / EAGER_SCAN_REGION_SIZE);

This will always return EAGER_SCAN_MAX_FAILS_PER_REGION or 0 because
of the integer dividing.
Ah, wow, thanks so much for catching that!
+	if (aggressive)
+	{
+		vacrel->eagerness = VAC_AGGRESSIVE;
+
+		/*
+		 * An aggressive vacuum must scan every all-visible page to safely
+		 * advance the relfrozenxid and/or relminmxid. As such, there is no
+		 * cap to the number of allowed successes or failures.
+		 */
+		vacrel->eager_pages.remaining_fails = vacrel->rel_pages + 1;
+		vacrel->eager_pages.remaining_successes = vacrel->rel_pages + 1;
+		return;
+	}
...
...
+	if (was_eager_scanned)
+	{
+		if (vm_page_frozen)
+		{
+			Assert(vacrel->eager_pages.remaining_successes > 0);
+			vacrel->eager_pages.remaining_successes--;
+
+			if (vacrel->eager_pages.remaining_successes == 0)
+			{
+				Assert(vacrel->eagerness == VAC_EAGER);

My initial thought was that since *was_eager_scanned* is true,
Assert(vacrel->eagerness == VAC_EAGER) should be under the top if
condition (I assumed that was_eager_scanned is only true for eager
vacuums, not for aggressive vacuums too) but I see your point here.
Since you set vacrel->eager_pages.remaining_successes to
vacrel->rel_pages + 1, vacrel->eager_pages.remaining_successes can not
reach 0 although all pages are processed as successful. I think
comment is needed in both places to explain why
vacrel->eager_pages.[remaining_fails | remaining_successes] is set to
vacrel->rel_pages + 1 and why vacrel->eagerness should be VAC_EAGER
when was_eager_scanned is true and
vacrel->eager_pages.remaining_successes is 0.
Makes sense. I've attempted to clarify as you suggest in v3.
- Melanie
[1]: /messages/by-id/CAAKRu_Zni_idCUyKTBteRM-G5X1qiB9mf75rZGtHpt+nk1z4Gg@mail.gmail.com
On Tue, Dec 17, 2024 at 5:54 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
Makes sense. I've attempted to clarify as you suggest in v3.
I would just commit 0001. There's nothing to be gained by waiting around.
I don't care about 0002 much. It doesn't seem particularly better or
worse. I would suggest that if you want to do this, maybe don't split
up this:
vacrel->blkno = InvalidBlockNumber;
if (BufferIsValid(vmbuffer))
ReleaseBuffer(vmbuffer);
You could put the moved code before or after after that instead of the
middle of it. Unless there's some reason it makes more sense in the
middle.
I don't care about 0003 much, either. It looks correct to me, but
whether it's better is subjective.
I dislike 0004 as presented. Reducing the scope of blkno is a false
economy; the function doesn't do much of anything interesting after
the main loop. And why do we want to report rel_pages to the progress
reporting machinery instead of blkno? I'd rather that the code report
where it actually ended up (blkno) rather than reporting where it
thinks it must have ended up (rel_pages).
I agree that the assert that 0005 replaces is confusing, but replacing
8 lines of code with 37 is not an improvement in my book.
I like 0006. The phase I-II-III terminology doesn't appear anywhere in
the code (at least, not to my knowledge) but we speak of it that way
so frequently in email and in conversation that mentioning it
someplace that a future developer might find seems advisable. I think
it would be worth mentioning in the second paragraph of the comment
that we may resume phase I after completing phase III if we entered
phase II due to lack of memory. I'm not sure what the right way to
phrase this is -- in a sense, this possibility means that these aren't
really phases at all, however much we may speak of them that way. But
as I say, we talk about them that way all the time, so it's good that
this is finally adding comments to match.
Regarding 0007:
- Why do we measure failure as an integer total but success as a
percentage? Maybe the thought here is that failures are counted per
region but successes are counted globally across the relation, but
then I wonder if that's what we want, and if so, whether we need some
comments to explain why we want that rather than doing both per
region.
- I do not like the nested struct definition for eager_pages. Either
define the struct separately, give it a name, and then refer to that
name here, or just define the elements individually, and maybe give
them a common prefix or something. I don't think what you have here is
a style we normally use. As you can see, pgindent is not particularly
fond of it. It seems particularly weird given that you didn't even put
all the stuff related to eagerness inside of it.
- It's a little weird that you mostly treat eager vacuums as an
intermediate position between aggressive and normal, but then we
decide whether we're eager in a different place than we decide whether
we're aggressive.
- On a related note, won't most vacuums be VAC_EAGER rather than
VAC_NORMAL, thus making VAC_NORMAL a misnomer? I wonder if it's better
to merge eager and normal together, and just treat the cases where we
judge eager scanning not worthwhile as a mild special case of an
otherwise-normal vacuum. It's important that a user can tell what
happened from the log message, but it doesn't seem absolutely
necessary for the start-of-vacuum message to make that clear. It could
just be that %u all-visible scanned => 0 all-visible scanned means we
didn't end up being at all eager.
- Right after the comment "Produce distinct output for the corner case
all the same, just in case," you've changed it so that there's no
longer an if-statement. While you haven't really falsified the comment
in any deep sense, some rewording is probably in order. Also, the
first sentence of this comment overtly contradicts itself without
really explaining anything useful. That's a preexisting problem for
which you're not responsible, but maybe it makes sense to fix it while
you're adjusting this comment.
- Perhaps it's unfair of me, but I think I would have hoped for an
acknowledgement in this commit message, considering that I believe I
was the one who suggested breaking the relation into logical regions,
trying to freeze a percentage of the all-visible-but-not-all-frozen
pages, and capping both successes and failures. Starting the region at
a random offset wasn't my idea, and the specific thresholds you've
chosen were not the ones I suggested, and limiting successes globally
rather than per-region was not what I think I had in mind, and I don't
mean to take away from everything you've done to move this forward,
but unless I am misunderstanding the situation, this particular patch
(0007) is basically an implementation of an algorithm that, as far as
I know, I was the first to propose.
- Which of course also means that I tend to like the idea, but also
that I'm biased. Still, what is the reasonable alternative to this
patch? I have a hard time believing that it's "do nothing". As far as
I know, pretty much everyone agrees that the large burst of work that
tends to occur when an aggressive vacuum kicks off is extremely
problematic, particularly but not only the first time it kicks off on
a table or group of tables that may have accumulated many
all-visible-but-not-all-frozen pages. This patch might conceivably err
in moving that work too aggressively to earlier vacuums, thus making
those vacuums too expensive or wasting work if the pages end up being
modified again; or it might conceivably err in moving work
insufficiently aggressively to earlier vacuums, leaving too much
remaining work when the aggressive vacuum finally happens. In fact, I
would be surprised if it doesn't have those problems in some
situations. But it could have moderately severe cases of those
problems and still be quite a bit better than what we have now
overall.
- So, I think we should avoid fine-tuning this and try to understand if
there's anything wrong with the big picture. Can we imagine a user who
is systematically unhappy with this change? Like, not a user who got
hosed once because of some bad luck, but someone who is constantly and
predictably getting hosed? They'd need to be accumulating lots of
all-visible-not-all-frozen pages in relatively large tables on a
regular basis, but then I guess they'd need to either drop the tables
before the aggressive vacuum happened, or they'd need to render the
pages not-all-visible again before the aggressive vacuum would have
happened. I'm not entirely sure how possible that is. My best guess is
that it's possible if the timing of your autovacuum runs is
particularly poor -- you just need to load some data, vacuum it early
enough that the XIDs are still young so it doesn't get frozen, then
have the eager vacuum hit it, and then update it. That doesn't seem
impossible, but I'm not sure if it's possible to make it happen often
enough and severely enough to really cause a problem. And I'm not sure
we're going to find that out before this is committed, so that brings
me to:
- If this patch is going to go into the source tree in PostgreSQL 18,
it should do that soon. This is a bad patch to be committing in late
March. I propose that it should be shipped by end of January or wait a
year. Even if it goes into the tree by the end of January, there is a
chance we'll find problems that we can't fix quickly and have to
revert the whole thing. But that becomes much more likely if it is
committed close to feature freeze. There is a certain kind of patch -
and I think it includes this patch - where you just can't ever really
know what the problems people are going to hit IRL are until it's
committed and people try things and hit problems, perhaps not even as
part of an intent to test this patch, but just testing in general. If
we don't find those problems with enough time remaining to react to
them in a thoughtful way, that's where full reverts start to happen.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Dec 17, 2024 at 5:51 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
Thanks for taking a look!
I've rebased and attached an updated v3 which also addresses review feedback.
On Sun, Dec 15, 2024 at 1:05 AM Robert Treat <rob@xzilla.net> wrote:
On Fri, Dec 13, 2024 at 5:53 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
<snip>
So, all vacuums are eager after the first eager vacuum (in this
workload).
I have been thinking about this statement and a sense of awkwardness
that I felt about how this implementation came together which I think
Robert captured well with his statement "- It's a little weird that
you mostly treat eager vacuums as an intermediate position between
aggressive and normal, but then we decide whether we're eager in a
different place than we decide whether
we're aggressive."
It feels to me like eager vacuums are not so much a distinct thing,
but that, like how all vacuums do index cleanup unless told not to, all
vacuums are optimistic that they will convert avp to afp, only we bail
out quickly if the table is below vacuum_freeze_min_age and we throw
the limits out if it is above autovacuum_freeze_max_age, similar to
how we manage vacuum cost limit.
+ * The eagerness level of a vacuum determines how many all-visible but
+ * not all-frozen pages it eagerly scans.
+ *
+ * A normal vacuum (eagerness VAC_NORMAL) scans no all-visible pages (with the
+ * exception of those scanned due to SKIP_PAGES_THRESHOLD).
+ *
+ * An eager vacuum (eagerness VAC_EAGER) scans a number of pages up to a limit
+ * based on whether or not it is succeeding or failing. An eager vacuum is
+ * downgraded to a normal vacuum when it hits its success quota. An aggressive
+ * vacuum cannot be downgraded. No eagerness level is ever upgraded.
+ *
At the risk of being overly nit-picky... eager vacuums scan their
subset of all-visible pages "up to a limit" based solely on the
success ratio. In the case of (excessive) failures, there is no limit
to the number of pages scanned, only a pause in the pages scanned
until the next region.
Yes, this is a good point. I've changed it to specifically indicate
the limit is based on successes. However, I feel like I should either
not mention the success limit here or mention the regional limiting
based on failures -- otherwise it is confusing. Can you think of a
wording that would be good for this comment?
I don't feel like I came up with anything concise :-)
I think the gist is something like "a given vacuum will scan a number
of pages up to either a limit based on successes, after which it is
downgraded to a normal vacuum, or a subset of pages based on failures
across different regions within the heap." but even that feels like we
are relying on people already understanding what is going on rather
than defining/explaining it.
+ * An aggressive vacuum (eagerness EAGER_FULL) must scan all all-visible but
+ * not all-frozen pages.
+ */
I think the above should be VAC_AGGRESSIVE vs EAGER_FULL, no?
Yep, thanks! Fixed.
vacrel->skipwithvm = skipwithvm;
+ heap_vacuum_set_up_eagerness(vacrel, aggressive);
+
  if (verbose)
- {
-     if (vacrel->aggressive)
-         ereport(INFO,
-                 (errmsg("aggressively vacuuming \"%s.%s.%s\"",
-                         vacrel->dbname, vacrel->relnamespace,
-                         vacrel->relname)));
-     else
-         ereport(INFO,
-                 (errmsg("vacuuming \"%s.%s.%s\"",
-                         vacrel->dbname, vacrel->relnamespace,
-                         vacrel->relname)));
- }
+     ereport(INFO,
+             (errmsg("%s of \"%s.%s.%s\"",
+                     vac_eagerness_description(vacrel->eagerness),
+                     vacrel->dbname, vacrel->relnamespace,
+                     vacrel->relname)));

  /*
   * Allocate dead_items memory using dead_items_alloc. This handles

One thing I am wondering about is that since we actually modify
vacrel->eagerness during the "success downgrade" cycle, a single
vacuum run could potentially produce messages with both eager vacuum
and normal vacuum language. I don't think that'd be a problem in the
above spot, but wondering if it might be elsewhere (maybe in
pg_stat_activity?).
Great point. It's actually already an issue in the vacuum log output.
In the new patch, I save the eagerness level at the beginning and use
it in the logging output. Any future users of the eagerness level will
have to figure out if they care about the original eagerness level or
the current one.
In a general sense, you want to know if your vacuums are converting
avp to afp since it can explain i/o usage. In this version of the
patch, we know it is turned on based on VacEagerness, but it'd be nice
to know if it got turned off; i.e. basically if we end up in
if (vacrel->eager_pages.remaining_successes == 0)
maybe drop a note in the logs.
In a world where all vacuums are eager vacuums, I think you still want
the latter messaging, but I think the former would end up being noted
based on the outcome of criteria in vacuum_get_cutoffs() (primarily if
we are over the freeze limit).
Robert Treat
https://xzilla.net
Updated v4 attached.
On Wed, Dec 18, 2024 at 4:30 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Dec 17, 2024 at 5:54 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
Makes sense. I've attempted to clarify as you suggest in v3.
I would just commit 0001. There's nothing to be gained by waiting around.
Done.
I don't care about 0002 much. It doesn't seem particularly better or
worse. I would suggest that if you want to do this, maybe don't split
up this:
vacrel->blkno = InvalidBlockNumber;
if (BufferIsValid(vmbuffer))
ReleaseBuffer(vmbuffer);
You could put the moved code before or after that instead of the
middle of it. Unless there's some reason it makes more sense in the
middle.
Good point. Anyway, I've dropped all the patches except 0006 and 0007
since no one seems to really like them.
I dislike 0004 as presented. Reducing the scope of blkno is a false
economy; the function doesn't do much of anything interesting after
the main loop. And why do we want to report rel_pages to the progress
reporting machinery instead of blkno? I'd rather that the code report
where it actually ended up (blkno) rather than reporting where it
thinks it must have ended up (rel_pages).
Well, part of the reason I didn't like it is that because we start
from 0, we have to artificially set blkno to rel_pages anyway because
we never actually scan a block with BlockNumber == rel_pages. In the
loop, pgstat_progress_update_param() passes blkno before it actually
scans blkno, so I think of it as reporting that it had scanned a total
number of blocks equal to blkno. That's why it seems weird to me that
we use blkno to indicate the number of blocks scanned outside the
loop.
Anyway, I'm fine with setting this aside for now if you feel it is
more confusing with rel_pages instead of blkno.
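To make the blkno-vs-rel_pages question concrete, the reporting pattern under discussion can be sketched as below. This is an illustrative stand-in, not the actual vacuumlazy.c code: a stub replaces pgstat_progress_update_param(), and the function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

/* Stub standing in for pgstat_progress_update_param(); records the last value. */
static int64_t last_reported_blks_scanned;

static void
report_blks_scanned(int64_t nblocks)
{
    last_reported_blks_scanned = nblocks;
}

/*
 * Inside the loop, blkno equals the number of blocks already scanned, so it
 * is reported before scanning block blkno. No block numbered rel_pages is
 * ever scanned, which is why the final total must be reported explicitly
 * after the loop -- whether spelled "blkno" or "rel_pages", the value is
 * necessarily the same there.
 */
static void
scan_heap_sketch(BlockNumber rel_pages)
{
    BlockNumber blkno;

    for (blkno = 0; blkno < rel_pages; blkno++)
    {
        report_blks_scanned(blkno); /* blocks scanned so far */
        /* ... prune/freeze block blkno here ... */
    }

    report_blks_scanned(rel_pages); /* all blocks now scanned */
}
```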
I agree that the assert that 0005 replaces is confusing, but replacing
8 lines of code with 37 is not an improvement in my book.
Right, yes. It is long. I dropped it along with the others.
I like 0006. The phase I-II-III terminology doesn't appear anywhere in
the code (at least, not to my knowledge) but we speak of it that way
so frequently in email and in conversation that mentioning it
someplace that a future developer might find seems advisable. I think
it would be worth mentioning in the second paragraph of the comment
that we may resume phase I after completing phase III if we entered
phase II due to lack of memory. I'm not sure what the right way to
phrase this is -- in a sense, this possibility means that these aren't
really phases at all, however much we may speak of them that way. But
as I say, we talk about them that way all the time, so it's good that
this is finally adding comments to match.
They are more like states in a state machine, I suppose. We've all
been calling them phases for so long though, it's probably best to go
with that. I've added more about how we can return to phases. I also
added a small note about how this makes them more like states but
we've always called them phases.
Regarding 0007:
- Why do we measure failure as an integer total but success as a
percentage? Maybe the thought here is that failures are counted per
region but successes are counted globally across the relation, but
then I wonder if that's what we want, and if so, whether we need some
comments to explain why we want that rather than doing both per
region.
I've added a comment about this to the top of the file. The idea is
that if you don't cap successes, you won't end up amortizing anything.
However, you don't want to limit yourself from freezing the data at
the beginning of the table if you are succeeding. Especially given
that append-mostly workloads will see the most benefit from this
feature.
I did wonder if we should also have some sort of global failure limit
to cap the total pages scanned, but I wondered if that was too much
extra complexity to have a global and local failure limit (also
unclear what to set the global failure limit to). It's probably better
to make the fails per region configurable.
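The asymmetric accounting described above -- failures capped per region, successes capped globally -- can be sketched roughly as follows. All names, struct fields, and thresholds here are illustrative assumptions, not the patch's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

#define EAGER_SCAN_REGION_SIZE 4096         /* blocks per region (assumed) */
#define EAGER_SCAN_MAX_FAILS_PER_REGION 128 /* assumed failure threshold */

typedef struct EagerScanState
{
    BlockNumber next_region_start;   /* first block of the next region */
    int         remaining_fails;     /* per-region failure budget */
    int64_t     remaining_successes; /* global success budget */
} EagerScanState;

/* May the given all-visible (not all-frozen) block be eagerly scanned? */
static bool
eager_scan_permitted(EagerScanState *es, BlockNumber blkno)
{
    /* Crossing into a new region resets the per-region failure budget. */
    if (blkno >= es->next_region_start)
    {
        es->remaining_fails = EAGER_SCAN_MAX_FAILS_PER_REGION;
        es->next_region_start += EAGER_SCAN_REGION_SIZE;
    }

    /* The success cap is global: once exhausted, eager scanning stops. */
    if (es->remaining_successes <= 0)
        return false;

    /* The failure cap only pauses eager scanning until the next region. */
    return es->remaining_fails > 0;
}

/* Record the outcome of eagerly scanning one page. */
static void
eager_scan_record(EagerScanState *es, bool froze_page)
{
    if (froze_page)
        es->remaining_successes--;
    else
        es->remaining_fails--;
}
```

The sketch shows why the two caps behave differently: exhausting the failure budget merely skips the rest of the current region, while exhausting the success budget ends eager scanning for the whole vacuum, which is what amortizes the freezing work.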
- I do not like the nested struct definition for eager_pages. Either
define the struct separately, give it a name, and then refer to that
name here, or just define the elements individually, and maybe give
them a common prefix or something. I don't think what you have here is
a style we normally use. As you can see, pgindent is not particularly
fond of it. It seems particularly weird given that you didn't even put
all the stuff related to eagerness inside of it.
I don't mind changing it. This version has them prefixed with "eager" instead.
- It's a little weird that you mostly treat eager vacuums as an
intermediate position between aggressive and normal, but then we
decide whether we're eager in a different place than we decide whether
we're aggressive.
Yes, originally I had the code in heap_vacuum_set_up_eagerness() in
vacuum_get_cutoffs() but I moved it after noticing
vacuum_get_cutoffs() was in vacuum.c which is technically AM-agnostic.
I didn't want to pollute it too much with the eagerness logic which is
actually used in heap-specific code.
With the other changes to eliminate the idea of a separate eager
vacuum in this version, I think this is no longer an issue.
- On a related note, won't most vacuums be VAC_EAGER rather than
VAC_NORMAL, thus making VAC_NORMAL a misnomer? I wonder if it's better
to merge eager and normal together, and just treat the cases where we
judge eager scanning not worthwhile as a mild special case of an
otherwise-normal vacuum. It's important that a user can tell what
happened from the log message, but it doesn't seem absolutely
necessary for the start-of-vacuum message to make that clear. It could
just be that %u all-visible scanned => 0 all-visible scanned means we
didn't end up being at all eager.
I can see this. In this version, I've eliminated the concept of eager
vacuums and reverted the LVRelState flag back to a boolean indicating
aggressive or non-aggressive. Normal vacuums may eager scan some pages
if they qualify. If so, the eager scan management state is set up in
the LVRelState. Aggressive vacuums and normal vacuums with eager
scanning disabled have all of this state set to values indicating
eager scanning is disabled.
- Perhaps it's unfair of me, but I think I would have hoped for an
acknowledgement in this commit message, considering that I believe I
was the one who suggested breaking the relation into logical regions,
trying to freeze a percentage of the all-visible-but-not-all-frozen
pages, and capping both successes and failures. Starting the region at
a random offset wasn't my idea, and the specific thresholds you've
chosen were not the ones I suggested, and limiting successes globally
rather than per-region was not what I think I had in mind, and I don't
mean to take away from everything you've done to move this forward,
but unless I am misunderstanding the situation, this particular patch
(0007) is basically an implementation of an algorithm that, as far as
I know, I was the first to propose.
Ah, you're right. This was an oversight that I believe I've corrected
in the attached version's commit message.
- Which of course also means that I tend to like the idea, but also
that I'm biased. Still, what is the reasonable alternative to this
patch? I have a hard time believing that it's "do nothing". As far as
I know, pretty much everyone agrees that the large burst of work that
tends to occur when an aggressive vacuum kicks off is extremely
problematic, particularly but not only the first time it kicks off on
a table or group of tables that may have accumulated many
all-visible-but-not-all-frozen pages. This patch might conceivably err
in moving that work too aggressively to earlier vacuums, thus making
those vacuums too expensive or wasting work if the pages end up being
modified again; or it might conceivably err in moving work
insufficiently aggressively to earlier vacuums, leaving too much
remaining work when the aggressive vacuum finally happens. In fact, I
would be surprised if it doesn't have those problems in some
situations. But it could have moderately severe cases of those
problems and still be quite a bit better than what we have now
overall.
- So, I think we should avoid fine-tuning this and try to understand if
there's anything wrong with the big picture. Can we imagine a user who
is systematically unhappy with this change? Like, not a user who got
hosed once because of some bad luck, but someone who is constantly and
predictably getting hosed? They'd need to be accumulating lots of
all-visible-not-all-frozen pages in relatively large tables on a
regular basis, but then I guess they'd need to either drop the tables
before the aggressive vacuum happened, or they'd need to render the
pages not-all-visible again before the aggressive vacuum would have
happened. I'm not entirely sure how possible that is. My best guess is
that it's possible if the timing of your autovacuum runs is
particularly poor -- you just need to load some data, vacuum it early
enough that the XIDs are still young so it doesn't get frozen, then
have the eager vacuum hit it, and then update it. That doesn't seem
impossible, but I'm not sure if it's possible to make it happen often
enough and severely enough to really cause a problem. And I'm not sure
we're going to find that out before this is committed
I suppose in the worst case, if the timings all align poorly and
you've set your autovacuum_freeze_max_age/vacuum_freeze_table_age very
high and vacuum_freeze_min_age very low, you could end up uselessly
freezing the same page multiple times before aggressive vacuuming.
If you cycle through modifying a page, vacuuming it, setting it
all-visible, and eagerly scanning and freezing it multiple times
before an aggressive vacuum, this would be a lot of extra useless
freezing. It seems difficult to do because the page will likely be
frozen the first time you vacuum it if vacuum_freeze_min_age is set
sufficiently low.
The other "worst case" is just that you always scan and fail to freeze
an extra 3% of the relation while vacuuming the table. This one is
much easier to achieve. As such, it seems worthwhile to add a GUC and
table option to tune the EAGER_SCAN_MAX_FAILS_PER_REGION such that you
can disable eager scanning altogether (or increase or decrease how
aggressive it is).
- Melanie
Attachments:
v4-0001-Add-more-general-summary-to-vacuumlazy.c.patch
v4-0002-Eagerly-scan-all-visible-pages-to-amortize-aggres.patch
On Sat, Dec 21, 2024 at 10:28 AM Robert Treat <rob@xzilla.net> wrote:
On Tue, Dec 17, 2024 at 5:51 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
It feels to me like eager vacuums are not so much a distinct thing,
but that, like how all vacuums do index cleanup unless told not to, all
vacuums are optimistic that they will convert avp to afp, only we bail
out quickly if the table is below vacuum_freeze_min_age and we throw
the limits out if it is above autovacuum_freeze_max_age, similar to
how we manage vacuum cost limit.
I like this framing/way of thinking about it. v4 in [1] eliminates the
concept of eager vacuums. I still use the aggressive vacuum and
non-aggressive vacuum framing. But, if we add a reloption/GUC to allow
configuring failures per region (proposed in [1]), that means more
using-facing docs and I think this way of framing it as a spectrum of
all-visible page scanning based on relfrozenxid's relationship to each
of these GUCs might be a clear way to explain eager scanning to users.
In a general sense, you want to know if your vacuums are converting
avp to afp since it can explain i/o usage. In this version of the
patch, we know it is turned on based on VacEagerness, but it'd be nice
to know if it got turned off; i.e. basically if we end up in
if (vacrel->eager_pages.remaining_successes == 0)
maybe drop a note in the logs.
I've added a logging message like this.
In a world where all vacuums are eager vacuums, I think you still want
the latter messaging, but I think the former would end up being noted
based on the outcome of criteria in vacuum_get_cutoffs() (primarily if
we are over the freeze limit).
Hmm. Now that I've eliminated the concept of eager vacuums, users are
not informed whether or not eager scanning is enabled at the beginning
of vacuum. Only when eager scanning is disabled during vacuum or at
the end when they see the number of pages eagerly scanned would they
know whether or not this eager scanning happened.
- Melanie
[1]: /messages/by-id/CAAKRu_byJgf3wB8ukv6caXROReS1SRZsVPb7RWP+8qPtdDGykw@mail.gmail.com
On Mon, Dec 23, 2024 at 12:50 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
The other "worst case" is just that you always scan and fail to freeze
an extra 3% of the relation while vacuuming the table. This one is
much easier to achieve. As such, it seems worthwhile to add a GUC and
table option to tune the EAGER_SCAN_MAX_FAILS_PER_REGION such that you
can disable eager scanning altogether (or increase or decrease how
aggressive it is).
Attached v5 adds the GUC and table storage option controlling the
maximum number of eager scan failures tolerated for each region of the
table.
0001 is a version of a docs patch proposed in [1]. It adds a new
Vacuuming subsection of the Server Configuration docs where the
proposed GUC in this patch is added in 0003.
While adding the table storage option and GUC, I struggled a bit with
where in the code to actually determine the final value of
vacuum_eager_scan_max_fails.
For table storage options, those related to vacuum but not autovacuum
are in the main StdRdOptions struct. Of those, some are overridden by
VACUUM command parameters which are parsed out into the VacuumParams
struct. Though the members of VacuumParams are initialized in
ExecVacuum(), the storage parameter overrides are determined in
vacuum_rel() and the final value goes in the VacuumParams struct which
is passed all the way through to heap_vacuum_rel().
Because VacuumParams is what ultimately gets passed down to the
table-AM specific vacuum implementation, autovacuum also initializes
its own instance of VacuumParams in the autovac_table struct in
table_recheck_autovac() (even though no VACUUM command parameters can
affect autovacuum). These are overridden in vacuum_rel() as well.
Ultimately vacuum_eager_scan_max_fails is a bit different from the
existing members of VacuumParams and StdRdOptions. It is a GUC and a
table storage option but not a SQL command parameter -- and both the
GUC and the table storage parameter affect both vacuum and autovacuum.
And it doesn't need to be initialized in different ways for autovacuum
and vacuum. In the end, I decided to follow the existing conventions
as closely as I could.
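The precedence being described -- a table storage option, when set, overriding the GUC -- can be sketched minimally. The function name and the use of -1 as the "not set" sentinel are assumptions for illustration (though -1 is the convention existing vacuum reloptions follow), not the patch's actual code.

```c
#include <assert.h>

/*
 * Resolve the effective eager-scan failure cap: a per-table storage option
 * (reloption), if set, wins over the server-wide GUC. A reloption value of
 * -1 means "not set"; a value of 0 would disable eager scanning entirely.
 */
static int
resolve_eager_scan_max_fails(int reloption_value, int guc_value)
{
    return (reloption_value >= 0) ? reloption_value : guc_value;
}
```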
- Melanie
[1]: /messages/by-id/CAAKRu_aQUOaMYrcjNuXeSkJtaX9oRUzKP57bsYbC0gVVWS+cbA@mail.gmail.com
Attachments:
v5-0003-Eagerly-scan-all-visible-pages-to-amortize-aggres.patch
v5-0001-Consolidate-docs-for-vacuum-related-GUCs-in-new-s.patch
v5-0002-Add-more-general-summary-to-vacuumlazy.c.patch
Hi,
On 2025-01-07 15:46:26 -0500, Melanie Plageman wrote:
For table storage options, those related to vacuum but not autovacuum
are in the main StdRdOptions struct. Of those, some are overridden by
VACUUM command parameters which are parsed out into the VacuumParams
struct. Though the members of VacuumParams are initialized in
ExecVacuum(), the storage parameter overrides are determined in
vacuum_rel() and the final value goes in the VacuumParams struct which
is passed all the way through to heap_vacuum_rel().
Because VacuumParams is what ultimately gets passed down to the
table-AM specific vacuum implementation, autovacuum also initializes
its own instance of VacuumParams in the autovac_table struct in
table_recheck_autovac() (even though no VACUUM command parameters can
affect autovacuum). These are overridden in vacuum_rel() as well.
Ultimately vacuum_eager_scan_max_fails is a bit different from the
existing members of VacuumParams and StdRdOptions. It is a GUC and a
table storage option but not a SQL command parameter -- and both the
GUC and the table storage parameter affect both vacuum and autovacuum.
And it doesn't need to be initialized in different ways for autovacuum
and vacuum. In the end, I decided to follow the existing conventions
as closely as I could.
I think that's fine. The abstractions in this area aren't exactly perfect, and
I don't think this makes it worse in any meaningful way. It's not really
different from having other heap-specific params like freeze_min_age in
VacuumParams.
Greetings,
Andres Freund
On Thu, Jan 9, 2025 at 1:24 PM Andres Freund <andres@anarazel.de> wrote:
On 2025-01-07 15:46:26 -0500, Melanie Plageman wrote:
For table storage options, those related to vacuum but not autovacuum
are in the main StdRdOptions struct. Of those, some are overridden by
VACUUM command parameters which are parsed out into the VacuumParams
struct. Though the members of VacuumParams are initialized in
ExecVacuum(), the storage parameter overrides are determined in
vacuum_rel() and the final value goes in the VacuumParams struct which
is passed all the way through to heap_vacuum_rel().
Because VacuumParams is what ultimately gets passed down to the
table-AM specific vacuum implementation, autovacuum also initializes
its own instance of VacuumParams in the autovac_table struct in
table_recheck_autovac() (even though no VACUUM command parameters can
affect autovacuum). These are overridden in vacuum_rel() as well.
Ultimately vacuum_eager_scan_max_fails is a bit different from the
existing members of VacuumParams and StdRdOptions. It is a GUC and a
table storage option but not a SQL command parameter -- and both the
GUC and the table storage parameter affect both vacuum and autovacuum.
And it doesn't need to be initialized in different ways for autovacuum
and vacuum. In the end, I decided to follow the existing conventions
as closely as I could.
I think that's fine. The abstractions in this area aren't exactly perfect, and
I don't think this makes it worse in any meaningful way. It's not really
different from having other heap-specific params like freeze_min_age in
VacuumParams.
Got it. I've left it as is, then.
Attached v6 is rebased over recent changes in the vacuum-related docs.
I've also updated the "Routine Vacuuming" section of the docs to
mention eager scanning.
I'm planning to commit 0001 (which updates the code comment at the top
of vacuumlazy.c to explain heap vacuuming) -- barring any objections.
I've been running a few multi-day benchmarks to ensure that the patch
behaves the same in a "normal" timeframe as it did in a compressed
one.
So far, it looks good. For a multi-day transactional benchmark with a
gaussian data access pattern, it looks about the same as a shorter
version (that is, aggressive vacuums are much shorter and there is no
difference when compared to master WRT total WAL volume, TPS, etc).
The final long benchmarks I'm waiting on are a hot tail workload with
a job that deletes old data.
- Melanie
Attachments:
v6-0002-Eagerly-scan-all-visible-pages-to-amortize-aggres.patch
v6-0001-Add-more-general-summary-to-vacuumlazy.c.patch
Hi!
On 14.01.2025 00:46, Melanie Plageman wrote:
On Thu, Jan 9, 2025 at 1:24 PM Andres Freund <andres@anarazel.de> wrote:
On 2025-01-07 15:46:26 -0500, Melanie Plageman wrote:
For table storage options, those related to vacuum but not autovacuum
are in the main StdRdOptions struct. Of those, some are overridden by
VACUUM command parameters which are parsed out into the VacuumParams
struct. Though the members of VacuumParams are initialized in
ExecVacuum(), the storage parameter overrides are determined in
vacuum_rel() and the final value goes in the VacuumParams struct which
is passed all the way through to heap_vacuum_rel().
Because VacuumParams is what ultimately gets passed down to the
table-AM specific vacuum implementation, autovacuum also initializes
its own instance of VacuumParams in the autovac_table struct in
table_recheck_autovac() (even though no VACUUM command parameters can
affect autovacuum). These are overridden in vacuum_rel() as well.
Ultimately vacuum_eager_scan_max_fails is a bit different from the
existing members of VacuumParams and StdRdOptions. It is a GUC and a
table storage option but not a SQL command parameter -- and both the
GUC and the table storage parameter affect both vacuum and autovacuum.
And it doesn't need to be initialized in different ways for autovacuum
and vacuum. In the end, I decided to follow the existing conventions
as closely as I could.
I think that's fine. The abstractions in this area aren't exactly perfect, and
I don't think this makes it worse in any meaningful way. It's not really
different from having other heap-specific params like freeze_min_age in
VacuumParams.
Got it. I've left it as is, then.
Attached v6 is rebased over recent changes in the vacuum-related docs.
I've also updated the "Routine Vacuuming" section of the docs to
mention eager scanning.
I'm planning to commit 0001 (which updates the code comment at the top
of vacuumlazy.c to explain heap vacuuming) -- barring any objections.
Thank you for working on this patch, without this explanation it is
difficult to understand what is happening, to put it mildly.
While working on the vacuum statistics (I'll add a link to that thread
just in case [0]), I discovered some more important points that I
think could be mentioned.
The first is that vacuum will not remove dead tuples still referenced
from indexes when it was unable to take a cleanup lock on the page.
See the increments of missed_dead_tuples and
vacrel->missed_dead_pages in the lazy_scan_noprune function: these
tuples are unambiguously dead from vacuum's perspective, but it simply
could not remove them.
Secondly, I think it is worth mentioning the case where vacuum
urgently processes the heap relation because of a threat of
transaction ID wraparound. At that point, it skips the index
processing phase and heap truncation.
Thirdly, the free space map is updated each time index and table
cleanup fully completes (after the lazy_vacuum function) and after the
heap pruning stage (the lazy_scan_prune function). Maybe you should
add that.
I think additional information about parallel vacuum could be added:
workers are launched per index to perform index cleanup, and some
indexes are deemed unsafe for processing by a parallel worker and can
be processed only by the leader. These are indexes that do not support
parallel bulk-deletion or parallel cleanup (see the
parallel_vacuum_index_is_parallel_safe function).
I also noticed an interesting point, though I don't know whether it
needs to be written up: it was not obvious to me that the buffer and
WAL statistics for the indexes processed by workers are accumulated
separately (in pvs->buffer_usage and pvs->wal_usage).
[0]: https://commitfest.postgresql.org/51/5012/
--
Regards,
Alena Rybakina
Postgres Professional
On Mon, Jan 13, 2025 at 5:37 PM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:
Thank you for working on this patch, without this explanation it is difficult to understand what is happening, to put it mildly.
Thanks for the review! I've incorporated most of them into attached v7.
The first of them is related to the fact that vacuum will not clean tuples referenced in indexes, since it was previously unable to take a cleanup lock on the index. You can look at the increment of missed_dead_tuples and vacrel->missed_dead_pages in the lazy_scan_noprune function. That is, these are absolutely dead tuples for vacuum that it simply could not clean.
I had mentioned that if a (non-aggressive) vacuum cannot get a cleanup
lock on a page, it will skip pruning and freezing. I have expanded the
note to mention that this means it will not remove those dead tuples
or index entries.
Secondly, I think it is worth mentioning the moment when vacuum urgently starts cleaning the heap relationship when there is a threat of a wraparound round. At this point, it skips the index processing phase and heap relationship truncation.
I've added failsafe to the list of reasons why we might skip phase II and III.
Thirdly, FreeSpaceMap is updated every time after the complete completion of index and table cleaning (after the lazy_vacuum function) and after table heap pruning stage (the lazy_scan_prune function). Maybe you should add it.
I've added a sentence about this. It looks a bit awkward by itself,
but it doesn't really go with the other paragraphs. Anyway, I think it
is probably fine.
I think it is possible to add additional information about parallel vacuum - firstly, workers are launched to vacuum individual indexes. Some indexes are identified by vacuum as unsafe for processing by a parallel worker and can be processed only by the leader process. These are indexes that do not support parallel bulk-deletion or parallel cleanup (see the parallel_vacuum_index_is_parallel_safe function).
I hesitated to add too much about parallel index vacuuming to
vacuumlazy.c. I have added a line which mentions that manual vacuums
may vacuum indexes in parallel and to look at vacuumparallel.c for
more info.
I noticed an interesting point, but I don’t know if it is necessary to write about it; it was not obvious to me that the buffer and WAL statistics for indexes processed by workers are accumulated separately (in pvs->buffer_usage and pvs->wal_usage).
This is interesting, but I think it might belong as commentary in
vacuumparallel.c instead.
Thanks again for your close reading and detailed thoughts!
- Melanie
Attachments:
v7-0002-Eagerly-scan-all-visible-pages-to-amortize-aggres.patch (text/x-patch, +488 −46)
v7-0001-Add-more-general-summary-to-vacuumlazy.c.patch (text/x-patch, +54 −1)
On 14.01.2025 22:51, Melanie Plageman wrote:
On Mon, Jan 13, 2025 at 5:37 PM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:
Thank you for working on this patch; without this explanation it is difficult to understand what is happening, to put it mildly.
Thanks for the review! I've incorporated most of them into attached v7.
You are welcome! Thank you too)
The first of them is related to the fact that vacuum will not clean tuples referenced in indexes when it was previously unable to take a cleanup lock on the page. You can look at the increments of missed_dead_tuples and vacrel->missed_dead_pages in the lazy_scan_noprune function. That is, these are dead tuples that vacuum simply could not clean.
I had mentioned that if a (non-aggressive) vacuum cannot get a cleanup
lock on a page, it will skip pruning and freezing. I have expanded the
note to mention that this means it will not remove those dead tuples
or index entries.
Secondly, I think it is worth mentioning the moment when vacuum urgently starts cleaning the heap relation when there is a threat of wraparound. At this point, it skips the index processing phase and heap relation truncation.
I've added failsafe to the list of reasons why we might skip phase II and III.
Yes, I agree with you. After reviewing the patch again, I noticed them.
Thirdly, the FreeSpaceMap is updated after index and table cleaning fully completes (after the lazy_vacuum function) and after the heap pruning stage (the lazy_scan_prune function). Maybe you should add that.
I've added a sentence about this. It looks a bit awkward by itself,
but it doesn't really go with the other paragraphs. Anyway, I think it
is probably fine.
I think it is fine.
I think it is possible to add additional information about parallel vacuum - firstly, workers are launched to vacuum individual indexes. Some indexes are identified by vacuum as unsafe for processing by a parallel worker and can be processed only by the leader process. These are indexes that do not support parallel bulk-deletion or parallel cleanup (see the parallel_vacuum_index_is_parallel_safe function).
I hesitated to add too much about parallel index vacuuming to
vacuumlazy.c. I have added a line which mentions that manual vacuums
may vacuum indexes in parallel and to look at vacuumparallel.c for
more info.
In my opinion, there is enough information about it. Additionally, the
code for parallel vacuums is located there. I believe a brief
description of what a parallel vacuum is and where to find more
information about it is sufficient, so it is fine as it is now.
I noticed an interesting point, but I don’t know if it is necessary to write about it; it was not obvious to me that the buffer and WAL statistics for indexes processed by workers are accumulated separately (in pvs->buffer_usage and pvs->wal_usage).
This is interesting, but I think it might belong as commentary in
vacuumparallel.c instead.
I added some description about it, I hope it is fine. I attached
vacuum_description.diff
Thanks again for your close reading and detailed thoughts!
Thank you for your contribution!)
--
Regards,
Alena Rybakina
Postgres Professional
Attachments:
vacuum_description.diff (text/x-patch, +6 −2)
On 14.01.2025 22:51, Melanie Plageman wrote:
There is a typo on committed vacuumlazy.c file
* If the TID store fills up in phase I, vacuum suspends phase I, proceeds to
* phases II and II, cleaning up the dead tuples referenced in the current TID
* store. This empties the TID store resumes phase I.
Probably you meant phases II and III
regards
Marcos
On Wed, Jan 15, 2025 at 2:36 PM Marcos Pegoraro <marcos@f10.com.br> wrote:
On 14.01.2025 22:51, Melanie Plageman wrote:
There is a typo on committed vacuumlazy.c file
* If the TID store fills up in phase I, vacuum suspends phase I, proceeds to
* phases II and II, cleaning up the dead tuples referenced in the current TID
* store. This empties the TID store resumes phase I.
Probably you meant phases II and III
Oops. Thanks for catching this. I'll fix that and the following
sentence in the main patch proposed in this thread to avoid extra
noise.
- Melanie
On Wed, Jan 15, 2025 at 12:08 PM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:
This is interesting, but I think it might belong as commentary in
vacuumparallel.c instead.
I added some description about it, I hope it is fine. I attached vacuum_description.diff
Alena, thanks again for your review. I pushed the part that we worked
on so far. I didn't end up adding in your proposed addition to
vacuumparallel.c. Honestly, I am not as familiar with the parallel
vacuuming code, so I wasn't sure what information rises to the level
of importance of being in the comment at the top of the file.
- Melanie