Teaching users how they can get the most out of HOT in Postgres 14
Recent work from commit 5100010e taught VACUUM that it doesn't have to
do index vacuuming in cases where there are practically zero (not
necessarily exactly zero) tuples to delete from indexes. It also
surfaces the information used to decide whether or not we skip index
vacuuming in the logs, via the log_autovacuum_min_duration mechanism.
This log output can be used to get a sense of how effective HOT is
over time.
There is one number of particular interest: the proportion of heap
pages that have one or more LP_DEAD items across successive VACUUMs
(this is expressed as a percentage of the table). The immediate reason
to expose this is that it is crucial to the skipping behavior from
commit 5100010e -- the threshold for skipping is 2% of all heap pages.
But that's certainly not the only reason to pay attention to the
percentage. It can also be used to understand HOT in general. It can
be correlated with workload spikes and stressors that tend to make HOT
less effective.
A number of interesting workload-specific patterns seem to emerge by
focussing on how this number changes/grows over time. I think that
this should be pointed out directly in the docs. What's more, it seems
like a good vehicle for discussing how HOT works in general. Why did
we never really get around to documenting HOT? There should at least
be some handling of how DBAs can get the most out of HOT through
monitoring and through tuning -- especially by lowering heap
fillfactor.
It's very hard to get all UPDATEs to use HOT. It's much easier to get
UPDATEs to mostly use HOT most of the time. How things change over
time seems crucially important.
I'll show one realistic example, just to give some idea of what it
might look like. This is output for 3 successive autovacuums against
the largest TPC-C table:
automatic vacuum of table "postgres.public.bmsql_order_line": index scans: 0
pages: 0 removed, 4668405 remain, 0 skipped due to pins, 696997 skipped frozen
tuples: 324571 removed, 186702373 remain, 333888 are dead but not yet
removable, oldest xmin: 7624965
buffer usage: 3969937 hits, 3931997 misses, 1883255 dirtied
index scan bypassed: 42634 pages from table (0.91% of total) have
324364 dead item identifiers
avg read rate: 62.469 MB/s, avg write rate: 29.920 MB/s
I/O Timings: read=42359.501 write=11867.903
system usage: CPU: user: 43.62 s, system: 38.17 s, elapsed: 491.74 s
WAL usage: 4586766 records, 1850599 full page images, 8499388881 bytes
automatic vacuum of table "postgres.public.bmsql_order_line": index scans: 0
pages: 0 removed, 5976391 remain, 0 skipped due to pins, 2516643 skipped frozen
tuples: 759956 removed, 239787517 remain, 1848431 are dead but not yet
removable, oldest xmin: 18489326
buffer usage: 3432019 hits, 3385757 misses, 2426571 dirtied
index scan bypassed: 103941 pages from table (1.74% of total) have
790233 dead item identifiers
avg read rate: 50.338 MB/s, avg write rate: 36.077 MB/s
I/O Timings: read=49252.721 write=17003.987
system usage: CPU: user: 45.86 s, system: 34.47 s, elapsed: 525.47 s
WAL usage: 5598040 records, 2274556 full page images, 10510281959 bytes
automatic vacuum of table "postgres.public.bmsql_order_line": index scans: 1
pages: 0 removed, 7483804 remain, 1 skipped due to pins, 4208295 skipped frozen
tuples: 972778 removed, 299295048 remain, 1970910 are dead but not yet
removable, oldest xmin: 30987445
buffer usage: 3384994 hits, 4593727 misses, 2891003 dirtied
index scan needed: 174243 pages from table (2.33% of total) had
1325752 dead item identifiers removed
index "bmsql_order_line_pkey": pages: 1250660 in total, 0 newly
deleted, 0 currently deleted, 0 reusable
avg read rate: 60.505 MB/s, avg write rate: 38.078 MB/s
I/O Timings: read=72881.986 write=21872.615
system usage: CPU: user: 65.24 s, system: 42.24 s, elapsed: 593.14 s
WAL usage: 6668353 records, 2684040 full page images, 12374536939 bytes
These autovacuums occur every 60-90 minutes with the workload in
question (with pretty aggressive autovacuum settings). We see that HOT
works rather well here -- but not so well that index vacuuming can be
avoided consistently, which happens in the final autovacuum (it has
"index scans: 1"). There was slow but steady growth in the percentage
of LP_DEAD-containing heap pages over time here, which is common
enough.
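The page-count test that drives the skipping behavior can be sketched as follows. This is a simplified Python model of the check behind the "index scan bypassed" line, not the real vacuumlazy.c code; the actual check considers additional conditions beyond the bare page percentage, so treat this as illustrative only.

```python
# Simplified sketch (not the real vacuumlazy.c code) of the page-count
# test behind "index scan bypassed": index vacuuming is skipped when
# fewer than BYPASS_THRESHOLD_PAGES (2%) of heap pages have LP_DEAD items.
BYPASS_THRESHOLD_PAGES = 0.02

def bypasses_index_vacuum(rel_pages: int, lpdead_item_pages: int) -> bool:
    return lpdead_item_pages < rel_pages * BYPASS_THRESHOLD_PAGES

# The numbers from the three autovacuums above:
for pages, dead_pages in [(4668405, 42634), (5976391, 103941), (7483804, 174243)]:
    pct = 100.0 * dead_pages / pages
    outcome = "bypass" if bypasses_index_vacuum(pages, dead_pages) else "index scan"
    print(f"{pct:.2f}% -> {outcome}")
```

Run against the log output above, this reproduces the observed behavior: the first two autovacuums (0.91% and 1.74%) bypass index vacuuming, while the third (2.33%) crosses the 2% threshold and reports "index scans: 1".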
The point of HOT is not to avoid having to do index vacuuming, of
course -- that has it backwards. But framing HOT as doing work in
backends so autovacuum doesn't have to do similar work later on is a
good mental model to encourage users to adopt. There are also
significant advantages to reducing the effectiveness of HOT to this
one number -- HOT must be working well if it's close to 0%, almost
always below 2%, with the occasional aberration that sees it go up to
maybe 5%. But if it ever goes too high (in the absence of DELETEs),
you might have trouble on your hands. It might not go down again.
There are other interesting patterns from other tables within the same
database -- including on tables with no UPDATEs, and tables that
generally cannot use HOT due to a need to modify indexed columns. The
particulars with these other tables hint at problems with heap
fragmentation, which is something that users can think of as a
degenerative process -- something that gets progressively worse in
extreme cases (i.e. cases where it matters).
This new percentage metric isn't about HOT per se. It's actually about
the broader question of how effective the system is at keeping the
physical location of each logical row stable over time, for a given
workload. So maybe that's what any new documentation should address.
The documentation would still have plenty to say about HOT, though. It
would also have something to say about bottom-up index deletion, which
can be thought of as avoiding problems when HOT doesn't or can't be
applied very often.
--
Peter Geoghegan
Hi,
On 2021-04-12 16:11:59 -0700, Peter Geoghegan wrote:
Recent work from commit 5100010e taught VACUUM that it doesn't have to
do index vacuuming in cases where there are practically zero (not
necessarily exactly zero) tuples to delete from indexes.
FWIW, I'd not at all be surprised if this causes some issues. Consider
cases where all index lookups are via bitmap scans (which does not
support killtuples) - if value ranges are looked up often the repeated
heap fetches can absolutely kill query performance. I've definitely had
to make autovacuum more aggressive for cases like this or schedule
manual vacuums, and now that's silently not good enough anymore. Yes, 2%
of the table isn't all that much, but often enough all the updates and
lookups concentrate in one value range.
As far as I can see there's no reasonable way to disable this
"optimization", which scares me.
Greetings,
Andres Freund
On Mon, Apr 12, 2021 at 4:30 PM Andres Freund <andres@anarazel.de> wrote:
As far as I can see there's no reasonable way to disable this
"optimization", which scares me.
I'm fine with adding a simple 'off' switch. What I'd like to avoid
doing is making the behavior tunable, since it's likely to change in
Postgres 15 and Postgres 16 anyway.
--
Peter Geoghegan
On Mon, Apr 12, 2021 at 04:35:13PM -0700, Peter Geoghegan wrote:
On Mon, Apr 12, 2021 at 4:30 PM Andres Freund <andres@anarazel.de> wrote:
As far as I can see there's no reasonable way to disable this
"optimization", which scares me.

I'm fine with adding a simple 'off' switch. What I'd like to avoid
doing is making the behavior tunable, since it's likely to change in
Postgres 15 and Postgres 16 anyway.
While going through this commit a couple of days ago, I really got to
wonder why you are controlling this stuff with a hardcoded value and I
found that scary, while what you should be using are two GUCs with the
reloptions that come with the feature (?):
- A threshold, as an integer, to define a number of pages.
- A scale factor to define a percentage of pages.
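For context, this is how a threshold plus scale factor pair usually combines elsewhere in autovacuum (compare autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor). A minimal sketch of what the proposed pair might look like; neither GUC exists, and the parameter names and the 1000-page default are made up for illustration:

```python
# Hypothetical sketch of the proposed threshold + scale factor pair,
# combining the way existing autovacuum parameters do (fixed count plus
# a size-proportional term). These GUCs do not exist; names are invented.
def index_cleanup_required(rel_pages, lpdead_item_pages,
                           bypass_threshold=1000,      # fixed page count (hypothetical)
                           bypass_scale_factor=0.02):  # fraction of rel_pages (hypothetical)
    return lpdead_item_pages > bypass_threshold + bypass_scale_factor * rel_pages
```

The fixed term dominates for small tables and the scale factor for large ones, which is the usual rationale for exposing both.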
Also, I am a bit confused with the choice of BYPASS_THRESHOLD_PAGES as
parameter name. For all the other parameters of autovacuum, we use
"threshold" for a fixed number of items, not a percentage of a given
item.
--
Michael
On Mon, Apr 12, 2021 at 4:52 PM Michael Paquier <michael@paquier.xyz> wrote:
While going through this commit a couple of days ago, I really got to
wonder why you are controlling this stuff with a hardcoded value and I
found that scary, while what you should be using are two GUCs with the
reloptions that come with the feature (?):
- A threshold, as an integer, to define a number of pages.
- A scale factor to define a percentage of pages.
Why?
--
Peter Geoghegan
Hi,
On 2021-04-12 16:53:47 -0700, Peter Geoghegan wrote:
On Mon, Apr 12, 2021 at 4:52 PM Michael Paquier <michael@paquier.xyz> wrote:
While going through this commit a couple of days ago, I really got to
wonder why you are controlling this stuff with a hardcoded value and I
found that scary, while what you should be using are two GUCs with the
reloptions that come with the feature (?):
- A threshold, as an integer, to define a number of pages.
- A scale factor to define a percentage of pages.

Why?
Well, one argument is that you made a fairly significant behavioural
change, with hard-coded logic for when the optimization kicks in. It's
not at all clear that your constants are the right ones for every
workload. We'll likely only get to know whether they're right in > 1 year
- not having a real out at that point imo is somewhat scary.
That said, adding more and more reloptions has a significant cost, so I
don't think it's clear cut that it's the right decision to add
one. Perhaps vacuum_cleanup_index_scale_factor should just be reused for
BYPASS_THRESHOLD_PAGES?
Greetings,
Andres Freund
On Mon, Apr 12, 2021 at 5:37 PM Andres Freund <andres@anarazel.de> wrote:
Well, one argument is that you made a fairly significant behavioural
change, with hard-coded logic for when the optimization kicks in. It's
not at all clear that your constants are the right ones for every
workload.
(Apparently nobody wants to talk about HOT and the documentation.)
The BYPASS_THRESHOLD_PAGES cutoff was chosen conservatively, so that
it would avoid index vacuuming in truly marginal cases -- and it tends
to only delay it there.
A table-level threshold is not the best way of constraining the
problem. In the future, the table threshold should be treated as only
one factor among several. Plus there will be more than a simple yes/no
question to consider. We should eventually be able to do index
vacuuming for some indexes but not others. Bottom-up index deletion
has totally changed things here, because roughly speaking it makes
index bloat proportionate to the number of logical changes to indexed
columns -- you could have one super-bloated index on the table, but
several others that perfectly retain their original size. You still
need to do heap vacuuming eventually, which necessitates vacuuming
indexes too, but the right strategy is probably to vacuum much more
frequently, vacuuming the bloated index each time. You only do a full
round of index vacuuming when the table starts to accumulate way too
many LP_DEAD items. You need a much more sophisticated model for this.
It might also need to hook into autovacuums scheduling.
One of the dangers of high BYPASS_THRESHOLD_PAGES settings is that
it'll work well for some indexes but not others. To a dramatic degree,
even.
That said, nbtree isn't the only index AM, and it is hard to be
completely sure that you've caught everything. So an off switch seems
like a good idea now.
We'll likely only get to know whether they're right in > 1 year
- not having a real out at that point imo is somewhat scary.

That said, adding more and more reloptions has a significant cost, so I
don't think it's clear cut that it's the right decision to add
one. Perhaps vacuum_cleanup_index_scale_factor should just be reused for
BYPASS_THRESHOLD_PAGES?
I think that the right way to do this is to generalize INDEX_CLEANUP
to support a mode of operation that disallows vacuumlazy.c from
applying this optimization, as well as any similar optimizations which
will be added in the future.
Even if you don't buy my argument about directly parameterizing
BYPASS_THRESHOLD_PAGES undermining future work, allowing it to be set
much higher than 5% - 10% would be a pretty big footgun. It might
appear to help at first, but risks destabilizing things much later on.
--
Peter Geoghegan
On Mon, Apr 12, 2021 at 06:12:18PM -0700, Peter Geoghegan wrote:
One of the dangers of high BYPASS_THRESHOLD_PAGES settings is that
it'll work well for some indexes but not others. To a dramatic degree,
even.

That said, nbtree isn't the only index AM, and it is hard to be
completely sure that you've caught everything. So an off switch seems
like a good idea now.
Whatever the solution chosen, the thing I can see we agree on here is
that we need to do something, at least in the shape of an on/off
switch to have an escape path in case of problems. Peter, could we
get something by beta1 for that? FWIW, I would use a float GUC to
control that, and not a boolean switch, but I am just one voice here,
and that's not a feature I worked on.
--
Michael
On Tue, May 11, 2021 at 04:42:27PM +0900, Michael Paquier wrote:
Whatever the solution chosen, the thing I can see we agree on here is
that we need to do something, at least in the shape of an on/off
switch to have an escape path in case of problems. Peter, could we
get something by beta1 for that? FWIW, I would use a float GUC to
control that, and not a boolean switch, but I am just one voice here,
and that's not a feature I worked on.
So, I have been thinking more about this item, and a boolean switch
still sounded weird to me, so attached is a patch to have two GUCs,
one for manual VACUUM and autovacuum like any other parameters, to
control this behavior, with a default set at 2% of the number of
relation pages with dead items needed to do the index cleanup work.
Even if we switch the parameter type used here, the easiest and most
consistent way to pass down the parameter is just to use VacuumParams
set within ExecVacuum() and the autovacuum code path. The docs need
more work, I guess.
Thoughts?
--
Michael
Attachments:
vacuum-index-cleanup-v1.patch (text/x-diff, +114/-11)
On Thu, May 13, 2021 at 04:27:47PM +0900, Michael Paquier wrote:
On Tue, May 11, 2021 at 04:42:27PM +0900, Michael Paquier wrote:
Whatever the solution chosen, the thing I can see we agree on here is
that we need to do something, at least in the shape of an on/off
switch to have an escape path in case of problems. Peter, could we
get something by beta1 for that? FWIW, I would use a float GUC to
control that, and not a boolean switch, but I am just one voice here,
and that's not a feature I worked on.

So, I have been thinking more about this item, and a boolean switch
still sounded weird to me, so attached is a patch to have two GUCs,
one for manual VACUUM and autovacuum like any other parameters, to
control this behavior, with a default set at 2% of the number of
relation pages with dead items needed to do the index cleanup work.

Even if we switch the parameter type used here, the easiest and most
consistent way to pass down the parameter is just to use VacuumParams
set within ExecVacuum() and the autovacuum code path. The docs need
more work, I guess.

Thoughts?
+ cleanup_index_scale_factor = autovacuum_cleanup_index_scale >= 0 ?
+ autovacuum_cleanup_index_scale : VacuumCostDelay;
CostDelay is surely not what you meant.
+ <title>Vacuum parameters for Indexes</title>
+ <para>
+ During the execution of <xref linkend="sql-vacuum"/>
+ and <xref linkend="sql-analyze"/>
"and analyze" is wrong?
+ This parameter can only be set in the <filename>postgresql.conf</filename>
+ file or on the server command line.
It's SIGHUP
+ This parameter can only be set in the <filename>postgresql.conf</filename>
+ file or on the server command line.
Same
+ {
+ {"vacuum_cleanup_index_scale_factor", PGC_SIGHUP, VACUUM_INDEX,
+ gettext_noop("Fraction of relation pages, with at least one dead item, required to clean up indexes."),
+ NULL
+ },
+ &VacuumCleanupIndexScale,
+ 0.02, 0.0, 0.05,
+ NULL, NULL, NULL
+ },
Why is the allowed range from 0 to 0.05? Why not 0.10 or 1.0 ?
The old GUC of the same name had max 1e10.
I think a reduced range and a redefinition of the GUC would need to be called
out as an incompatibility.
Also, the old GUC (removed at 9f3665fbf) had:
- {"vacuum_cleanup_index_scale_factor", PGC_USERSET, CLIENT_CONN_STATEMENT,
I think USERSET and STATEMENT were right ?
Alternately, what if this were in the DEVELOPER category, which makes this
easier to remove in v15.
--
Justin
On Thu, May 13, 2021 at 5:06 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
Why is the allowed range from 0 to 0.05? Why not 0.10 or 1.0 ?
The old GUC of the same name had max 1e10.
It also had a completely different purpose and default.
I think a reduced range and a redefinition of the GUC would need to be called
out as an incompatibility.
The justification from Michael for this approach is that not having
this level of control would be weird, at least to him. But that
justification itself seems weird to me; why start from the premise
that you need a knob (as opposed to an off switch) at all? Why not
start with the way the mechanism works (or is intended to work) in
practice? Most individual tables will *never* have VACUUM apply the
optimization with *any* reasonable threshold value, so we only need to
consider the subset of tables/workloads where it *might* make sense to
skip index vacuuming. This is more qualitative than quantitative.
It makes zero sense to treat the threshold as a universal scale --
this is one reason why I don't want to expose a true tunable knob to
users. Though the threshold-driven/BYPASS_THRESHOLD_PAGES design is
not exactly something with stable behavior for a given table, it
almost works like that in practice: tables tend to usually skip index
vacuuming, or never skip it even once. There is a clear bifurcation
along this line when you view how VACUUM behaves with a variety of
different tables using the new autovacuum logging stuff.
Almost all of the benefit of the optimization is available with the
current BYPASS_THRESHOLD_PAGES threshold (2% of heap pages have
LP_DEAD items), which has less risk than a higher threshold. I don't
think it matters much if we have the occasional unnecessary round of
index vacuuming on account of not applying the optimization. The truly
important benefit of the optimization is to not do unnecessary index
vacuuming all the time in the extreme cases where it's really hard to
justify. There is currently zero evidence that anything higher than 2%
will ever help anybody to an appreciable degree. (There is also no
evidence that the optimization will ever need to be disabled, but I
accept the need to be conservative and offer an off switch -- the
precautionary principle applies when talking about new harms.)
Not having to scan every index on every VACUUM, but only every 5th or
so VACUUM is a huge improvement. But going from every 5th VACUUM to
every 10th VACUUM? That's at best a tiny additional improvement in
exchange for what I'd guess is a roughly linear increase in risk
(maybe a greater-than-linear increase, even). That's an awful deal.
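Back-of-the-envelope, assuming each bypassed VACUUM saves one full round of index scans:

```python
# Rough arithmetic behind the diminishing returns: skipping index
# vacuuming in 4 of every 5 VACUUMs already saves 80% of the index-scan
# rounds; pushing that to 9 of every 10 adds only 10 points more.
def work_saved(skip_every_n: int) -> float:
    return 1.0 - 1.0 / skip_every_n

print(work_saved(5))   # 0.8
print(work_saved(10))  # 0.9
```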
--
Peter Geoghegan
On Thu, May 13, 2021 at 01:27:44PM -0700, Peter Geoghegan wrote:
Almost all of the benefit of the optimization is available with the
current BYPASS_THRESHOLD_PAGES threshold (2% of heap pages have
LP_DEAD items), which has less risk than a higher threshold. I don't
think it matters much if we have the occasional unnecessary round of
index vacuuming on account of not applying the optimization. The truly
important benefit of the optimization is to not do unnecessary index
vacuuming all the time in the extreme cases where it's really hard to
justify. There is currently zero evidence that anything higher than 2%
will ever help anybody to an appreciable degree. (There is also no
evidence that the optimization will ever need to be disabled, but I
accept the need to be conservative and offer an off switch -- the
precautionary principle applies when talking about new harms.)

Not having to scan every index on every VACUUM, but only every 5th or
so VACUUM is a huge improvement. But going from every 5th VACUUM to
every 10th VACUUM? That's at best a tiny additional improvement in
exchange for what I'd guess is a roughly linear increase in risk
(maybe a greater-than-linear increase, even). That's an awful deal.
Perhaps that's an awful deal, but based on which facts can you really
say that this new behavior of needing at least 2% of relation pages
with some dead items to clean up indexes is not a worse deal in some
cases? This may cause more problems for the in-core index AMs, as
much as it could impact any out-of-core index AM, no? What about
other values like 1%, or even 5%? My guess is that there would be an
ask to have more control on that, though that stands as my opinion.
Saying that, as long as there is a way to disable that for the users
with autovacuum and manual vacuums, I'd be fine. It is worth noting
that adding a GUC to control this optimization would make the code
more confusing, as there is already do_index_cleanup, a
vacuum_index_cleanup reloption, and specifying vacuum_index_cleanup to
TRUE may cause the index cleanup to not actually kick in if the 2% bar is
not reached.
--
Michael
On Thu, May 13, 2021 at 7:14 PM Michael Paquier <michael@paquier.xyz> wrote:
Perhaps that's an awful deal, but based on which facts can you really
say that this new behavior of needing at least 2% of relation pages
with some dead items to clean up indexes is not a worse deal in some
cases?
If I thought that it simply wasn't possible then I wouldn't have
accepted the need to make it possible to disable. This is a
cost/benefit decision problem, which must be made based on imperfect
information -- there are no absolute certainties. But I'm certain
about one thing: there is a large practical difference between the
optimization causing terrible performance in certain scenarios and the
optimization causing slightly suboptimal performance in certain
scenarios. A tiny risk of the former scenario is *much* worse than a
relatively large risk of the latter scenario. There needs to be a
sense of proportion about risk.
This may cause more problems for the in-core index AMs, as
much as it could impact any out-of-core index AM, no?
I don't understand what you mean here.
What about
other values like 1%, or even 5%? My guess is that there would be an
ask to have more control on that, though that stands as my opinion.
How did you arrive at that guess? Why do you believe that? This is the
second time I've asked.
Saying that, as long as there is a way to disable that for the users
with autovacuum and manual vacuums, I'd be fine. It is worth noting
that adding a GUC to control this optimization would make the code
more confusing, as there is already do_index_cleanup, a
vacuum_index_cleanup reloption, and specifying vacuum_index_cleanup to
TRUE may cause the index cleanup to not actually kick in if the 2% bar is
not reached.
I don't intend to add a GUC. A reloption should suffice.
Your interpretation of what specifying vacuum_index_cleanup (the
VACUUM command option) represents doesn't seem particularly justified
to me. To me it just means "index cleanup and vacuuming are not
explicitly disabled, the default behavior". It's an option largely
intended for emergencies, and largely superseded by the failsafe
mechanism. This interpretation is justified by well established
precedent: it has long been possible for VACUUM to skip heap page
pruning and even heap page vacuuming just because a super-exclusive
lock could not be acquired (though the latter case no longer happens
due to the same work inside vacuumlazy.c) -- which also implies
skipping some index vacuuming, without it ever being apparent to the
user.
--
Peter Geoghegan
(I had missed this discussion due to the mismatched thread subject..)
On Fri, May 14, 2021 at 11:14 AM Michael Paquier <michael@paquier.xyz> wrote:
On Thu, May 13, 2021 at 01:27:44PM -0700, Peter Geoghegan wrote:
Almost all of the benefit of the optimization is available with the
current BYPASS_THRESHOLD_PAGES threshold (2% of heap pages have
LP_DEAD items), which has less risk than a higher threshold. I don't
think it matters much if we have the occasional unnecessary round of
index vacuuming on account of not applying the optimization. The truly
important benefit of the optimization is to not do unnecessary index
vacuuming all the time in the extreme cases where it's really hard to
justify. There is currently zero evidence that anything higher than 2%
will ever help anybody to an appreciable degree. (There is also no
evidence that the optimization will ever need to be disabled, but I
accept the need to be conservative and offer an off switch -- the
precautionary principle applies when talking about new harms.)

Not having to scan every index on every VACUUM, but only every 5th or
so VACUUM is a huge improvement. But going from every 5th VACUUM to
every 10th VACUUM? That's at best a tiny additional improvement in
exchange for what I'd guess is a roughly linear increase in risk
(maybe a greater-than-linear increase, even). That's an awful deal.

Perhaps that's an awful deal, but based on which facts can you really
say that this new behavior of needing at least 2% of relation pages
with some dead items to clean up indexes is not a worse deal in some
cases? This may cause more problems for the in-core index AMs, as
much as it could impact any out-of-core index AM, no? What about
other values like 1%, or even 5%? My guess is that there would be an
ask to have more control on that, though that stands as my opinion.
I'm concerned about how users can tune that scale-type parameter that can be
configured between 0.0 and 0.05. I think that users basically don't
pay attention to how many blocks are updated by UPDATE/DELETE. Unlike
the old vacuum_cleanup_index_scale_factor, increasing this parameter would
directly affect index bloat. If the user can accept more index bloat
to speed up (auto)vacuum, they can use vacuum_index_cleanup instead.
I prefer to have an on/off switch just in case. I remember I also
commented the same thing before. We’ve discussed a way to control
whether or not to enable the skipping optimization by adding a new
mode to the INDEX_CLEANUP option, as Peter mentioned. For example, we can
use a new “auto” (or “smart”) mode by default, enabling all
skipping optimizations, and specifying “on” disables them. Or we can
add “force” mode to disable all skipping optimizations while leaving
the existing modes as they are. Anyway, I think it’s not a good idea
to add a new GUC parameter that we’re not sure how to tune.
IIUC skipping index vacuum when less than 2% of relation pages contain at
least one LP_DEAD item is a table-level optimization rather than a btree
index optimization. Since we’re not likely to set many pages
all-visible or collect many dead tuples in that case, we can skip
index vacuuming and table vacuuming. IIUC this case, fortunately, goes
well together with btree indexes’ bottom-up deletion. If this skipping
behavior badly affects other index AMs, this optimization should be
considered within btree indexes, although we will need a way for index
AMs to consider and tell the vacuum strategy. But I guess this works
well in most cases so having an on/off switch should suffice.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Tue, May 18, 2021 at 7:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I prefer to have an on/off switch just in case. I remember I also
commented the same thing before. We’ve discussed a way to control
whether or not to enable the skipping optimization by adding a new
mode to INDEX_CLEANUP option, as Peter mentioned. For example, we can
use the new mode “auto” (or “smart”) mode by default, enabling all
skipping optimizations, and specifying “on” disables them. Or we can
add “force” mode to disable all skipping optimizations while leaving
the existing modes as they are. Anyway, I think it’s not a good idea
to add a new GUC parameter that we’re not sure how to tune.

IIUC skipping index vacuum when less than 2% of relation pages contain at
least one LP_DEAD item is a table-level optimization rather than a btree
index optimization.
Right. There *is* an excellent way to tune this behavior: by adjusting
heap fillfactor to make HOT more effective. That was why I started
this thread!
If you leave heap fillfactor at the default of 100, and have lots of
updates (that don't modify indexed columns) and no deletes, then
you're almost certainly not going to have VACUUM skip indexes anyway
-- in practice you're bound to exceed having 2% of pages with an
LP_DEAD item before very long. Tuning heap fillfactor is practically
essential to see a real benefit, regardless of the exact
BYPASS_THRESHOLD_PAGES. (There may be some rare exceptions, but for
the most part this mechanism helps with tables that get many updates
that are expected to use HOT, and will use HOT barring a tiny number
of cases where the new tuple won't quite fit, etc.)
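A rough way to see what lowering heap fillfactor buys: it reserves free space on every heap page that HOT updates can use for successor tuple versions. The model below is deliberately simplified; it ignores the page header, line pointers, alignment padding, and the space that pruning reclaims, so the real numbers differ.

```python
# Simplified model of the free space a lower heap fillfactor reserves
# per 8KB heap page (ignores page header, line pointers, alignment,
# and space reclaimed by HOT pruning).
PAGE_SIZE = 8192

def reserved_bytes(fillfactor: int) -> int:
    return PAGE_SIZE * (100 - fillfactor) // 100

def hot_update_headroom(fillfactor: int, tuple_size: int) -> int:
    # How many new tuple versions of a given size fit in the reserved space
    return reserved_bytes(fillfactor) // tuple_size

print(reserved_bytes(100))           # fillfactor 100: nothing reserved
print(hot_update_headroom(90, 100))  # fillfactor 90, 100-byte tuples
```

With fillfactor 100 nothing is reserved, so HOT depends entirely on pruning freeing space in time; at fillfactor 90 each page starts with roughly 819 free bytes of headroom.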
The idea of tuning the behavior directly (e.g. with a reloption that
lets the user specify a BYPASS_THRESHOLD_PAGES style threshold) is
exactly backwards. The point for the user should not be to skip
indexes during VACUUM. The point for the user is to get lots of
non-HOT updates to *avoid heap fragmentation*, guided by the new
autovacuum instrumentation. That also means that there will be much
less index vacuuming. But that's a pretty minor side-benefit. Why
should the user *expect* largely unnecessary index vacuuming to take
place?
To put it another way, the index bypass mechanism added to
vacuumlazy.c was not intended to add a new good behavior. It was
actually intended to subtract an old bad behavior. The patch is mostly
useful because it allows the user to make VACUUM *more* aggressive
with freezing and VM bit setting (not less aggressive with indexes).
The BYPASS_THRESHOLD_PAGES threshold of 0.02 is a little arbitrary --
but only a little.
Since we’re not likely to set many pages
all-visible or collect many dead tuples in that case, we can skip
index vacuuming and table vacuuming. IIUC this case, fortunately, goes
well together with btree indexes’ bottom-up deletion.
It's true that bottom-up index deletion provides additional insurance
against problems, but I don't think that that insurance is strictly
necessary. It's nice to have insurance, though.
If this skipping
behavior badly affects other index AMs, this optimization should be
considered within btree indexes, although we will need a way for index
AMs to consider and tell the vacuum strategy. But I guess this works
well in most cases so having an on/off switch should suffice.
Right. I doubt that it will actually turn out to be necessary to have
such a switch. But I try to be modest when it comes to predicting what
will be important to some user workload -- it's way too complicated to
have total confidence about something like that. It is a risk to be
managed.
--
Peter Geoghegan
On Wed, May 19, 2021 at 6:09 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Tue, May 18, 2021 at 7:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
If this skipping
behavior badly affects other index AMs, this optimization should be
considered within btree indexes, although we will need a way for index
AMs to consider and tell the vacuum strategy. But I guess this works
well in most cases so having an on/off switch should suffice.

Right. I doubt that it will actually turn out to be necessary to have
such a switch. But I try to be modest when it comes to predicting what
will be important to some user workload -- it's way too complicated to
have total confidence about something like that. It is a risk to be
managed.
I think the possible side effect of this hard-coded
BYPASS_THRESHOLD_PAGES would be that by default, bulkdelete is not
called for a long time and the index becomes bloated. IOW, we would
force users to accept index bloat corresponding to 2% of table pages.
The bloat could be serious depending on the index tuple size (e.g., an
index including many columns). The user may have been running
autovacuum aggressively on that table to prevent index bloat, but
that's no longer possible and there is no choice. So I think that for
those (relatively) rare use cases, it's good to provide a way to
somehow control it. Fortunately, an on/off switch is likely to be
useful for controlling other optimizations that could be added in the
future.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Sun, May 23, 2021 at 11:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I think the possible side effect of this hard-coded
BYPASS_THRESHOLD_PAGES would be that by default, bulkdelete is not
called for a long time and the index becomes bloated.
What do you think of the approach taken in the attached POC patch?
The patch makes it possible to disable the optimization by
generalizing the INDEX_CLEANUP reloption to be an enum that looks like
a ternary boolean (not just a plain boolean). INDEX_CLEANUP now accepts
the values 'auto', 'on', and 'off' (plus a variety of alternative
spellings, the usual ones for booleans in Postgres). Now 'auto' is the
default, and 'on' forces the previous behavior inside vacuumlazy.c. It
does not disable the failsafe, though -- INDEX_CLEANUP remains a
fairly mechanical thing.
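To make the proposed behavior concrete, usage under the POC would look something like this (the table name is just an example; note that an 'auto' spelling for the manual VACUUM command is not added in this POC):

```sql
-- Hypothetical usage under the POC patch.
-- The reloption becomes a ternary; 'auto' is the new default:
ALTER TABLE bmsql_order_line SET (vacuum_index_cleanup = 'on');   -- always vacuum indexes
ALTER TABLE bmsql_order_line SET (vacuum_index_cleanup = 'off');  -- never vacuum indexes
ALTER TABLE bmsql_order_line SET (vacuum_index_cleanup = 'auto'); -- let VACUUM decide

-- The manual VACUUM command still takes a boolean in this POC;
-- omitting INDEX_CLEANUP entirely gives the 'auto' behavior:
VACUUM (INDEX_CLEANUP ON, VERBOSE) bmsql_order_line;
```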
This approach seems good to me because INDEX_CLEANUP remains
consistent with the original purpose and design of INDEX_CLEANUP --
that was always an option that forced VACUUM to do something special
with indexes. I don't see much downside to this approach, either. As
things stand, INDEX_CLEANUP is mostly superseded by the failsafe, so
we don't really need to talk about wraparound emergencies in the docs
for INDEX_CLEANUP anymore. This seems much more elegant than either
repurposing/reviving cleanup_index_scale_factor (which makes no sense
to me at all) or inventing a new reloption (which would itself be in
tension with INDEX_CLEANUP).
There are some practical issues that make this patch surprisingly
complicated for such a simple problem. For example, I hope that I
haven't missed any subtlety in generalizing a boolean reloption like
this. We've done similar things with GUCs in the past, but this may be
a little different. Another concern with this approach is what it
means for the VACUUM command itself. I haven't added an 'auto'
spelling that is accepted by the VACUUM command in this POC version.
But do I need to at all? Can that just be implied by not having any
INDEX_CLEANUP option? And does StdRdOptions.vacuum_truncate now need
to become a VacOptTernaryValue field too, for consistency with the new
definition of StdRdOptions.vacuum_index_cleanup?
--
Peter Geoghegan
Attachments:
0001-Generalize-VACUUM-s-INDEX_CLEANUP-option.patch
On Fri, May 28, 2021 at 9:53 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Sun, May 23, 2021 at 11:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I think the possible side effect of this hard-coded
BYPASS_THRESHOLD_PAGES would be that by default, bulkdelete is not
called for a long time and the index becomes bloated.
What do you think of the approach taken in the attached POC patch?
The patch makes it possible to disable the optimization by
generalizing the INDEX_CLEANUP reloption to be an enum that looks like
a ternary boolean (not just a plain boolean). INDEX_CLEANUP now accepts
the values 'auto', 'on', and 'off' (plus a variety of alternative
spellings, the usual ones for booleans in Postgres). Now 'auto' is the
default, and 'on' forces the previous behavior inside vacuumlazy.c. It
does not disable the failsafe, though -- INDEX_CLEANUP remains a
fairly mechanical thing.
+1
This approach seems good to me because INDEX_CLEANUP remains
consistent with the original purpose and design of INDEX_CLEANUP --
that was always an option that forced VACUUM to do something special
with indexes. I don't see much downside to this approach, either. As
things stand, INDEX_CLEANUP is mostly superseded by the failsafe, so
we don't really need to talk about wraparound emergencies in the docs
for INDEX_CLEANUP anymore. This seems much more elegant than either
repurposing/reviving cleanup_index_scale_factor (which makes no sense
to me at all) or inventing a new reloption (which would itself be in
tension with INDEX_CLEANUP).
+1
There are some practical issues that make this patch surprisingly
complicated for such a simple problem. For example, I hope that I
haven't missed any subtlety in generalizing a boolean reloption like
this. We've done similar things with GUCs in the past, but this may be
a little different.
+/* values from HeapOptIndexCleanupMode */
+relopt_enum_elt_def HeapOptIndexCleanupOptValues[] =
+{
+ {"auto", VACOPT_TERNARY_DEFAULT},
+ {"on", VACOPT_TERNARY_ENABLED},
+ {"off", VACOPT_TERNARY_DISABLED},
+ {"true", VACOPT_TERNARY_ENABLED},
+ {"false", VACOPT_TERNARY_DISABLED},
+ {"1", VACOPT_TERNARY_ENABLED},
+ {"0", VACOPT_TERNARY_DISABLED},
+ {(const char *) NULL} /* list terminator */
+};
Do we need to accept "yes" and "no" too? Currently, the parsing of a
boolean-type reloption accepts those words.
Another concern with this approach is what it
means for the VACUUM command itself. I haven't added an 'auto'
spelling that is accepted by the VACUUM command in this POC version.
But do I need to at all? Can that just be implied by not having any
INDEX_CLEANUP option?
It seems to me that it's better to have the INDEX_CLEANUP option of the VACUUM
command support AUTO for consistency. Do you have any concerns about
supporting it?
And does StdRdOptions.vacuum_truncate now need
to become a VacOptTernaryValue field too, for consistency with the new
definition of StdRdOptions.vacuum_index_cleanup?
We don't have the bypass optimization for heap truncation, unlike
index vacuuming. So I think we can leave both the vacuum_truncate
reloption and the TRUNCATE option as boolean parameters.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Mon, May 31, 2021 at 10:30:08AM +0900, Masahiko Sawada wrote:
On Fri, May 28, 2021 at 9:53 AM Peter Geoghegan <pg@bowt.ie> wrote:
Another concern with this approach is what it
means for the VACUUM command itself. I haven't added an 'auto'
spelling that is accepted by the VACUUM command in this POC version.
But do I need to at all? Can that just be implied by not having any
INDEX_CLEANUP option?
It seems to me that it's better to have the INDEX_CLEANUP option of the VACUUM
command support AUTO for consistency. Do you have any concerns about
supporting it?
I have read through the patch, and I am surprised to see that this
only makes it possible to control the optimization at relation level.
The origin of the complaints is that this index cleanup optimization
has been introduced as a new rule that gets enforced at *system*
level, so I think that we should have an equivalent with a GUC to
control the behavior for the whole system. With what you are
presenting here, one could only disable the optimization for each
relation, one-by-one. If this optimization proves to be a problem,
it's just going to be harder for users to go through all the relations
and re-tune autovacuum. Am I missing something?
--
Michael
On Fri, Jun 4, 2021 at 3:15 PM Michael Paquier <michael@paquier.xyz> wrote:
On Mon, May 31, 2021 at 10:30:08AM +0900, Masahiko Sawada wrote:
On Fri, May 28, 2021 at 9:53 AM Peter Geoghegan <pg@bowt.ie> wrote:
Another concern with this approach is what it
means for the VACUUM command itself. I haven't added an 'auto'
spelling that is accepted by the VACUUM command in this POC version.
But do I need to at all? Can that just be implied by not having any
INDEX_CLEANUP option?
It seems to me that it's better to have the INDEX_CLEANUP option of the VACUUM
command support AUTO for consistency. Do you have any concerns about
supporting it?
I have read through the patch, and I am surprised to see that this
only makes it possible to control the optimization at relation level.
The origin of the complaints is that this index cleanup optimization
has been introduced as a new rule that gets enforced at *system*
level, so I think that we should have an equivalent with a GUC to
control the behavior for the whole system. With what you are
presenting here, one could only disable the optimization for each
relation, one-by-one. If this optimization proves to be a problem,
it's just going to be harder for users to go through all the relations
and re-tune autovacuum. Am I missing something?
I hadn't thought about disabling that optimization at the system
level. I think we can use this option for particular tables whose
indexes became unexpectedly bloated due to this optimization.
Similarly, we have the DISABLE_PAGE_SKIPPING option but don't have a
way to disable lazy vacuum's page-skipping behavior at the system
level.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/