New GUC autovacuum_max_threshold?
Hello,
I would like to suggest a new parameter, autovacuum_max_threshold, which
would set an upper limit on the number of tuples to delete/update/insert
prior to vacuum/analyze.
A good default might be 500000.
The idea would be to replace the following calculation:
vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
with this one:
vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples / (1
+ vac_scale_factor * reltuples / autovacuum_max_threshold)
(and the same for the others, vacinsthresh and anlthresh).
The attached graph plots vacthresh against pg_class.reltuples, with
default settings:
autovacuum_vacuum_threshold = 50
autovacuum_vacuum_scale_factor = 0.2
and
autovacuum_max_threshold = 500000 (the suggested default)
Thus, for small tables, vacthresh is only slightly smaller than 0.2 *
pg_class.reltuples, but it grows towards 500000 when reltuples → ∞.
The idea is to reduce the need for autovacuum tuning.
The attached (draft) patch further illustrates the idea.
My guess is that a similar proposal has already been submitted... and
rejected 🙂 If so, I'm very sorry for the useless noise.
Best regards,
Frédéric
On Wed, Apr 24, 2024 at 8:08 AM Frédéric Yhuel
<frederic.yhuel@dalibo.com> wrote:
Hello,
I would like to suggest a new parameter, autovacuum_max_threshold, which
would set an upper limit on the number of tuples to delete/update/insert
prior to vacuum/analyze.
Hi Frédéric, thanks for the proposal! You are tackling a very tough
problem. I would also find it useful to know more about what led you
to suggest this particular solution. I am very interested in user
stories around difficulties with what tables are autovacuumed and
when.
Am I correct in thinking that one of the major goals here is for a
very large table to be more likely to be vacuumed?
The idea would be to replace the following calculation:
vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
with this one:
vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples / (1
+ vac_scale_factor * reltuples / autovacuum_max_threshold)
(and the same for the others, vacinsthresh and anlthresh).
My first thought when reviewing the GUC and how it is used is
wondering if its description is a bit misleading.
autovacuum_vacuum_threshold is the "minimum number of updated or
deleted tuples needed to trigger a vacuum". That is, if this many
tuples are modified, it *may* trigger a vacuum, but we also may skip
vacuuming the table for other reasons or due to other factors.
autovacuum_max_threshold's proposed definition is the upper
limit/maximum number of tuples to insert/update/delete prior to
vacuum/analyze. This implies that if that many tuples have been
modified or inserted, the table will definitely be vacuumed -- which
isn't true. Maybe that is okay, but I thought I would bring it up.
The attached (draft) patch further illustrates the idea.
Thanks for including a patch!
My guess is that a similar proposal has already been submitted... and
rejected 🙂 If so, I'm very sorry for the useless noise.
I rooted around in the hackers archive and couldn't find any threads
on this specific proposal. I copied some other hackers I knew of who
have worked on this problem and thought about it in the past, in case
they know of some existing threads or prior work on this specific
topic.
- Melanie
On Wed, Apr 24, 2024 at 03:10:27PM -0400, Melanie Plageman wrote:
On Wed, Apr 24, 2024 at 8:08 AM Frédéric Yhuel
<frederic.yhuel@dalibo.com> wrote:
I would like to suggest a new parameter, autovacuum_max_threshold, which
would set an upper limit on the number of tuples to delete/update/insert
prior to vacuum/analyze.
Hi Frédéric, thanks for the proposal! You are tackling a very tough
problem. I would also find it useful to know more about what led you
to suggest this particular solution. I am very interested in user
stories around difficulties with what tables are autovacuumed and
when.
Am I correct in thinking that one of the major goals here is for a
very large table to be more likely to be vacuumed?
If this is indeed the goal, +1 from me for doing something along these
lines.
The idea would be to replace the following calculation:
vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
with this one:
vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples / (1
+ vac_scale_factor * reltuples / autovacuum_max_threshold)
(and the same for the others, vacinsthresh and anlthresh).
My first thought when reviewing the GUC and how it is used is
wondering if its description is a bit misleading.
Yeah, I'm having trouble following the proposed mechanics for this new GUC,
and it's difficult to understand how users would choose a value. If we
just want to cap the number of tuples required before autovacuum takes
action, perhaps we could simplify it to something like
vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
vacthresh = Min(vacthresh, vac_max_thresh);
This would effectively cause autovacuum_vacuum_scale_factor to be
overridden for large tables where the scale factor would otherwise cause
the calculated threshold to be extremely high.
My guess is that a similar proposal has already been submitted... and
rejected 🙂 If so, I'm very sorry for the useless noise.
I rooted around in the hackers archive and couldn't find any threads
on this specific proposal. I copied some other hackers I knew of who
have worked on this problem and thought about it in the past, in case
they know of some existing threads or prior work on this specific
topic.
FWIW I have heard about this problem in the past, too.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On 24/04/2024 at 21:10, Melanie Plageman wrote:
On Wed, Apr 24, 2024 at 8:08 AM Frédéric Yhuel
<frederic.yhuel@dalibo.com> wrote:
Hello,
I would like to suggest a new parameter, autovacuum_max_threshold, which
would set an upper limit on the number of tuples to delete/update/insert
prior to vacuum/analyze.
Hi Frédéric, thanks for the proposal! You are tackling a very tough
problem. I would also find it useful to know more about what led you
to suggest this particular solution. I am very interested in user
stories around difficulties with what tables are autovacuumed and
when.
Hi Melanie! I can certainly start compiling user stories about that.
Recently, one of my colleagues wrote an email to our DBA team saying
something along these lines:
« Hey, here are our suggested settings for per-table autovacuum
configuration:
| *autovacuum*           | L < 1 million | L >= 1 million | L >= 5 million | L >= 10 million |
|:-----------------------|--------------:|---------------:|---------------:|----------------:|
| `vacuum_scale_factor`  | 0.2 (default) | 0.1            | 0.05           | 0.0             |
| `vacuum_threshold`     | 50 (default)  | 50 (default)   | 50 (default)   | 500 000         |
| `analyze_scale_factor` | 0.1 (default) | 0.1 (default)  | 0.05           | 0.0             |
| `analyze_threshold`    | 50 (default)  | 50 (default)   | 50 (default)   | 500 000         |
Let's update this table with values for the vacuum_insert_* parameters. »
I wasn't aware that we had this table, and although the settings made
sense to me, I thought it was rather ugly and cumbersome for the user,
and I started thinking about how postgres could make their life easier.
Am I correct in thinking that one of the major goals here is for a
very large table to be more likely to be vacuumed?
Absolutely.
The idea would be to replace the following calculation:
vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
with this one:
vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples / (1
+ vac_scale_factor * reltuples / autovacuum_max_threshold)
(and the same for the others, vacinsthresh and anlthresh).
My first thought when reviewing the GUC and how it is used is
wondering if its description is a bit misleading.
autovacuum_vacuum_threshold is the "minimum number of updated or
deleted tuples needed to trigger a vacuum". That is, if this many
tuples are modified, it *may* trigger a vacuum, but we also may skip
vacuuming the table for other reasons or due to other factors.
autovacuum_max_threshold's proposed definition is the upper
limit/maximum number of tuples to insert/update/delete prior to
vacuum/analyze. This implies that if that many tuples have been
modified or inserted, the table will definitely be vacuumed -- which
isn't true. Maybe that is okay, but I thought I would bring it up.
I'm not too sure I understand. What are the reasons it might be skipped?
I can think of a concurrent index creation on the same table, or
anything holding a SHARE UPDATE EXCLUSIVE lock or above. Is this the
sort of thing you are talking about?
Perhaps a better name for the GUC would be
autovacuum_asymptotic_limit... or something like that?
The attached (draft) patch further illustrates the idea.
Thanks for including a patch!
My guess is that a similar proposal has already been submitted... and
rejected 🙂 If so, I'm very sorry for the useless noise.
I rooted around in the hackers archive and couldn't find any threads
on this specific proposal. I copied some other hackers I knew of who
have worked on this problem and thought about it in the past, in case
they know of some existing threads or prior work on this specific
topic.
Thanks!
Hi Nathan, thanks for your review.
On 24/04/2024 at 21:57, Nathan Bossart wrote:
Yeah, I'm having trouble following the proposed mechanics for this new GUC,
and it's difficult to understand how users would choose a value. If we
just want to cap the number of tuples required before autovacuum takes
action, perhaps we could simplify it to something like
vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
vacthresh = Min(vacthresh, vac_max_thresh);
This would effectively cause autovacuum_vacuum_scale_factor to be
overridden for large tables where the scale factor would otherwise cause
the calculated threshold to be extremely high.
This would indeed work, and the parameter would be easier to define in
the user documentation. I prefer a continuous function... but that is
personal taste. It seems to me that autovacuum tuning is quite hard
anyway, and that it wouldn't be much more difficult with this kind of
asymptotic limit parameter.
But I think the most important thing is to avoid per-table configuration
for most users, or even autovacuum tuning at all, so either of these two
formulas would do.
On Thu, Apr 25, 2024 at 09:13:07AM +0200, Frédéric Yhuel wrote:
On 24/04/2024 at 21:57, Nathan Bossart wrote:
Yeah, I'm having trouble following the proposed mechanics for this new GUC,
and it's difficult to understand how users would choose a value. If we
just want to cap the number of tuples required before autovacuum takes
action, perhaps we could simplify it to something like
vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
vacthresh = Min(vacthresh, vac_max_thresh);
This would effectively cause autovacuum_vacuum_scale_factor to be
overridden for large tables where the scale factor would otherwise cause
the calculated threshold to be extremely high.
This would indeed work, and the parameter would be easier to define in the
user documentation. I prefer a continuous function... but that is personal
taste. It seems to me that autovacuum tuning is quite hard anyway, and that
it wouldn't be much more difficult with this kind of asymptotic limit
parameter.
I do think this is a neat idea, but would the two approaches really be much
different in practice? The scale factor parameters already help keep the
limit smaller for small tables and larger for large ones, so it strikes me
as needless complexity. I think we'd need some sort of tangible reason to
think the asymptotic limit is better.
But I think the most important thing is to avoid per-table configuration for
most users, or even autovacuum tuning at all, so either of these two
formulas would do.
Yeah, I agree with the goal of minimizing the need for per-table
configurations.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Thu, Apr 25, 2024 at 2:52 AM Frédéric Yhuel
<frederic.yhuel@dalibo.com> wrote:
On 24/04/2024 at 21:10, Melanie Plageman wrote:
On Wed, Apr 24, 2024 at 8:08 AM Frédéric Yhuel
<frederic.yhuel@dalibo.com> wrote:
Hello,
I would like to suggest a new parameter, autovacuum_max_threshold, which
would set an upper limit on the number of tuples to delete/update/insert
prior to vacuum/analyze.
Hi Frédéric, thanks for the proposal! You are tackling a very tough
problem. I would also find it useful to know more about what led you
to suggest this particular solution. I am very interested in user
stories around difficulties with what tables are autovacuumed and
when.
Hi Melanie! I can certainly start compiling user stories about that.
Cool! That would be very useful.
The idea would be to replace the following calculation:
vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
with this one:
vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples / (1
+ vac_scale_factor * reltuples / autovacuum_max_threshold)
(and the same for the others, vacinsthresh and anlthresh).
My first thought when reviewing the GUC and how it is used is
wondering if its description is a bit misleading.
autovacuum_vacuum_threshold is the "minimum number of updated or
deleted tuples needed to trigger a vacuum". That is, if this many
tuples are modified, it *may* trigger a vacuum, but we also may skip
vacuuming the table for other reasons or due to other factors.
autovacuum_max_threshold's proposed definition is the upper
limit/maximum number of tuples to insert/update/delete prior to
vacuum/analyze. This implies that if that many tuples have been
modified or inserted, the table will definitely be vacuumed -- which
isn't true. Maybe that is okay, but I thought I would bring it up.
I'm not too sure I understand. What are the reasons it might be skipped?
I can think of a concurrent index creation on the same table, or
anything holding a SHARE UPDATE EXCLUSIVE lock or above. Is this the
sort of thing you are talking about?
No, I was thinking more literally that, if reltuples (assuming
reltuples is modified/inserted tuples) > autovacuum_max_threshold, I
would expect the table to be vacuumed. However, with your formula,
that wouldn't necessarily be true.
I think there are values of reltuples and autovacuum_max_threshold at
which reltuples > autovacuum_max_threshold but reltuples <=
vac_base_thresh + vac_scale_factor * reltuples / (1 + vac_scale_factor
* reltuples / autovacuum_max_threshold)
I tried to reduce the formula to come up with a precise definition of
the range of values for which this is true, however I wasn't able to
reduce it to something nice.
Here is just an example of a case:
vac_base_thresh = 2000
vac_scale_factor = 0.9
reltuples = 3200
autovacuum_max_threshold = 2500
total_thresh = vac_base_thresh + vac_scale_factor * reltuples / (1 +
vac_scale_factor * reltuples / autovacuum_max_threshold)
total_thresh: 3338. dead tuples: 3200. autovacuum_max_threshold: 2500
so there are more dead tuples than the max threshold, so it should
trigger a vacuum, but it doesn't because the total calculated
threshold is higher than the number of dead tuples.
This of course may not be a realistic scenario in practice. It works
best the closer scale factor is to 1 (wish I had derived the formula
successfully) and when autovacuum_max_threshold > 2 * vac_base_thresh.
So, maybe it is not an issue.
Perhaps a better name for the GUC would be
autovacuum_asymptotic_limit... or something like that?
If we keep the asymptotic part, that makes sense. I wonder if we have
to add another "vacuum" in there (e.g.
autovacuum_vacuum_max_threshold) to be consistent with the other GUCs.
I don't really know why they have that extra "vacuum" in them, though.
Makes the names so long.
- Melanie
On Wed, Apr 24, 2024 at 3:57 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
Yeah, I'm having trouble following the proposed mechanics for this new GUC,
and it's difficult to understand how users would choose a value. If we
just want to cap the number of tuples required before autovacuum takes
action, perhaps we could simplify it to something like
vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
vacthresh = Min(vacthresh, vac_max_thresh);
This would effectively cause autovacuum_vacuum_scale_factor to be
overridden for large tables where the scale factor would otherwise cause
the calculated threshold to be extremely high.
+1 for this. It seems a lot easier to understand than the original
proposal. And in fact, when I was working on my 2024.pgconf.dev
presentation, I suggested exactly this idea on one of my slides.
I believe that the underlying problem here can be summarized in this
way: just because I'm OK with 2MB of bloat in my 10MB table doesn't
mean that I'm OK with 2TB of bloat in my 10TB table. One reason for
this is simply that I can afford to waste 2MB much more easily than I
can afford to waste 2TB -- and that applies both on disk and in
memory. Another reason, at least in existing releases, is that at some
point index vacuuming hits a wall because we run out of space for dead
tuples. We *most definitely* want to do index vacuuming before we get
to the point where we're going to have to do multiple cycles of index
vacuuming. That latter problem should be fixed in v17 by the recent
dead TID storage changes. But even so, you generally want to contain
bloat before too many pages get added to your tables or indexes,
because you can't easily get rid of them again afterward, so I think
there's still a good case for preventing autovacuum from scaling the
threshold out to infinity.
What does surprise me is that Frédéric suggests a default value of
500,000. If half a million tuples (proposed default) is 20% of your
table (default value of autovacuum_vacuum_scale_factor) then your
table has 2.5 million tuples. Unless those tuples are very wide, that
table isn't even 1GB in size. I'm not aware that there's any problem
at all with the current formula on a table of that size, or even ten
times that size. I think you need to have tables that are hundreds of
gigabytes in size at least before this starts to become a serious
problem. Looking at this from another angle, in existing releases, the
maximum usable amount of autovacuum_work_mem is 1GB, which means we
can store one-sixth of a billion dead TIDs, or roughly 166 million.
And that limit has been a source of occasional complaints for years.
So we have those complaints on the one hand, suggesting that 166
million is not enough, and then we have this proposal, saying that
more than half a million is too much. That's really strange; my
initial hunch is that the value should be 100-500x higher than what
Frédéric proposed.
I'm also sort of wondering how much the tuple width matters here. I'm
not quite sure.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Apr 25, 2024 at 02:33:05PM -0400, Robert Haas wrote:
What does surprise me is that Frédéric suggests a default value of
500,000. If half a million tuples (proposed default) is 20% of your
table (default value of autovacuum_vacuum_scale_factor) then your
table has 2.5 million tuples. Unless those tuples are very wide, that
table isn't even 1GB in size. I'm not aware that there's any problem
at all with the current formula on a table of that size, or even ten
times that size. I think you need to have tables that are hundreds of
gigabytes in size at least before this starts to become a serious
problem. Looking at this from another angle, in existing releases, the
maximum usable amount of autovacuum_work_mem is 1GB, which means we
can store one-sixth of a billion dead TIDs, or roughly 166 million.
And that limit has been a source of occasional complaints for years.
So we have those complaints on the one hand, suggesting that 166
million is not enough, and then we have this proposal, saying that
more than half a million is too much. That's really strange; my
initial hunch is that the value should be 100-500x higher than what
Frédéric proposed.
Agreed, the default should probably be on the order of 100-200M minimum.
The original proposal also seems to introduce one parameter that would
affect all three of autovacuum_vacuum_threshold,
autovacuum_vacuum_insert_threshold, and autovacuum_analyze_threshold. Is
that okay? Or do we need to introduce a "limit" GUC for each? I guess the
question is whether we anticipate any need to have different values for
these limits, which might be unlikely.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Thu, Apr 25, 2024 at 3:21 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
Agreed, the default should probably be on the order of 100-200M minimum.
The original proposal also seems to introduce one parameter that would
affect all three of autovacuum_vacuum_threshold,
autovacuum_vacuum_insert_threshold, and autovacuum_analyze_threshold. Is
that okay? Or do we need to introduce a "limit" GUC for each? I guess the
question is whether we anticipate any need to have different values for
these limits, which might be unlikely.
I don't think we should make the same limit apply to more than one of
those. I would phrase the question in the opposite way that you did:
is there any particular reason to believe that the limits should be
the same? I don't see one.
I think it would be OK to introduce limits for some and leave the
others uncapped, but I don't like the idea of reusing the same limit
for different things.
My intuition is strongest for the vacuum threshold -- that's such an
expensive operation, takes so long, and has such dire consequences if
it isn't done. We need to force the table to be vacuumed before it
bloats out of control. Maybe essentially the same logic applies to the
insert threshold, namely, that we should vacuum before the number of
not-all-visible pages gets too large, but I think it's less clear.
It's just not nearly as bad if that happens. Sure, it may not be great
when vacuum eventually runs and hits a ton of pages all at once, but
it's not even close to being as catastrophic as the vacuum case.
The analyze case, I feel, is really murky.
autovacuum_analyze_scale_factor stands for the proposition that as the
table becomes larger, analyze doesn't need to be done as often. If
what you're concerned about is the frequency estimates, that's true:
an injection of a million new rows can shift frequencies dramatically
in a small table, but the effect is blunted in a large one. But a lot
of the cases I've seen have involved the histogram boundaries. If
you're inserting data into a table in increasing order, every new
million rows shifts the boundary of the last histogram bucket by the
same amount. You either need those rows included in the histogram to
get good query plans, or you don't. If you do, the frequency with
which you need to analyze does not change as the table grows. If you
don't, then it probably does. But the answer doesn't really depend on
how big the table is already, but on your workload. So it's unclear to
me that the proposed parameter is the right idea here at all. It's
also unclear to me that the existing system is the right idea. :-)
So overall I guess I'd lean toward just introducing a cap for the
"vacuum" case and leave the "insert" and "analyze" cases as ideas for
possible future consideration, but I'm not 100% sure.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 25/04/2024 at 21:21, Nathan Bossart wrote:
On Thu, Apr 25, 2024 at 02:33:05PM -0400, Robert Haas wrote:
What does surprise me is that Frédéric suggests a default value of
500,000. If half a million tuples (proposed default) is 20% of your
table (default value of autovacuum_vacuum_scale_factor) then your
table has 2.5 million tuples. Unless those tuples are very wide, that
table isn't even 1GB in size. I'm not aware that there's any problem
at all with the current formula on a table of that size, or even ten
times that size. I think you need to have tables that are hundreds of
gigabytes in size at least before this starts to become a serious
problem. Looking at this from another angle, in existing releases, the
maximum usable amount of autovacuum_work_mem is 1GB, which means we
can store one-sixth of a billion dead TIDs, or roughly 166 million.
And that limit has been a source of occasional complaints for years.
So we have those complaints on the one hand, suggesting that 166
million is not enough, and then we have this proposal, saying that
more than half a million is too much. That's really strange; my
initial hunch is that the value should be 100-500x higher than what
Frédéric proposed.
Agreed, the default should probably be on the order of 100-200M minimum.
I'm not sure... 500000 comes from the table given in a previous message.
It may not be large enough. But vacuum also updates the visibility map,
and a few hundred thousand heap fetches can already hurt the performance
of an index-only scan, even if most of the blocks are read from cache.
The original proposal also seems to introduce one parameter that would
affect all three of autovacuum_vacuum_threshold,
autovacuum_vacuum_insert_threshold, and autovacuum_analyze_threshold. Is
that okay? Or do we need to introduce a "limit" GUC for each? I guess the
question is whether we anticipate any need to have different values for
these limits, which might be unlikely.
I agree with you, it seems unlikely. This is also an answer to Melanie's
question about the name of the GUC: I deliberately left out the other
"vacuum" because I thought we only needed one parameter for these three
thresholds.
Now I have just read Robert's new message, and I understand his point.
But is there a real problem with triggering analyze after every 500000
(or more) modifications in the table anyway?
On 25/04/2024 at 18:51, Melanie Plageman wrote:
I'm not too sure I understand. What are the reasons it might be skipped?
I can think of a concurrent index creation on the same table, or
anything holding a SHARE UPDATE EXCLUSIVE lock or above. Is this the
sort of thing you are talking about?
No, I was thinking more literally that, if reltuples (assuming
reltuples is modified/inserted tuples) > autovacuum_max_threshold, I
would expect the table to be vacuumed. However, with your formula,
that wouldn't necessarily be true.
I think there are values of reltuples and autovacuum_max_threshold at
which reltuples > autovacuum_max_threshold but reltuples <=
vac_base_thresh + vac_scale_factor * reltuples / (1 + vac_scale_factor
* reltuples / autovacuum_max_threshold)
I tried to reduce the formula to come up with a precise definition of
the range of values for which this is true, however I wasn't able to
reduce it to something nice.
Here is just an example of a case:
vac_base_thresh = 2000
vac_scale_factor = 0.9
reltuples = 3200
autovacuum_max_threshold = 2500
total_thresh = vac_base_thresh + vac_scale_factor * reltuples / (1 +
vac_scale_factor * reltuples / autovacuum_max_threshold)
total_thresh: 3338. dead tuples: 3200. autovacuum_max_threshold: 2500
so there are more dead tuples than the max threshold, so it should
trigger a vacuum, but it doesn't because the total calculated
threshold is higher than the number of dead tuples.
OK, thank you! I got it.
This of course may not be a realistic scenario in practice. It works
best the closer scale factor is to 1 (wish I had derived the formula
successfully) and when autovacuum_max_threshold > 2 * vac_base_thresh.
So, maybe it is not an issue.
I haven't thought much about this yet. I hope we can avoid such an
extreme scenario by imposing some kind of constraint on this parameter,
in relation to the others.
Anyway, with Nathan and Robert upvoting the simpler formula, this will
probably become irrelevant :-)
On Thu, Apr 25, 2024 at 4:57 PM Frédéric Yhuel
<frederic.yhuel@dalibo.com> wrote:
Now I have just read Robert's new message, and I understand his point.
But is there a real problem with triggering analyze after every 500000
(or more) modifications in the table anyway?
It depends on the situation, but even on a laptop, you can do that
number of modifications in one second. You could easily have a
moderately large number of tables that hit that threshold every minute,
and thus get auto-analyzed every minute when an autovacuum worker is
launched in that database. Now, in some situations, that could be a
good thing, because I suspect it's not very hard to construct a
workload where constantly analyzing all of your busy tables is
necessary to maintain query performance. But in general I think what
would happen with such a low threshold is that you'd end up with
autovacuum spending an awful lot of its available resources on useless
analyze operations, which would waste I/O and CPU time, and more
importantly, interfere with its ability to get vacuums done.
To put it another way, suppose my tables contain 10 million tuples
each, which is not particularly large. The analyze scale factor is
10%, so currently I'd analyze after a million table modifications.
Your proposal drops that to half a million, so I'm going to start
analyzing 20 times more often. If you start doing ANYTHING to a
database twenty times more often, it can cause a problem. Twenty times
more selects, twenty times more checkpoints, twenty times more
vacuuming, whatever. It's just a lot of resources to spend on
something if that thing isn't actually necessary.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, 2024-04-25 at 14:33 -0400, Robert Haas wrote:
I believe that the underlying problem here can be summarized in this
way: just because I'm OK with 2MB of bloat in my 10MB table doesn't
mean that I'm OK with 2TB of bloat in my 10TB table. One reason for
this is simply that I can afford to waste 2MB much more easily than I
can afford to waste 2TB -- and that applies both on disk and in
memory.
I don't find that convincing. Why are 2TB of wasted space in a 10TB
table worse than 2TB of wasted space in 100 tables of 100GB each?
Another reason, at least in existing releases, is that at some
point index vacuuming hits a wall because we run out of space for dead
tuples. We *most definitely* want to do index vacuuming before we get
to the point where we're going to have to do multiple cycles of index
vacuuming.
That is more convincing. But do we need a GUC for that? What about
making a table eligible for autovacuum as soon as the number of dead
tuples reaches 90% of what you can hold in "autovacuum_work_mem"?
Yours,
Laurenz Albe
On 26/04/2024 at 04:24, Laurenz Albe wrote:
On Thu, 2024-04-25 at 14:33 -0400, Robert Haas wrote:
I believe that the underlying problem here can be summarized in this
way: just because I'm OK with 2MB of bloat in my 10MB table doesn't
mean that I'm OK with 2TB of bloat in my 10TB table. One reason for
this is simply that I can afford to waste 2MB much more easily than I
can afford to waste 2TB -- and that applies both on disk and in
memory.
I don't find that convincing. Why are 2TB of wasted space in a 10TB
table worse than 2TB of wasted space in 100 tables of 100GB each?
Good point, but another way of summarizing the problem would be that the
autovacuum_*_scale_factor parameters work well as long as we have a more
or less evenly distributed access pattern in the table.
Suppose my very large table gets updated only for its 1% most recent
rows. We probably want to decrease autovacuum_analyze_scale_factor and
autovacuum_vacuum_scale_factor for this one.
Partitioning would be a good solution, but IMHO postgres should be able
to handle this case anyway, ideally without per-table configuration.
Hi,
On Fri, Apr 26, 2024 at 04:24:45AM +0200, Laurenz Albe wrote:
On Thu, 2024-04-25 at 14:33 -0400, Robert Haas wrote:
Another reason, at least in existing releases, is that at some
point index vacuuming hits a wall because we run out of space for dead
tuples. We *most definitely* want to do index vacuuming before we get
to the point where we're going to have to do multiple cycles of index
vacuuming.

That is more convincing. But do we need a GUC for that? What about
making a table eligible for autovacuum as soon as the number of dead
tuples reaches 90% of what you can hold in "autovacuum_work_mem"?
Due to the improvements in v17, this would basically never trigger
according to my understanding, or at least only after an excessive
amount of bloat has accumulated.
Michael
On 25/04/2024 at 22:21, Robert Haas wrote:
The analyze case, I feel, is really murky.
autovacuum_analyze_scale_factor stands for the proposition that as the
table becomes larger, analyze doesn't need to be done as often. If
what you're concerned about is the frequency estimates, that's true:
an injection of a million new rows can shift frequencies dramatically
in a small table, but the effect is blunted in a large one. But a lot
of the cases I've seen have involved the histogram boundaries. If
you're inserting data into a table in increasing order, every new
million rows shifts the boundary of the last histogram bucket by the
same amount. You either need those rows included in the histogram to
get good query plans, or you don't. If you do, the frequency with
which you need to analyze does not change as the table grows. If you
don't, then it probably does. But the answer doesn't really depend on
how big the table is already, but on your workload. So it's unclear to
me that the proposed parameter is the right idea here at all. It's
also unclear to me that the existing system is the right idea. 🙂
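The histogram-boundary point can be illustrated with a toy equi-height histogram (this is a simplified model, not PostgreSQL's actual analyze code): after a batch of monotonically increasing inserts, every new row falls past the last bucket boundary, so range estimates for the newest data are stale until the next analyze, regardless of table size.

```python
# Toy illustration of the histogram-boundary problem for
# increasing-order inserts. Not PostgreSQL's analyze implementation.

def histogram_bounds(values, nbuckets=4):
    """Equi-height bucket boundaries over a sorted sample."""
    values = sorted(values)
    step = (len(values) - 1) / nbuckets
    return [values[round(i * step)] for i in range(nbuckets + 1)]

old_rows = list(range(1_000))             # ids 0..999 at analyze time
bounds = histogram_bounds(old_rows)
print(bounds)                             # last boundary is 999

new_rows = range(1_000, 1_100)            # inserted after analyze
stale = sum(1 for v in new_rows if v > bounds[-1])
print(stale)                              # all 100 new rows fall past it
```

Whether those 100 rows matter for plans depends on the workload, not on how many rows sit below the last boundary, which is the argument above.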
This is very interesting. And what about ndistinct? I believe it could
be problematic, too, in some (admittedly rare or pathological) cases.
For example, suppose that the actual number of distinct values grows
from 1000 to 200000 after a batch of insertions, for a particular
column. OK, in such a case, the default analyze sampling isn't large
enough to compute a ndistinct close enough to reality anyway. But
without any analyze at all, it can lead to very bad planning - think of
a Nested Loop with a parallel seq scan for the outer table instead of a
simple efficient index scan, because the index scan of the inner table
is overestimated (both the cost of each index scan and the number of
rows returned).
On Fri, 2024-04-26 at 09:35 +0200, Frédéric Yhuel wrote:
On 26/04/2024 at 04:24, Laurenz Albe wrote:
On Thu, 2024-04-25 at 14:33 -0400, Robert Haas wrote:
I believe that the underlying problem here can be summarized in this
way: just because I'm OK with 2MB of bloat in my 10MB table doesn't
mean that I'm OK with 2TB of bloat in my 10TB table. One reason for
this is simply that I can afford to waste 2MB much more easily than I
can afford to waste 2TB -- and that applies both on disk and in
memory.

I don't find that convincing. Why are 2TB of wasted space in a 10TB
table worse than 2TB of wasted space in 100 tables of 100GB each?

Good point, but another way of summarizing the problem would be that the
autovacuum_*_scale_factor parameters work well as long as we have a more
or less evenly distributed access pattern in the table.

Suppose my very large table gets updated only for its 1% most recent
rows. We probably want to decrease autovacuum_analyze_scale_factor and
autovacuum_vacuum_scale_factor for this one.

Partitioning would be a good solution, but IMHO postgres should be able
to handle this case anyway, ideally without per-table configuration.
I agree that you may well want autovacuum and autoanalyze to treat your large
table differently from your small tables.
But I am reluctant to accept even more autovacuum GUCs. It's not like
we don't have enough of them, rather the opposite. You can slap on more
GUCs to treat more special cases, but we will never reach the goal of
having a default that will make everybody happy.
I believe that the defaults should work well in moderately sized databases
with moderate usage characteristics. If you have large tables or a high
number of transactions per second, you can be expected to make the effort
and adjust the settings for your case. Adding more GUCs makes life *harder*
for the users who are trying to understand and configure how autovacuum works.
Yours,
Laurenz Albe
Hi,
On Fri, Apr 26, 2024 at 10:18:00AM +0200, Laurenz Albe wrote:
On Fri, 2024-04-26 at 09:35 +0200, Frédéric Yhuel wrote:
On 26/04/2024 at 04:24, Laurenz Albe wrote:
On Thu, 2024-04-25 at 14:33 -0400, Robert Haas wrote:
I believe that the underlying problem here can be summarized in this
way: just because I'm OK with 2MB of bloat in my 10MB table doesn't
mean that I'm OK with 2TB of bloat in my 10TB table. One reason for
this is simply that I can afford to waste 2MB much more easily than I
can afford to waste 2TB -- and that applies both on disk and in
memory.

I don't find that convincing. Why are 2TB of wasted space in a 10TB
table worse than 2TB of wasted space in 100 tables of 100GB each?

Good point, but another way of summarizing the problem would be that the
autovacuum_*_scale_factor parameters work well as long as we have a more
or less evenly distributed access pattern in the table.

Suppose my very large table gets updated only for its 1% most recent
rows. We probably want to decrease autovacuum_analyze_scale_factor and
autovacuum_vacuum_scale_factor for this one.

Partitioning would be a good solution, but IMHO postgres should be able
to handle this case anyway, ideally without per-table configuration.

I agree that you may well want autovacuum and autoanalyze to treat your large
table differently from your small tables.

But I am reluctant to accept even more autovacuum GUCs. It's not like
we don't have enough of them, rather the opposite. You can slap on more
GUCs to treat more special cases, but we will never reach the goal of
having a default that will make everybody happy.

I believe that the defaults should work well in moderately sized databases
with moderate usage characteristics. If you have large tables or a high
number of transactions per second, you can be expected to make the effort
and adjust the settings for your case. Adding more GUCs makes life *harder*
for the users who are trying to understand and configure how autovacuum works.
Well, I disagree to some degree. I agree that the defaults should work
well in moderately sized databases with moderate usage characteristics.
But I also think we can do better than telling DBAs that they have to
manually fine-tune autovacuum for large tables (frequently
implementing by hand what this patch proposes, namely setting
autovacuum_vacuum_scale_factor to 0 and autovacuum_vacuum_threshold to a
high number), as this is cumbersome and needs adult supervision that is
not always available. Of course, it would be great if we just slap some
AI into the autovacuum launcher that figures things out automagically,
but I don't think we are there, yet.
So this proposal (probably along with a higher default threshold than
500000, but IMO less than what Robert and Nathan suggested) sounds like
a step forward to me. DBAs can set the threshold lower if they want, or
maybe we can just turn it off by default if we cannot agree on a sane
default, but I think this (using the simplified formula from Nathan) is
a good approach that takes some pain away from autovacuum tuning and
reserves that for the really difficult cases.
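For comparison, here is a sketch of the two formulas under discussion. The "simplified formula from Nathan" is assumed here to be a hard cap, min(base + scale * reltuples, max_threshold); treat that as one reading of the thread, not a quote of the patch.

```python
# Current autovacuum threshold vs. an assumed capped variant
# (min(base + scale * reltuples, max_threshold)); the cap formula is
# my reading of "the simplified formula from Nathan", not patch code.

def vacthresh_current(reltuples, base=50, scale=0.2):
    return base + scale * reltuples

def vacthresh_capped(reltuples, base=50, scale=0.2, cap=500_000):
    return min(base + scale * reltuples, cap)

for reltuples in (10_000, 1_000_000, 1_000_000_000):
    print(reltuples, vacthresh_current(reltuples), vacthresh_capped(reltuples))
```

A 10k-row table is unaffected, while the billion-row table becomes eligible after 500k dead tuples instead of 200 million, which is the pain the patch aims to remove.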
Michael
On 4/26/24 04:43, Michael Banck wrote:
So this proposal (probably along with a higher default threshold than
500000, but IMO less than what Robert and Nathan suggested) sounds like
a step forward to me. DBAs can set the threshold lower if they want, or
maybe we can just turn it off by default if we cannot agree on a sane
default, but I think this (using the simplified formula from Nathan) is
a good approach that takes some pain away from autovacuum tuning and
reserves that for the really difficult cases.
+1 to the above
Although I don't think 500000 is necessarily too small. In my view,
having autovac run very quickly, even if more frequently, provides an
overall better user experience.
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com