autovacuum not prioritising for-wraparound tables
Hi,
I have a bug pending that autovacuum fails to give priority to
for-wraparound tables. When xid consumption rate is high and dead tuple
creation is also high, it is possible that some tables are waiting for
for-wraparound vacuums that don't complete in time because the workers
are busy processing other tables that have accumulated dead tuples; the
system is then down because it's too near the Xid wraparound horizon.
Apparently this is particularly noticeable in connection with TOAST
tables, because those are always put in the tables-to-process list after
regular tables.
(As far as I recall, this was already reported elsewhere, but so far I
have been unable to find the discussion in the archives. Pointers
appreciated.)
So here's a small, backpatchable patch that sorts the list of tables to
process (not all that much tested yet). Tables which have the
wraparound flag set are processed before those that are not. Other
than this criterion, the order is not defined.
Now we could implement this differently, and maybe more simply (say by
keeping two lists of tables to process, one with for-wraparound tables
and one with the rest) but this way it is simpler to add additional
sorting criteria later: say within each category we could first process
smaller tables that have more dead tuples.
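For illustration, a qsort() comparator implementing this boolean-only ordering might look like the sketch below. The struct and field names are invented for the example, not taken from the actual patch:

```c
#include <stdlib.h>
#include <stdbool.h>

/* Hypothetical per-table work item; field names are illustrative. */
typedef struct
{
    unsigned int relid;
    bool         for_wraparound;   /* needs an anti-wraparound vacuum? */
} av_table;

/* Comparator for qsort(): for-wraparound tables sort first; the order
 * within each group is deliberately left undefined, as in the proposal. */
static int
av_comparator(const void *a, const void *b)
{
    const av_table *ta = (const av_table *) a;
    const av_table *tb = (const av_table *) b;

    if (ta->for_wraparound && !tb->for_wraparound)
        return -1;
    if (!ta->for_wraparound && tb->for_wraparound)
        return 1;
    return 0;
}
```

Additional criteria would then slot in as further tie-break comparisons before the final `return 0`.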
My intention is to clean this up and backpatch to all live branches.
Comments?
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment: autovacuum-toast-wraparound.patch (text/x-diff; charset=us-ascii; +87 -52)
On 24.01.2013 23:57, Alvaro Herrera wrote:
I have a bug pending that autovacuum fails to give priority to
for-wraparound tables. When xid consumption rate is high and dead tuple
creation is also high, it is possible that some tables are waiting for
for-wraparound vacuums that don't complete in time because the workers
are busy processing other tables that have accumulated dead tuples; the
system is then down because it's too near the Xid wraparound horizon.
Apparently this is particularly noticeable in connection with TOAST
tables, because those are always put in the tables-to-process list after
regular tables.
(As far as I recall, this was already reported elsewhere, but so far I
have been unable to find the discussion in the archives. Pointers
appreciated.)
So here's a small, backpatchable patch that sorts the list of tables to
process (not all that much tested yet). Tables which have the
wraparound flag set are processed before those that are not. Other
than this criterion, the order is not defined.
Now we could implement this differently, and maybe more simply (say by
keeping two lists of tables to process, one with for-wraparound tables
and one with the rest) but this way it is simpler to add additional
sorting criteria later: say within each category we could first process
smaller tables that have more dead tuples.
My intention is to clean this up and backpatch to all live branches.
Comments?
Backpatching sounds a bit scary. It's not a clear-cut bug, it's just
that autovacuum could be smarter about its priorities. There are other
ways you can still bump into the xid-wraparound issue, even with this patch.
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Jan 24, 2013 at 5:22 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
Backpatching sounds a bit scary. It's not a clear-cut bug, it's just that
autovacuum could be smarter about its priorities. There are other ways you
can still bump into the xid-wraparound issue, even with this patch.
I don't think this is a single-priority issue. It's *also* crucial
that small tables
with high "tuple attrition rates" get vacuumed extremely frequently; your system
will bog down, albeit in a different way, if the small tables don't
get vacuumed enough.
This seems to me to involve multiple competing priorities where the
main solution
*I* can think of is to have multiple backends doing autovacuum, and assigning
some to XID activity and others to the "small, needs vacuuming
frequently" tables.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
Christopher Browne <cbbrowne@gmail.com> writes:
On Thu, Jan 24, 2013 at 5:22 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
Backpatching sounds a bit scary. It's not a clear-cut bug, it's just that
autovacuum could be smarter about its priorities. There are other ways you
can still bump into the xid-wraparound issue, even with this patch.
I don't think this is a single-priority issue. It's *also* crucial
that small tables with high "tuple attrition rates" get vacuumed
extremely frequently; your system will bog down, albeit in a different
way, if the small tables don't get vacuumed enough.
Yeah. Another problem with a simple-minded priority arrangement is that
it might cause some tables to get starved for service because workers
keep on choosing other ones; we have to be sure the sorting rule is
designed to prevent that.
As posted, what we've got here is sorting on a boolean condition, with
the behavior within each group totally up to the whims of qsort(). That
seems especially dangerous since the priority order is mostly undefined.
I was a bit surprised that Alvaro didn't propose sorting by the age of
relfrozenxid, at least for the subset of tables that are considered
wraparound hazards. Not sure what a good criterion is for the rest.
regards, tom lane
Hi Alvaro,
Nice to see a patch on this!
On 2013-01-24 18:57:15 -0300, Alvaro Herrera wrote:
I have a bug pending that autovacuum fails to give priority to
for-wraparound tables. When xid consumption rate is high and dead tuple
creation is also high, it is possible that some tables are waiting for
for-wraparound vacuums that don't complete in time because the workers
are busy processing other tables that have accumulated dead tuples; the
system is then down because it's too near the Xid wraparound horizon.
Apparently this is particularly noticeable in connection with TOAST
tables, because those are always put in the tables-to-process list after
regular tables.
(As far as I recall, this was already reported elsewhere, but so far I
have been unable to find the discussion in the archives. Pointers
appreciated.)
So here's a small, backpatchable patch that sorts the list of tables to
process (not all that much tested yet). Tables which have the
wraparound flag set are processed before those that are not. Other
than this criterion, the order is not defined.
Now we could implement this differently, and maybe more simply (say by
keeping two lists of tables to process, one with for-wraparound tables
and one with the rest) but this way it is simpler to add additional
sorting criteria later: say within each category we could first process
smaller tables that have more dead tuples.
If I remember the issue that triggered this correctly I don't think this
would be sufficient to solve the whole issue although it sure would
delay the shutdown.
Due to the high activity on the system, while some bigger, active table
got vacuumed, other previously vacuumed tables hit freeze_max_age again
and thus became eligible for vacuum once more, even though other tables -
in our specific case always TOAST relations, because they always got
added last - were very close to the shutdown limit.
So I think we need to sort by age(relfrozenxid) in tables that are over
the anti-wraparound limit. Given your code that doesn't seem to be that
hard?
I think after the infrastructure is there we might want to have some
more intelligence for non-wraparound tables too, but that possibly looks
more like a HEAD than a backpatch thing.
I am very much of the opinion that this needs to be backpatched though -
it's a pretty bad thing if autovacuum cannot be relied on to keep a
system from shutting itself down because it always vacuums the wrong
relations and never gets to the problematic ones. Single-user mode is
nothing normal users should ever have to see.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Tom Lane escribió:
As posted, what we've got here is sorting on a boolean condition, with
the behavior within each group totally up to the whims of qsort(). That
seems especially dangerous since the priority order is mostly undefined.
I was a bit surprised that Alvaro didn't propose sorting by the age of
relfrozenxid, at least for the subset of tables that are considered
wraparound hazards. Not sure what a good criterion is for the rest.
Hah. This patch began life with more complex prioritisation, but before
going much further I dumbed down the idea to avoid having to discuss
these issues, as the timing doesn't seem particularly good for that.
And I do want to get something back-patchable.
So if we're to discuss this, here's what I had in mind:
1. for-wraparound tables always go first; oldest age(relfrozenxid) are
sorted earlier. For tables of the same age, consider size as below.
2. for other tables, consider floor(log(size)). This makes tables of
sizes in the same ballpark be considered together.
3. For tables of similar size, consider
(n_dead_tuples - threshold) / threshold.
"threshold" is what gets calculated as the number of tuples over which
a table is considered for vacuuming. This number, then, is a relative
measure of how badly vacuuming is needed.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 1/25/13 10:29 AM, Alvaro Herrera wrote:
And I do want to get something back-patchable.
Autovacuum has existed for N years and nobody complained about this
until just now, so I don't see a strong justification for backpatching.
Or is this a regression from an earlier release?
In general, I think we should backpatch less.
Peter Eisentraut escribió:
On 1/25/13 10:29 AM, Alvaro Herrera wrote:
And I do want to get something back-patchable.
Autovacuum has existed for N years and nobody complained about this
until just now, so I don't see a strong justification for backpatching.
I disagree about people not complaining. Maybe the complaints have not
been specifically about the wraparound stuff and toast tables, but for
sure there have been complaints about autovacuum not giving more
priority to tables that need work more urgently.
Or is this a regression from an earlier release?
Nope.
In general, I think we should backpatch less.
I don't disagree with this general principle, but I certainly don't like
the idea of letting systems run with known flaws just because we're too
scared to patch them. That said, I don't object to a plan such as keeping
it in master only for a while and backpatching after it has seen some
more testing. But for large sites this is a real problem, and having to
work around it manually is frequently inconvenient; keep in mind that
9.0 is going to be supported for years yet.
That said, if consensus here is to not backpatch this at all, I will go
with that; but let's have the argument first.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
Peter Eisentraut escribió:
Autovacuum has existed for N years and nobody complained about this
until just now, so I don't see a strong justification for backpatching.
I disagree about people not complaining. Maybe the complaints have not
been specifically about the wraparound stuff and toast tables, but for
sure there have been complaints about autovacuum not giving more
priority to tables that need work more urgently.
FWIW, I don't see that this is too scary to back-patch. It's unlikely
to make things worse than the current coding, which is more or less
pg_class tuple order.
I do suggest that it might be wise not to try to squeeze it into the
early-February update releases. Put it in master as soon as we agree
on the behavior, then back-patch after the next updates. That will
give us a couple months' testing, rather than a few days, before it
hits any release tarballs.
regards, tom lane
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
So if we're to discuss this, here's what I had in mind:
1. for-wraparound tables always go first; oldest age(relfrozenxid) are
sorted earlier. For tables of the same age, consider size as below.
It seems unlikely that age(relfrozenxid) will be identical for multiple
tables often enough to worry about, so the second part of that seems
like overcomplication.
2. for other tables, consider floor(log(size)). This makes tables of
sizes in the same ballpark be considered together.
3. For tables of similar size, consider
(n_dead_tuples - threshold) / threshold.
"threshold" is what gets calculated as the number of tuples over which
a table is considered for vacuuming. This number, then, is a relative
measure of how badly vacuuming is needed.
The floor(log(size)) part seems like it will have rather arbitrary
behavioral shifts when a table grows just past a log boundary. Also,
I'm not exactly sure whether you're proposing smaller tables first or
bigger tables first, nor that either of those orderings is a good thing.
I think sorting by just age(relfrozenxid) for for-wraparound tables, and
just the n_dead_tuples measurement for others, is probably reasonable
for now. If we find out that has bad behaviors then we can look at how
to fix them, but I don't think we have enough understanding yet of what
the bad behaviors might be.
regards, tom lane
On 2013-01-25 11:51:33 -0500, Tom Lane wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
2. for other tables, consider floor(log(size)). This makes tables of
sizes in the same ballpark be considered together.
3. For tables of similar size, consider
(n_dead_tuples - threshold) / threshold.
"threshold" is what gets calculated as the number of tuples over which
a table is considered for vacuuming. This number, then, is a relative
measure of how badly vacuuming is needed.
The floor(log(size)) part seems like it will have rather arbitrary
behavioral shifts when a table grows just past a log boundary. Also,
I'm not exactly sure whether you're proposing smaller tables first or
bigger tables first, nor that either of those orderings is a good thing.
That seems dubious to me as well.
I think sorting by just age(relfrozenxid) for for-wraparound tables, and
just the n_dead_tuples measurement for others, is probably reasonable
for now. If we find out that has bad behaviors then we can look at how
to fix them, but I don't think we have enough understanding yet of what
the bad behaviors might be.
If we want another ordering criterion than that it might be worth
thinking about something like n_dead_tuples/relpages to make sure that
small tables with a high dead tuples ratio get vacuumed in time.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jan 25, 2013 at 11:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
The floor(log(size)) part seems like it will have rather arbitrary
behavioral shifts when a table grows just past a log boundary. Also,
I'm not exactly sure whether you're proposing smaller tables first or
bigger tables first, nor that either of those orderings is a good thing.
I think sorting by just age(relfrozenxid) for for-wraparound tables, and
just the n_dead_tuples measurement for others, is probably reasonable
for now. If we find out that has bad behaviors then we can look at how
to fix them, but I don't think we have enough understanding yet of what
the bad behaviors might be.
Which is exactly why back-patching this is not a good idea, IMHO. We
could easily run across a system where pg_class order happens to be
better than anything else we come up with. Such changes are expected
in new major versions, but not in maintenance releases.
I think that to do this right, we need to consider not only the status
quo but the trajectory. For example, suppose we have two tables to
process, one of which needs a wraparound vacuum and the other one of
which needs dead tuples removed. If the table needing the wraparound
vacuum is small and just barely over the threshold, it isn't urgent;
but if it's large and way over the threshold, it's quite urgent.
Similarly, if the table which needs dead tuples removed is rarely
updated, postponing vacuum is not a big deal, but if it's being
updated like crazy, postponing vacuum is a big problem. Categorically
putting autovacuum wraparound tables ahead of everything else seems
simplistic, and thinking that more dead tuples is more urgent than
fewer dead tuples seems *extremely* simplistic.
I ran across a real-world case where a user had a small table that had
to be vacuumed every 15 seconds to prevent bloat. If we change the
algorithm in a way that gives other things priority over that table,
then that user could easily get hosed when they install a maintenance
release containing this change.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2013-01-25 12:19:25 -0500, Robert Haas wrote:
On Fri, Jan 25, 2013 at 11:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
The floor(log(size)) part seems like it will have rather arbitrary
behavioral shifts when a table grows just past a log boundary. Also,
I'm not exactly sure whether you're proposing smaller tables first or
bigger tables first, nor that either of those orderings is a good thing.
I think sorting by just age(relfrozenxid) for for-wraparound tables, and
just the n_dead_tuples measurement for others, is probably reasonable
for now. If we find out that has bad behaviors then we can look at how
to fix them, but I don't think we have enough understanding yet of what
the bad behaviors might be.
I think that to do this right, we need to consider not only the status
quo but the trajectory. For example, suppose we have two tables to
process, one of which needs a wraparound vacuum and the other one of
which needs dead tuples removed. If the table needing the wraparound
vacuum is small and just barely over the threshold, it isn't urgent;
but if it's large and way over the threshold, it's quite urgent.
Similarly, if the table which needs dead tuples removed is rarely
updated, postponing vacuum is not a big deal, but if it's being
updated like crazy, postponing vacuum is a big problem. Categorically
putting autovacuum wraparound tables ahead of everything else seems
simplistic, and thinking that more dead tuples is more urgent than
fewer dead tuples seems *extremely* simplistic.
I don't think the first part is problematic. Which scenario do you have
in mind where that would really cause adverse behaviour? autovacuum
seldom does full-table vacuums otherwise these days, so tables get "old"
in that sense pretty regularly and mostly uniformly.
I agree that the second criterion isn't worth very much and that we need
something better there.
I ran across a real-world case where a user had a small table that had
to be vacuumed every 15 seconds to prevent bloat. If we change the
algorithm in a way that gives other things priority over that table,
then that user could easily get hosed when they install a maintenance
release containing this change.
I think if we backpatch this we should only prefer wraparound tables and
leave the rest unchanged.
Which is exactly why back-patching this is not a good idea, IMHO. We
could easily run across a system where pg_class order happens to be
better than anything else we come up with. Such changes are expected
in new major versions, but not in maintenance releases.
I think a minimal version might be acceptable. It's a bug if the database
regularly shuts down and you need to write manual vacuuming scripts to
prevent it from happening.
I don't think the argument that the pg_class order might work better
than anything holds that much truth - it's not like that's something
really stable.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
I think if we backpatch this we should only prefer wraparound tables and
leave the rest unchanged.
That's not a realistic option, at least not with anything that uses this
approach to sorting the tables. You'd have to assume that qsort() is
stable which it probably isn't.
I don't think the argument that the pg_class order might work better
than anything holds that much truth - it's not like that's something
really stable.
I find that less than credible as well.
regards, tom lane
On Fri, Jan 25, 2013 at 12:00 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-01-25 11:51:33 -0500, Tom Lane wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
2. for other tables, consider floor(log(size)). This makes tables of
sizes in the same ballpark be considered together.
3. For tables of similar size, consider
(n_dead_tuples - threshold) / threshold.
"threshold" is what gets calculated as the number of tuples over which
a table is considered for vacuuming. This number, then, is a relative
measure of how badly vacuuming is needed.
The floor(log(size)) part seems like it will have rather arbitrary
behavioral shifts when a table grows just past a log boundary. Also,
I'm not exactly sure whether you're proposing smaller tables first or
bigger tables first, nor that either of those orderings is a good thing.
That seems dubious to me as well.
I think sorting by just age(relfrozenxid) for for-wraparound tables, and
just the n_dead_tuples measurement for others, is probably reasonable
for now. If we find out that has bad behaviors then we can look at how
to fix them, but I don't think we have enough understanding yet of what
the bad behaviors might be.
If we want another ordering criterion than that it might be worth
thinking about something like n_dead_tuples/relpages to make sure that
small tables with a high dead tuples ratio get vacuumed in time.
I'd imagine it a good idea to reserve some autovacuum connections for small
tables, that is, to have a maximum relpages for some portion of the
connections.
That way you don't get stuck having all the connections busy working on
huge tables and leaving small tables starved. That scenario seems pretty
obvious.
I'd be inclined to do something a bit more sophisticated than just
age(relfrozenxid) for wraparound; I'd be inclined to kick off large tables'
wraparound vacuums earlier than those for smaller tables.
With a little bit of noodling around, here's a thought for a joint function
that I *think* has reasonably common scales:
f(deadtuples, relpages, age) =
deadtuples/relpages + e ^ (age*ln(relpages)/2^32)
When the age of the table is low, this is dominated by the deadtuple/relpages
part of the equation; you vacuum tables based on what has the largest % of
dead tuples.
But when a table is not vacuumed for a long time, the second term will kick
in, and we'll tend to:
a) Vacuum the ones that are largest the earliest, but nonetheless
b) Vacuum them as the ratio age/2^32 gets close to 1.
This function assumes relpages > 0, and there's a constant, 2^32, there which
might be fiddled with.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
On 2013-01-25 12:52:46 -0500, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
I think if we backpatch this we should only prefer wraparound tables and
leave the rest unchanged.
That's not a realistic option, at least not with anything that uses this
approach to sorting the tables. You'd have to assume that qsort() is
stable which it probably isn't.
Well, comparing them equally will result in an about as arbitrary order
as right now, so I don't really see a problem with that. I am fine with
sorting them truly randomly as well (by assigning a temporary value when
putting each entry into the list, so the comparison is repeatable and
conforms to the triangle inequality etc.).
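One way to sketch that idea: assign the random key once, at list-build time, so the comparator is a consistent total order even though qsort() itself makes no stability guarantee. Names here are invented for the example:

```c
#include <stdlib.h>
#include <stdbool.h>

/* Illustrative entry: the tiebreak key is fixed when the table is added
 * to the work list, so repeated comparisons always agree. */
typedef struct
{
    bool for_wraparound;
    long tiebreak;
} av_entry;

static void
av_entry_init(av_entry *e, bool for_wraparound)
{
    e->for_wraparound = for_wraparound;
    e->tiebreak = random();     /* assigned once, not per comparison */
}

static int
av_cmp(const void *a, const void *b)
{
    const av_entry *ea = (const av_entry *) a;
    const av_entry *eb = (const av_entry *) b;

    if (ea->for_wraparound != eb->for_wraparound)
        return ea->for_wraparound ? -1 : 1;
    /* otherwise order by the pre-assigned key: a repeatable total order */
    if (ea->tiebreak != eb->tiebreak)
        return (ea->tiebreak < eb->tiebreak) ? -1 : 1;
    return 0;
}
```

Calling random() inside the comparator instead would violate the consistency qsort() requires; fixing the key up front avoids that.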
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jan 25, 2013 at 12:35 PM, Andres Freund <andres@2ndquadrant.com> wrote:
I think that to do this right, we need to consider not only the status
quo but the trajectory. For example, suppose we have two tables to
process, one of which needs a wraparound vacuum and the other one of
which needs dead tuples removed. If the table needing the wraparound
vacuum is small and just barely over the threshold, it isn't urgent;
but if it's large and way over the threshold, it's quite urgent.
Similarly, if the table which needs dead tuples removed is rarely
updated, postponing vacuum is not a big deal, but if it's being
updated like crazy, postponing vacuum is a big problem. Categorically
putting autovacuum wraparound tables ahead of everything else seems
simplistic, and thinking that more dead tuples is more urgent than
fewer dead tuples seems *extremely* simplistic.
I don't think the first part is problematic. Which scenario do you have
in mind where that would really cause adverse behaviour? autovacuum
seldom does full-table vacuums otherwise these days, so
tables get "old" in that sense pretty regularly and mostly uniformly.
I'm worried about the case of a very, very frequently updated table
getting put ahead of a table that needs a wraparound vacuum, but only
just. It doesn't sit well with me to think that the priority of that
goes from 0 (we don't even try to update it) to infinity (it goes
ahead of all tables needing to be vacuumed for dead tuples) the
instant we hit the vacuum_freeze_table_age.
One idea would be to give each table a "badness". So estimate the
percentage of the tuples in each table that are dead. And then we
compute the percentage by which age(relfrozenxid) exceeds the table
age, and add those two percentages up to get total badness. We
process tables that are otherwise-eligible for vacuuming in descending
order of badness. So if autovacuum_vacuum_scale_factor = 0.2 and a
table is more than 120% of vacuum_freeze_table_age, then it's
certain to be vacuumed before any table that only needs dead-tuple
processing. But if it's only slightly past the cutoff, it doesn't get
to stomp all over the people who need dead tuples cleaned up.
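One reading of this "badness" scheme, with an invented normalisation in which each term equals 1.0 exactly at its trigger point (the exact scaling isn't specified in the mail, so treat this as an assumption):

```c
/* Hypothetical combined badness: dead-tuple urgency plus wraparound
 * urgency, each normalised so 1.0 means "just at the trigger point".
 * Tables are then vacuumed in descending order of the result. */
static double
table_badness(double dead_frac, double scale_factor,
              double xid_age, double freeze_table_age)
{
    return dead_frac / scale_factor + xid_age / freeze_table_age;
}
```

Under this normalisation, a table well past vacuum_freeze_table_age outranks one that has only just crossed its dead-tuple threshold, but a table barely past the freeze age does not automatically trump everything else.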
The thing is, avoiding a full-cluster shutdown due to anti-wraparound
vacuum is important. But, IME, that rarely happens. What is much
more common is that an individual table gets bloated and CLUSTER or
VACUUM FULL is required to recover, and now the system is effectively
down for as long as that takes to complete. I don't want to make that
case substantially more likely just to avoid a danger of full-cluster
shutdown that, for most users most of the time, is really a very
remote risk. There's some point at which an anti-wraparound vacuum
should not only trump everything else, but probably also ignore the
configured cost delay settings - but equating that point with the
first point at which we consider doing it at all does not seem right
to me.
I think a minimal version might be acceptable. Its a bug if the database
regularly shuts down and you need to write manual vacuuming scripts to
prevent it from happening.I don't think the argument that the pg_class order might work better
than anything holds that much truth - its not like thats something
really stable.
I freely admit that if pg_class order happens to work better, it's
just good luck. But sometimes people get lucky.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Fri, Jan 25, 2013 at 12:35 PM, Andres Freund <andres@2ndquadrant.com> wrote:
I don't think the first part is problematic. Which scenario do you have
in mind where that would really cause adverse behaviour? autovacuum
seldom does full-table vacuums otherwise these days, so
tables get "old" in that sense pretty regularly and mostly uniformly.
I'm worried about the case of a very, very frequently updated table
getting put ahead of a table that needs a wraparound vacuum, but only
just. It doesn't sit well with me to think that the priority of that
goes from 0 (we don't even try to update it) to infinity (it goes
ahead of all tables needing to be vacuumed for dead tuples) the
instant we hit the vacuum_freeze_table_age.
Well, really the answer to that is that we have multiple autovac
workers, and even if the first one that comes along picks the wraparound
job, the next one won't.
Having said that, I agree that it might be better to express the
sort priority as some sort of continuous function of multiple figures of
merit, rather than "sort by one then the next". See Chris Browne's
mail for another variant.
regards, tom lane
On Fri, Jan 25, 2013 at 1:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Fri, Jan 25, 2013 at 12:35 PM, Andres Freund <andres@2ndquadrant.com> wrote:
I don't think the first part is problematic. Which scenario do you have
in mind where that would really cause adverse behaviour? autovacuum
seldom does full-table vacuums otherwise these days, so
tables get "old" in that sense pretty regularly and mostly uniformly.
I'm worried about the case of a very, very frequently updated table
getting put ahead of a table that needs a wraparound vacuum, but only
just. It doesn't sit well with me to think that the priority of that
goes from 0 (we don't even try to update it) to infinity (it goes
ahead of all tables needing to be vacuumed for dead tuples) the
instant we hit the vacuum_freeze_table_age.
Well, really the answer to that is that we have multiple autovac
workers, and even if the first one that comes along picks the wraparound
job, the next one won't.
Sure, but you could easily have 10 or 20 cross the
vacuum_freeze_table_age threshold simultaneously - and you'll only be
able to process a few of those at a time, due to
autovacuum_max_workers. Moreover, even if you don't hit the
autovacuum_max_workers limit (say it's jacked up to 100 or so), you're
still introducing a delay of up to N * autovacuum_naptime, where N is
the number of tables that cross the threshold at the same instant,
before any dead-tuple cleanup vacuums are initiated. It's not
difficult to imagine that being bad.
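To put a rough number on that worst case (a back-of-the-envelope sketch with
made-up values, assuming the launcher starts one wraparound job per naptime
cycle before any cleanup vacuum gets a turn):

```python
# Hypothetical figures, not from the thread: worst-case delay before any
# dead-tuple cleanup starts if N tables cross vacuum_freeze_table_age
# together and one worker is launched per autovacuum_naptime.
autovacuum_naptime = 60       # seconds (the default, 1min)
n_wraparound_tables = 20      # tables crossing the threshold at once

worst_case_delay = n_wraparound_tables * autovacuum_naptime
print(worst_case_delay)  # 1200 seconds, i.e. 20 minutes of deferred cleanup
```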
Having said that, I agree that it might be better to express the
sort priority as some sort of continuous function of multiple figures of
merit, rather than "sort by one then the next". See Chris Browne's
mail for another variant.
Ah, so. I think, though, that my variant is a whole lot simpler and
accomplishes mostly the same purpose. One difference between my
proposal and the others that have popped up thus far is that I am not
convinced table size matters, or at least not in the way that people
are proposing to make it matter. The main reason I can see why big
tables matter more than small tables is that a big table takes
*longer* to autovacuum than a small table. If you are 123,456
transactions from a cluster-wide shutdown, and there is one big table
and one small table that need to be autovacuumed, you had better start
on the big one first - because the next autovacuum worker to come
along will quite possibly be able to finish the small one before
doomsday, but if you don't start the big one now you won't finish in
time. This remains true even if the small table has a slightly older
relfrozenxid than the large one, but ceases to be true when the
difference is large enough that vacuuming the small one first will
advance datfrozenxid enough to extend the time until a shutdown occurs
by more than the time it takes to vacuum it.
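That "start the big one first" reasoning is essentially longest-job-first
scheduling; a minimal sketch (illustrative only, with invented duration
estimates, and not the ordering the patch actually implements):

```python
# Illustrative only: order wraparound vacuums longest-first, since a big
# table must start immediately to finish before the xid horizon, while a
# small table can still be picked up in time by the next worker.

def order_for_wraparound(tables):
    """tables: list of (name, estimated_vacuum_seconds) pairs."""
    return sorted(tables, key=lambda t: t[1], reverse=True)

tables = [("small_tbl", 30), ("big_tbl", 3600)]
print([name for name, _ in order_for_wraparound(tables)])
# ['big_tbl', 'small_tbl']
```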
For dead-tuple vacuuming, the question of whether the table is large
or small does not seem to me to have a categorical right answer. You
could argue that it's more important to recover 2GB of space in a 20GB
table than 2MB of space in a 20MB table, because more space is being
wasted. On the flip side you could argue that a small table becomes
bloated much more easily than a large table, because even a minute of
heavy update activity can turn over the entire table contents, which
is unlikely for a larger table. I am inclined to think that the
percentage of dead tuples is a more important rubric - if things are
going well, it shouldn't ever be much different from the threshold
that triggers AV in the first place - but if somehow it is much
different (e.g. because the table's been locked for a while, or is
accumulating more bloat than the threshold within a single
autovacuum_naptime), that seems like good justification for doing it
ahead of other things that are less bloated.
We do need to make sure that the formula is defined in such a way that
something that is *severely* past vacuum_freeze_table_age always beats
an arbitrarily-bloated table.
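One possible shape for such a formula (purely a sketch of the property being
asked for, not anything proposed in the patch or committed): keep the bloat
term bounded and let the wraparound term grow without limit past
vacuum_freeze_table_age, so a severely overdue table always wins.

```python
VACUUM_FREEZE_TABLE_AGE = 150_000_000  # illustrative threshold value

def priority(relfrozenxid_age, dead_fraction):
    # dead_fraction is in [0, 1], so the bloat contribution is bounded.
    overdue = max(0, relfrozenxid_age - VACUUM_FREEZE_TABLE_AGE)
    # The wraparound term is unbounded: once a table is more than one
    # full threshold past the limit, it beats any amount of bloat.
    return overdue / VACUUM_FREEZE_TABLE_AGE + dead_fraction

# A table three thresholds old with no bloat outranks a maximally
# bloated table that has only just crossed the threshold.
assert priority(3 * VACUUM_FREEZE_TABLE_AGE, 0.0) > priority(VACUUM_FREEZE_TABLE_AGE, 1.0)
```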
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 25 January 2013 17:19, Robert Haas <robertmhaas@gmail.com> wrote:
We
could easily run across a system where pg_class order happens to be
better than anything else we come up with.
I think you should read that back to yourself and see if you still
feel the word "easily" applies here.
I agree with Tom that it's hard for almost any prioritisation not to be
better than we have now.
But also, we should keep it fairly simple to avoid introducing new
behaviour that defeats people with a highly tuned vacuum config.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services