Berserk Autovacuum (let's save next Mandrill)
Hi hackers,
Attached is a sketch of a small patch that fixes several edge cases with
autovacuum. Long story short: autovacuum never comes to append-only tables,
killing large production systems.
First case, mine.
/messages/by-id/CAC8Q8tLBeAxR+BXWuKK+HP5m8tEVYn270CVrDvKXt=0PkJTY9g@mail.gmail.com
We had a table we were appending and wanted Index Only Scan to work. For it
to work, you need to call VACUUM manually, since VACUUM is the only way to
mark pages all visible, and autovacuum never comes to append only tables.
We were clever to invent a workflow without dead tuples and it painfully
bit us.
Second case, just read in the news.
https://mailchimp.com/what-we-learned-from-the-recent-mandrill-outage/
Mandrill has a 6TB append-only table that autovacuum probably never vacuumed.
Then the anti-wraparound vacuum came and production went down. If autovacuum
had done its job before that last moment, it would probably have been okay.
Idea: look not at dead tuples, but at changes, just like ANALYZE does.
It's my first patch for Postgres, and it's probably all wrong, but I hope it
helps you get the idea.
--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
Attachments:
autovacuum_berserk_v1.patch (text/x-patch; charset=US-ASCII), +6 -1
On 2019-Mar-27, Darafei "Komяpa" Praliaskouski wrote:
Attached is a sketch of a small patch that fixes several edge cases with
autovacuum. Long story short: autovacuum never comes to append-only tables,
killing large production systems.
Yeah, autovac is not coping with these scenarios (and probably others).
However, rather than taking your patch's idea verbatim, I think we
should have autovacuum use separate actions for those two (wildly
different) scenarios. For example:
* certain tables would have some sort of partial scan that sets the
visibility map. There's no reason to invoke the whole vacuuming
machinery. I don't think this is limited to append-only tables, but
rather those are just the ones that are affected the most.
* tables nearing wraparound danger should use the (yet to be committed)
option to skip index cleaning, which makes the cleanup action faster.
Again, no need for complete vacuuming.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
On Thu, 28 Mar 2019 at 00:32, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2019-Mar-27, Darafei "Komяpa" Praliaskouski wrote:
Attached is a sketch of a small patch that fixes several edge cases with
autovacuum. Long story short: autovacuum never comes to append-only tables,
killing large production systems.
Yeah, autovac is not coping with these scenarios (and probably others).
However, rather than taking your patch's idea verbatim, I think we
should have autovacuum use separate actions for those two (wildly
different) scenarios. For example:
* certain tables would have some sort of partial scan that sets the
visibility map. There's no reason to invoke the whole vacuuming
machinery. I don't think this is limited to append-only tables, but
rather those are just the ones that are affected the most.
What other machinery runs on VACUUM invocation that is not wanted there?
Since Postgres 11, index cleanup is already skipped on append-only tables.
* tables nearing wraparound danger should use the (yet to be committed)
option to skip index cleaning, which makes the cleanup action faster.
Again, no need for complete vacuuming.
"Nearing wraparound" is too late already. On Amazon, reading a table from gp2
after you have exhausted your IOPS burst budget is like reading a floppy drive;
you have to freeze a lot earlier than you hit several terabytes of unfrozen
data, or you're dead like Mandrill's Search and Url tables from the link I
shared.
--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
On 2019-Mar-28, Darafei "Komяpa" Praliaskouski wrote:
On Thu, 28 Mar 2019 at 00:32, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2019-Mar-27, Darafei "Komяpa" Praliaskouski wrote:
* certain tables would have some sort of partial scan that sets the
visibility map. There's no reason to invoke the whole vacuuming
machinery. I don't think this is limited to append-only tables, but
rather those are just the ones that are affected the most.

What other machinery runs on VACUUM invocation that is not wanted there?
Since Postgres 11 index cleanup is already skipped on append-only tables.
Well, I think it would be useful to set all-visible earlier than waiting
for a vacuum to be necessary, even for tables that are not append-only.
So if you think about this just for the append-only table, you leave
money on the table.
* tables nearing wraparound danger should use the (yet to be committed)
option to skip index cleaning, which makes the cleanup action faster.
Again, no need for complete vacuuming.

"Nearing wraparound" is too late already. On Amazon, reading a table from gp2
after you have exhausted your IOPS burst budget is like reading a floppy drive;
you have to freeze a lot earlier than you hit several terabytes of unfrozen
data, or you're dead like Mandrill's Search and Url tables from the link I
shared.
OK, then start freezing tuples in the cheap mode (skip index updates)
earlier than that. I suppose a good question is when to start.
I wonder if Mandrill's problem is related to Mailchimp raising the
freeze_max_age to a point where autovac did not have enough time to
react with an emergency vacuum. If you keep raising that value because
the vacuums cause problems for you (they block DDL), there's something
wrong.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, 28 Mar 2019 at 11:01, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2019-Mar-28, Darafei "Komяpa" Praliaskouski wrote:
"Nearing wraparound" is too late already. On Amazon, reading a table from gp2
after you have exhausted your IOPS burst budget is like reading a floppy drive;
you have to freeze a lot earlier than you hit several terabytes of unfrozen
data, or you're dead like Mandrill's Search and Url tables from the link I
shared.

OK, then start freezing tuples in the cheap mode (skip index updates)
earlier than that. I suppose a good question is when to start.
I thought recently that it would be good to have some sort of proactive
autovacuum mode that made use of idle workers. Probably there would need to
be some mode flag marking which workers were in proactive mode, so that they
could be cancelled when more pressing work came in. I don't have an exact
definition of what "proactive" would be, but I know that when consuming a
single transaction ID causes terabytes of tables to suddenly need an
anti-wraparound vacuum, that's not a good situation to be in. Perhaps
reaching some percentage of autovacuum_freeze_max_age could be classed as
proactive.
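For illustration, that "percentage of autovacuum_freeze_max_age" trigger could be sketched as follows. This is a hypothetical sketch with invented names (the function and the fraction knob are not actual PostgreSQL source or GUCs):

```python
# Hypothetical sketch: trigger a proactive freeze-only pass once a table's
# relfrozenxid age crosses a fraction of autovacuum_freeze_max_age, instead
# of waiting for the hard anti-wraparound threshold.

AUTOVACUUM_FREEZE_MAX_AGE = 200_000_000  # PostgreSQL default
PROACTIVE_FRACTION = 0.5                 # assumed knob, not a real GUC

def needs_proactive_freeze(relfrozenxid_age: int) -> bool:
    """True once the table is halfway to the anti-wraparound threshold."""
    return relfrozenxid_age >= AUTOVACUUM_FREEZE_MAX_AGE * PROACTIVE_FRACTION
```

An idle worker could then pick the table with the highest age among those passing this check, long before the hard threshold forces the issue.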
I wonder if Mandrill's problem is related to Mailchimp raising the
freeze_max_age to a point where autovac did not have enough time to
react with an emergency vacuum. If you keep raising that value because
the vacuums cause problems for you (they block DDL), there's something
wrong.
I have seen some very high autovacuum_freeze_max_age settings
recently. It would be interesting to know what they had theirs set to.
I see they mentioned "Search and Url tables". I can imagine "search" never
needs any UPDATEs, so quite possibly those were append-only, in which case
the anti-wraparound vacuum would have had quite a lot of work on its hands,
since possibly every page needed freezing. A table receiving regular
autovacuums from dead tuples would likely get some pages frozen during those.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, 28 Mar 2019 at 01:01, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2019-Mar-28, Darafei "Komяpa" Praliaskouski wrote:
On Thu, 28 Mar 2019 at 00:32, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2019-Mar-27, Darafei "Komяpa" Praliaskouski wrote:
* certain tables would have some sort of partial scan that sets the
visibility map. There's no reason to invoke the whole vacuuming
machinery. I don't think this is limited to append-only tables, but
rather those are just the ones that are affected the most.

What other machinery runs on VACUUM invocation that is not wanted there?
Since Postgres 11, index cleanup is already skipped on append-only tables.

Well, I think it would be useful to set all-visible earlier than waiting
for a vacuum to be necessary, even for tables that are not append-only.
So if you think about this just for the append-only table, you leave
money on the table.
Thing is, the problem does not exist for non-append-only tables: they're
going to be vacuumed automatically after 50 rows get updated.
* tables nearing wraparound danger should use the (yet to be committed)
option to skip index cleaning, which makes the cleanup action faster.
Again, no need for complete vacuuming.

"Nearing wraparound" is too late already. On Amazon, reading a table from gp2
after you have exhausted your IOPS burst budget is like reading a floppy drive;
you have to freeze a lot earlier than you hit several terabytes of unfrozen
data, or you're dead like Mandrill's Search and Url tables from the link I
shared.
OK, then start freezing tuples in the cheap mode (skip index updates)
earlier than that. I suppose a good question is when to start.
The attached (autovacuum_berserk_v1.patch) code achieves that. For
append-only tables, since https://commitfest.postgresql.org/16/952/ vacuum
already skips index cleanup if no updates happened. You just need to trigger
it, and it will already be "cheap".
--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
On Thu, Mar 28, 2019 at 2:36 AM David Rowley <david.rowley@2ndquadrant.com>
wrote:
On Thu, 28 Mar 2019 at 11:01, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2019-Mar-28, Darafei "Komяpa" Praliaskouski wrote:
"Nearing wraparound" is too late already. On Amazon, reading a table from gp2
after you have exhausted your IOPS burst budget is like reading a floppy drive;
you have to freeze a lot earlier than you hit several terabytes of unfrozen
data, or you're dead like Mandrill's Search and Url tables from the link I
shared.

OK, then start freezing tuples in the cheap mode (skip index updates)
earlier than that. I suppose a good question is when to start.

I thought recently that it would be good to have some sort of proactive
autovacuum mode that made use of idle workers.
The problem with "idle" is that it never happens on systems that are going to
wrap around within their lifetime. This has to be part of normal database
functioning.
Why not select a table that has inserts, updates and deletes for autovacuum,
just like we do for autoanalyze, instead of only deletes and updates as we do
now?
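To make the asymmetry concrete, here is a rough sketch of the two trigger formulas, simplified from relation_needs_vacanalyze with the default GUC values; illustrative only, not the actual C code:

```python
# Simplified trigger logic: autovacuum counts only dead tuples, while
# autoanalyze counts all changed tuples (inserts + updates + deletes).

def needs_autovacuum(dead_tuples, reltuples,
                     threshold=50, scale_factor=0.2):
    # autovacuum_vacuum_threshold / autovacuum_vacuum_scale_factor defaults
    return dead_tuples > threshold + scale_factor * reltuples

def needs_autoanalyze(changed_tuples, reltuples,
                      threshold=50, scale_factor=0.1):
    # autovacuum_analyze_threshold / autovacuum_analyze_scale_factor defaults
    return changed_tuples > threshold + scale_factor * reltuples

# A 1M-row append-only table after 200k fresh inserts: analyze fires,
# vacuum never does, because inserts create no dead tuples.
print(needs_autovacuum(0, 1_000_000))         # False
print(needs_autoanalyze(200_000, 1_000_000))  # True
```

However many rows are inserted, the dead-tuple counter stays at zero, so the vacuum side of the check never fires on its own.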
--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
On Thu, 28 Mar 2019 at 22:04, Darafei "Komяpa" Praliaskouski
<me@komzpa.net> wrote:
On Thu, Mar 28, 2019 at 2:36 AM David Rowley <david.rowley@2ndquadrant.com> wrote:

I thought recently that it would be good to have some sort of proactive
autovacuum mode that made use of idle workers.

The problem with "idle" is that it never happens on systems that are going to
wrap around within their lifetime. This has to be part of normal database
functioning.
I'd say auto-vacuum is configured to run too slowly if you never have
an idle worker. The chances that it happens to be running at exactly
the right speed to keep up with demand must be close to nil.
Why not select a table that has inserts, updates and deletes for autovacuum,
just like we do for autoanalyze, instead of only deletes and updates as we do
now?
Sounds like a good idea, although I do agree with Alvaro when he mentions
that it would be good to invoke a worker that was only going to freeze tuples
and not look at the indexes. I've not looked at it, but there's a patch [1]
in the current CF for that. I'd say a good course of action would be to
review that, then write a patch with a new bool flag in
relation_needs_vacanalyze for "freezeonly", and have autovacuum invoke vacuum
in this new freeze-only mode if freezeonly is set and dovacuum is not.
Any patch not in the current CF is already PG13 or beyond. Having at least a
freeze-only vacuum mode may ease some pain, even if it still needs to be done
manually by anyone finding themselves in a similar situation to Mailchimp.
The idea I was mentioning was more targeted to ease the sudden rush of
auto-vacuum activity when suddenly a bunch of large tables require an
anti-wraparound vacuum all at once.
[1]: https://commitfest.postgresql.org/22/1817/
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Mar 28, 2019 at 6:32 PM David Rowley
<david.rowley@2ndquadrant.com> wrote:
On Thu, 28 Mar 2019 at 22:04, Darafei "Komяpa" Praliaskouski
<me@komzpa.net> wrote:

On Thu, Mar 28, 2019 at 2:36 AM David Rowley <david.rowley@2ndquadrant.com> wrote:

I thought recently that it would be good to have some sort of proactive
autovacuum mode that made use of idle workers.

The problem with "idle" is that it never happens on systems that are going to
wrap around within their lifetime. This has to be part of normal database
functioning.

I'd say auto-vacuum is configured to run too slowly if you never have
an idle worker. The chances that it happens to be running at exactly
the right speed to keep up with demand must be close to nil.

Why not select a table that has inserts, updates and deletes for autovacuum,
just like we do for autoanalyze, instead of only deletes and updates as we do
now?
Sounds like a good idea, although I do agree with Alvaro when he mentions
that it would be good to invoke a worker that was only going to freeze tuples
and not look at the indexes.
Invoking autovacuum on a table based on inserts, not only deletes and
updates, seems like a good idea to me. But in this case, I think that we can
not only freeze tuples but also update the visibility map, setting pages
all-visible. Roughly speaking, I think vacuum does the following operations:
1. heap vacuum
2. HOT pruning
3. freezing tuples
4. updating visibility map (all-visible and all-frozen)
5. index vacuum/cleanup
6. truncation
With the proposed patch[1] we can control whether to do 5 or not. In
addition to that, another proposed patch[2] allows us to control 6.
For append-only tables (and similar tables), what we periodically want to do
would be 3 and 4 (possibly we can do 2 as well). So maybe we need both an
option for (auto)vacuum to control whether to do 1, and something like a new
autovacuum threshold (or an option) to invoke a vacuum that disables 1, 5 and
6. A vacuum that does only 2, 3 and 4 would be much cheaper than today's
vacuum, and an anti-wraparound vacuum would be able to skip almost all pages.
[1]: https://commitfest.postgresql.org/22/1817/
[2]: https://commitfest.postgresql.org/22/1981/
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
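The split described above can be pictured with a toy table of flags. This is purely illustrative; these names are not actual VACUUM options:

```python
# Toy model of the six operations listed above and which ones each mode runs.
FULL_VACUUM = {
    "1_heap_vacuum": True,
    "2_hot_pruning": True,
    "3_freeze_tuples": True,
    "4_update_visibility_map": True,
    "5_index_vacuum_cleanup": True,
    "6_truncation": True,
}

# The cheap pass proposed for append-only tables: keep 2-4, skip 1, 5 and 6.
FREEZE_ONLY = dict(FULL_VACUUM,
                   **{"1_heap_vacuum": False,
                      "5_index_vacuum_cleanup": False,
                      "6_truncation": False})
```

The point of the proposal is that FREEZE_ONLY touches only the heap pages that still need freezing or visibility-map bits, never the indexes, so its cost scales with unfrozen data rather than with table plus index size.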
On Thu, Mar 28, 2019 at 12:32 PM David Rowley <david.rowley@2ndquadrant.com>
wrote:
On Thu, 28 Mar 2019 at 22:04, Darafei "Komяpa" Praliaskouski
<me@komzpa.net> wrote:

On Thu, Mar 28, 2019 at 2:36 AM David Rowley <david.rowley@2ndquadrant.com> wrote:

I thought recently that it would be good to have some sort of proactive
autovacuum mode that made use of idle workers.

The problem with "idle" is that it never happens on systems that are going to
wrap around within their lifetime. This has to be part of normal database
functioning.

I'd say auto-vacuum is configured to run too slowly if you never have
an idle worker. The chances that it happens to be running at exactly
the right speed to keep up with demand must be close to nil.

Why not select a table that has inserts, updates and deletes for autovacuum,
just like we do for autoanalyze, instead of only deletes and updates as we do
now?

Sounds like a good idea, although I do agree with Alvaro when he mentions
that it would be good to invoke a worker that was only going to freeze tuples
and not look at the indexes.
This is the current behavior of VACUUM on tables without dead tuples
already. The issue is that nothing triggers this VACUUM, apart from the user
performing VACUUM manually or the super-late anti-wraparound vacuum.
Any patch not in the current CF is already PG13 or beyond. Having at
least a freeze only vacuum mode main ease some pain, even if it still
needs to be done manually for anyone finding themselves in a similar
situation as mailchimp.
If you're in a wraparound halt with a super-large table on Amazon gp2,
nothing will help you; the issue is that there's no option to "rewrite all of
it quickly". The burst limit makes the shared drive feel like an SSD for most
of your load, but reading and re-writing all the storage gets throttled, and
there's no way to escape this quickly.

The process that freezes and marks pages all-visible has to run in parallel,
at the speed of your backend pushing pages to disk, maybe lagging behind a
bit, but not to the point of "we need to rescan the whole table".
The idea I was mentioning was more targeted to ease the sudden rush of
auto-vacuum activity when suddenly a bunch of large tables require an
anti-wraparound vacuum all at once.

[1] https://commitfest.postgresql.org/22/1817/
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
Hi,
Why not select a table that has inserts, updates and deletes for autovacuum,
just like we do for autoanalyze, instead of only deletes and updates as we do
now?

Sounds like a good idea, although I do agree with Alvaro when he mentions
that it would be good to invoke a worker that was only going to freeze tuples
and not look at the indexes.

Invoking autovacuum on a table based on inserts, not only deletes and
updates, seems like a good idea to me. But in this case, I think that we can
not only freeze tuples but also update the visibility map, setting pages
all-visible. Roughly speaking, I think vacuum does the following operations:

1. heap vacuum
2. HOT pruning
Is it worth skipping it if we're writing a page anyway for the sake of hint
bits and new xids? This will all be a no-op on append-only tables anyway, and
happen only when we actually need something?
3. freezing tuples
4. updating visibility map (all-visible and all-frozen)
These two are needed, and the current autovacuum launch process does not
take into account that they are also needed for non-dead tuples.
5. index vacuum/cleanup
There is a separate patch for that. But since
https://commitfest.postgresql.org/16/952/, for almost a year already Postgres
has skipped index cleanup on tables without new dead tuples, so this case is
taken care of already?
6. truncation
This shouldn't be a heavy operation?
With the proposed patch[1] we can control whether to do 5 or not. In
addition to that, another proposed patch[2] allows us to control 6.

For append-only tables (and similar tables), what we periodically want to do
would be 3 and 4 (possibly we can do 2 as well). So maybe we need both an
option for (auto)vacuum to control whether to do 1, and something like a new
autovacuum threshold (or an option) to invoke a vacuum that disables 1, 5 and
6. A vacuum that does only 2, 3 and 4 would be much cheaper than today's
vacuum, and an anti-wraparound vacuum would be able to skip almost all pages.
Why would we want to get rid of 1? It's a no-op from the write perspective,
and doing it in the same pass saves a scan when it's not a no-op.

Why make it faster in emergency situations, when the situation can be made
non-emergency from the very beginning instead?
[1] https://commitfest.postgresql.org/22/1817/
[2] https://commitfest.postgresql.org/22/1981/

Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
On Thu, Mar 28, 2019 at 8:58 PM Darafei "Komяpa" Praliaskouski
<me@komzpa.net> wrote:
Hi,
Why not select a table that has inserts, updates and deletes for autovacuum,
just like we do for autoanalyze, instead of only deletes and updates as we do
now?

Sounds like a good idea, although I do agree with Alvaro when he mentions
that it would be good to invoke a worker that was only going to freeze tuples
and not look at the indexes.

Invoking autovacuum on a table based on inserts, not only deletes and
updates, seems like a good idea to me. But in this case, I think that we can
not only freeze tuples but also update the visibility map, setting pages
all-visible. Roughly speaking, I think vacuum does the following operations:

1. heap vacuum
2. HOT pruning

Is it worth skipping it if we're writing a page anyway for the sake of hint
bits and new xids? This will all be a no-op on append-only tables anyway, and
happen only when we actually need something?
Yeah, these operations are required only when the table has actual
garbage. IOW, append-only tables never require them.
3. freezing tuples
4. updating visibility map (all-visible and all-frozen)

These two are needed, and the current autovacuum launch process does not
take into account that they are also needed for non-dead tuples.

5. index vacuum/cleanup

There is a separate patch for that. But since
https://commitfest.postgresql.org/16/952/, for almost a year already Postgres
has skipped index cleanup on tables without new dead tuples, so this case is
taken care of already?
I think that's not enough. The "GUC for cleanup index threshold" feature
allows us to skip only the index cleanup, and only when there have been fewer
insertions than a fraction of the total number of heap tuples since the last
index cleanup. Therefore it helps only append-only tables (and supports only
btree indexes for now). We still have to do index vacuuming even if the table
has just a few dead tuples. The proposed patch[1] helps this situation:
vacuum can run while skipping index vacuuming and index cleanup.
6. truncation
This shouldn't be a heavy operation?
I don't think so. This could take an AccessExclusiveLock on the table and
take a long time with a large shared buffer, as reported on that thread[2].
With the proposed patch[1] we can control whether to do 5 or not. In
addition to that, another proposed patch[2] allows us to control 6.

For append-only tables (and similar tables), what we periodically want to do
would be 3 and 4 (possibly we can do 2 as well). So maybe we need both an
option for (auto)vacuum to control whether to do 1, and something like a new
autovacuum threshold (or an option) to invoke a vacuum that disables 1, 5 and
6. A vacuum that does only 2, 3 and 4 would be much cheaper than today's
vacuum, and an anti-wraparound vacuum would be able to skip almost all pages.

Why would we want to get rid of 1? It's a no-op from the write perspective,
and doing it in the same pass saves a scan when it's not a no-op.
Because that's for tables that have many inserts but also some
updates/deletes. I think this strategy would help not only append-only tables
but also such tables.
Why make it faster in emergency situations, when the situation can be made
non-emergency from the very beginning instead?
I don't understand the meaning of "situation can be made non-emergency
from the very beginning". Could you please elaborate on that?
[1] https://commitfest.postgresql.org/22/1817/
[2] https://commitfest.postgresql.org/22/1981/
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Hi,
On Thu, Mar 28, 2019 at 12:36:24PM +1300, David Rowley wrote:
On Thu, 28 Mar 2019 at 11:01, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
I wonder if Mandrill's problem is related to Mailchimp raising the
freeze_max_age to a point where autovac did not have enough time to
react with an emergency vacuum. If you keep raising that value because
the vacuums cause problems for you (they block DDL), there's something
wrong.

I have seen some very high autovacuum_freeze_max_age settings
recently. It would be interesting to know what they had theirs set to.
I see they mentioned "Search and Url tables". I can imagine "search" never
needs any UPDATEs, so quite possibly those were append-only, in which case
the anti-wraparound vacuum would have had quite a lot of work on its hands,
since possibly every page needed freezing. A table receiving regular
autovacuums from dead tuples would likely get some pages frozen during those.
By the way, the Routine Vacuuming chapter of the documentation says:
"The sole disadvantage of increasing autovacuum_freeze_max_age (and
vacuum_freeze_table_age along with it) is that the pg_xact and
pg_commit_ts subdirectories of the database cluster will take more space
[...]
If [pg_xact and pg_commit_ts taking 0.5 and 20 GB, respectively]
is trivial compared to your total database size, setting
autovacuum_freeze_max_age to its maximum allowed value is recommended."
Maybe this should be qualified with "unless you have trouble with your
autovacuum keeping up" or so; or generally reworded?
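For reference, the sizes in the quoted passage follow from the per-transaction storage: pg_xact keeps 2 bits of commit status per XID, and pg_commit_ts roughly 10 bytes per XID. A back-of-envelope check (my arithmetic, not from the docs):

```python
# Back-of-envelope check of the sizes the documentation quotes for
# autovacuum_freeze_max_age at its maximum allowed value.
MAX_FREEZE_AGE = 2_000_000_000  # maximum allowed autovacuum_freeze_max_age

pg_xact_gb = MAX_FREEZE_AGE * 2 / 8 / 10**9    # 2 bits per XID -> 0.5 GB
pg_commit_ts_gb = MAX_FREEZE_AGE * 10 / 10**9  # ~10 bytes per XID -> 20 GB
print(pg_xact_gb, pg_commit_ts_gb)
```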
Michael
On Wed, Mar 27, 2019 at 5:32 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
* certain tables would have some sort of partial scan that sets the
visibility map. There's no reason to invoke the whole vacuuming
machinery. I don't think this is limited to append-only tables, but
rather those are just the ones that are affected the most.
I think this is a really good idea, but in order for it to work well I
think we would need to have some kind of estimate of autovacuum
pressure.
If we know that we're currently fairly on top of things, and there is
not much for autovacuum to do, periodically vacuuming a chunk of some
table that has a lot of unset visibility-map bits is probably a good
idea. However, we can't possibly guess how aggressively to do this if
we have no idea how long it's going to be before we need to vacuum
that table for real. If the number of XIDs remaining until the table
gets a wraparound vacuum is X, and the number of XIDs being consumed
per day is Y, we can estimate that in roughly X/Y days, we're going to
need to do a wraparound vacuum. That value might be in the range of
months, or in the range of hours.
If it's months, we probably want to limit vacuum to working at a pretty
slow rate, say 1% of the table size per hour or something. If it's in
hours, we need to be a lot more aggressive. Right now we have no
information to tell us which of those things is the case, so we'd just
be shooting in the dark.
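The X/Y estimate translates into a simple pacing rule; a sketch with an invented helper, where the example numbers are made up for illustration:

```python
# Given X XIDs remaining before the wraparound vacuum and Y XIDs consumed
# per day, the background pass must cover table_size / (X / Y) per day to
# finish just in time.

def required_pace_gb_per_day(xids_remaining: int, xids_per_day: int,
                             table_gb: float) -> float:
    days_left = xids_remaining / xids_per_day
    return table_gb / days_left

# 90 days of slack on a 1 TB table: a leisurely ~11 GB/day suffices.
print(required_pace_gb_per_day(180_000_000, 2_000_000, 1000.0))
# Only 2 days of slack: a very aggressive 500 GB/day is needed.
print(required_pace_gb_per_day(4_000_000, 2_000_000, 1000.0))
```

Both inputs are already tracked (relfrozenxid age and the cluster's XID consumption rate), which is what makes the "autovacuum pressure" estimate feasible.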
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sun, Mar 31, 2019 at 1:11 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Mar 27, 2019 at 5:32 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

* certain tables would have some sort of partial scan that sets the
visibility map. There's no reason to invoke the whole vacuuming
machinery. I don't think this is limited to append-only tables, but
rather those are just the ones that are affected the most.

I think this is a really good idea, but in order for it to work well I
think we would need to have some kind of estimate of autovacuum
pressure.

If we know that we're currently fairly on top of things, and there is
not much for autovacuum to do, periodically vacuuming a chunk of some
table that has a lot of unset visibility-map bits is probably a good
idea. However, we can't possibly guess how aggressively to do this if
we have no idea how long it's going to be before we need to vacuum
that table for real. If the number of XIDs remaining until the table
gets a wraparound vacuum is X, and the number of XIDs being consumed
per day is Y, we can estimate that in roughly X/Y days, we're going to
need to do a wraparound vacuum. That value might be in the range of
months, or in the range of hours.

If it's months, we probably want to limit vacuum to working at a pretty
slow rate, say 1% of the table size per hour or something. If it's in
hours, we need to be a lot more aggressive. Right now we have no
information to tell us which of those things is the case, so we'd just
be shooting in the dark.
Sawada-san presented some ideas in his PGCon 2018 talk that may be related.
https://www.pgcon.org/2018/schedule/attachments/488_Vacuum_More_Efficient_Than_Ever
(slide 32~)
Thanks,
Amit
On 27/03/2019 21:54, Darafei "Komяpa" Praliaskouski wrote:
Hi hackers,
Attached is a sketch of a small patch that fixes several edge cases with
autovacuum. Long story short: autovacuum never comes to append-only tables,
killing large production systems.

First case, mine.
/messages/by-id/CAC8Q8tLBeAxR+BXWuKK+HP5m8tEVYn270CVrDvKXt=0PkJTY9g@mail.gmail.com

We had a table we were appending and wanted Index Only Scan to work. For it
to work, you need to call VACUUM manually, since VACUUM is the only way to
mark pages all visible, and autovacuum never comes to append-only tables.
We were clever to invent a workflow without dead tuples and it painfully
bit us.

Second case, just read in the news.
https://mailchimp.com/what-we-learned-from-the-recent-mandrill-outage/

Mandrill has a 6TB append-only table that autovacuum probably never vacuumed.
Then the anti-wraparound vacuum came and production went down. If autovacuum
had done its job before that last moment, it would probably have been okay.

Idea: look not at dead tuples, but at changes, just like ANALYZE does.
It's my first patch for Postgres, and it's probably all wrong, but I hope it
helps you get the idea.
This was suggested and rejected years ago:
/messages/by-id/b970f20f-f096-2d3a-6c6d-ee887bd30cfb@2ndquadrant.fr
--
Vik Fearing +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Thu, Mar 28, 2019 at 6:43 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:
1. heap vacuum
2. HOT pruning

Is it worth skipping it if we're writing a page anyway for the sake of hint
bits and new xids? This will all be a no-op on append-only tables anyway, and
happen only when we actually need something?

Yeah, these operations are required only when the table has actual
garbage. IOW, append-only tables never require them.

3. freezing tuples
4. updating visibility map (all-visible and all-frozen)

These two are needed, and the current autovacuum launch process does not
take into account that they are also needed for non-dead tuples.
5. index vacuum/cleanup
There is a separate patch for that. But since
https://commitfest.postgresql.org/16/952/, for almost a year already Postgres
has skipped index cleanup on tables without new dead tuples, so this case is
taken care of already?

I think that's not enough. The "GUC for cleanup index threshold" feature
allows us to skip only the index cleanup, and only when there have been fewer
insertions than a fraction of the total number of heap tuples since the last
index cleanup. Therefore it helps only append-only tables (and supports only
btree indexes for now). We still have to do index vacuuming even if the table
has just a few dead tuples. The proposed patch[1] helps this situation:
vacuum can run while skipping index vacuuming and index cleanup.
So, the patch I posted can technically be applied after
https://commitfest.postgresql.org/22/1817/ gets merged?

The change with my patch is that a table with 49 insertions and one delete:
- previously would wait for 49 more deletes by default (and ignore
insertions), and only then clean up both table and indexes;
- with the patch will freeze/update the VM for the insertions, and scan the
index.

In my experience only a btree index requires a slow full index scan; that's
why only it was covered in the "GUC for cleanup index threshold" patch. Is
that wrong, and do more index types do a full index scan on vacuum after
deletion of a single tuple?
6. truncation
This shouldn't be a heavy operation?
I don't think so. This could take AccessExclusiveLock on the table and
take a long time with large shared buffer as per reported on that
thread[2].
While this can be a useful optimization, I believe it is out of scope for
this patch. I want to fix vacuum never coming to append-only tables, without
breaking other behaviors. Truncation is likely a case with enough dead tuples
to trigger a vacuum via the currently existing mechanisms.
With the proposed patch[1] we can control whether to do 5 or not. In
addition to that, another proposed patch[2] allows us to control 6.

For append-only tables (and similar tables), what we periodically want
to do would be 3 and 4 (possibly we can do 2 as well). So maybe we
need to have both an option of (auto)vacuum to control whether to do 1,
and something like a new autovacuum threshold (or an option) to invoke
a vacuum that disables 1, 5 and 6. A vacuum that does only 2, 3
and 4 would be much cheaper than today's vacuum, and anti-wraparound
vacuum would be able to skip almost all pages.

Why would we want to get rid of 1? Doing it is a no-op from the write
perspective, and it saves a later scan in case it turns out not to be a
no-op.
Because that's for tables that have many inserts but have some
updates/deletes. I think that this strategy would help not only
append-only tables but also such tables.
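The division of labour being discussed might be sketched as follows. Phases 3-6 are named upthread; the names I give to phases 1 and 2 are my reading of the context, not quoted from it:

```python
# Illustrative sketch, not PostgreSQL code. Names for phases 1 and 2 are
# assumptions from context; 3-6 come from the thread.
PHASES = {
    1: "heap vacuum (remove dead tuples)",   # assumed meaning
    2: "HOT-chain pruning",                  # assumed meaning
    3: "freezing tuples",
    4: "updating visibility map (all-visible and all-frozen)",
    5: "index vacuum/cleanup",
    6: "truncation",
}

def phases_for(table_is_append_mostly):
    """Cheap pass for append-mostly tables: only phases 2-4, skipping
    heap vacuum, index scans and truncation (1, 5 and 6)."""
    if table_is_append_mostly:
        return [2, 3, 4]
    return [1, 2, 3, 4, 5, 6]

print(phases_for(True))   # the cheap 'set VM bits and freeze' pass
```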
How much do we save by skipping a heap vacuum on an almost-append-only
table, where the number of updates is below 50, the current threshold?
Why make it faster only in emergency situations, when the situation can be
made non-emergency from the very beginning instead?
I don't understand the meaning of "situation can be made non-emergency
from the very beginning". Could you please elaborate on that?
Let's imagine a simple append-only workflow on a Postgres with current
default settings. You create a table and start inserting tuples, one per
transaction. Let's say a page fits 50 tuples (my case for taxi movement
data), on Amazon gp2 storage, which caps you at, say, 1000 IOPS in
non-burst mode.
Anti-wraparound autovacuum (we need a drawing of a misreading of this term,
with a crossed-out car bent in space) will be triggered
after autovacuum_freeze_max_age inserts, 200000000 by default. That
converts into 4000000 pages, or around 32 GB. It will be the first vacuum
ever on that table, since no other mechanism triggers one, and if it steals
all the available IOPS, it will finish in 200000000 / 50 / 1000 = 4000
seconds, killing prod for over an hour.
Telemetry workloads can easily generate 32 GB of data a day (I've seen
more, but let's stick to that number). Production going down for an hour a
day isn't good, and I consider it an emergency.
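The arithmetic above can be checked directly (the gp2 cap and fill factor are the assumptions stated in the scenario):

```python
# Assumptions from the scenario: default freeze age, 50 tuples per 8 kB
# page, gp2 non-burst cap of 1000 IOPS, one inserted tuple per transaction.
freeze_max_age = 200_000_000   # autovacuum_freeze_max_age default, in xacts
tuples_per_page = 50
iops = 1000                    # pages processed per second

pages = freeze_max_age // tuples_per_page   # one tuple per xact
seconds = pages / iops

print(pages)             # 4000000 pages, roughly 32 GB at 8 kB each
print(seconds / 3600)    # a bit over an hour of prod-killing vacuum
```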
Now, here are the two ways to fix it that reading the documentation leads
you to while you're sleepy and trying to get prod back:
- raise autovacuum_freeze_max_age so VACUUM keeps sleeping;
- rewrite code to use batching to insert more tuples at once.
We don't have a better recommendation mechanism for settings, and
experience in tuning autovacuum in the right direction comes at the cost of
a job or a company to people :)
Neither way fixes the problem; both just delay the inevitable. The ratio of
"one hour of vacuum per day of operation" holds; you just postpone it.
Let's say we had the same thing with 1000-record batched inserts, and moved
autovacuum_freeze_max_age to the highest possible value. How long will the
downtime last?
2**31 (max xid) * 1000 (tuples per xact) / 50 (tuples per page) / 1000
(pages per second) / 86400 (seconds in a day) ≈ 500 days.
That is in the territory of the highest estimations around Mandrill's
report, so something like this might be what happened to them.
None of this would be needed if autovacuum came after 50 inserted tuples.
It would just mark the page as all-visible and all-frozen and be gone,
while the page is still in memory. That would get rid of the emergency
altogether.
Is this an elaborate enough disaster scenario? :)
--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
By the way, the Routine Vacuuming chapter of the documentation says:
"The sole disadvantage of increasing autovacuum_freeze_max_age (and
vacuum_freeze_table_age along with it) is that the pg_xact and
pg_commit_ts subdirectories of the database cluster will take more space[...]
If [pg_xact and pg_commit_ts taking 0.5 and 20 GB, respectively]
is trivial compared to your total database size, setting
autovacuum_freeze_max_age to its maximum allowed value is recommended."

Maybe this should be qualified with "unless you have trouble with your
autovacuum keeping up" or so, or generally reworded?
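The documentation's figures can be cross-checked; the per-transaction sizes below are my assumption, chosen to be consistent with the quoted totals (2 status bits per transaction in pg_xact, about 10 bytes per transaction in pg_commit_ts):

```python
# Assumed per-transaction storage costs; the quoted doc totals (0.5 GB and
# 20 GB at the maximum autovacuum_freeze_max_age) follow from them.
max_freeze_age = 2_000_000_000          # maximum autovacuum_freeze_max_age

pg_xact_bytes = max_freeze_age * 2 / 8  # 2 bits per xact
pg_commit_ts_bytes = max_freeze_age * 10  # ~10 bytes per xact

print(pg_xact_bytes / 1e9)        # 0.5 GB
print(pg_commit_ts_bytes / 1e9)   # 20.0 GB
```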
This recommendation is written in the mindset of "wraparound never
happens".
If your database is large, you have more chances to hit it painfully, and
if it's append-only, even more so.
An alternative note may be needed: "if your database is super large and
actively written, you may want to set autovacuum_freeze_max_age to even
smaller values, so that autovacuum load is spread more evenly over time".
If it's months, we probably want to limit vacuum to working at a pretty
slow rate, say 1% of the table size per hour or something. If it's in
hours, we need to be a lot more aggressive. Right now we have no
information to tell us which of those things is the case, so we'd just
be shooting in the dark.
Thing is, you don't need to spread out your vacuum in time if the rate of
vacuuming matches the rate of table growth. Can we mark tuples/pages as
all-visible and all-frozen, say, the moment they're pushed out of
shared_buffers?
Idea: look not at dead tuples, but at changes, just like ANALYZE does.
It's my first patch on Postgres, it's probably all wrong but I hope it
helps you get the idea.

This was suggested and rejected years ago:
/messages/by-id/b970f20f-f096-2d3a-6c6d-ee887bd30cfb@2ndquadrant.fr
Thank you for sharing the link. I've read through the thread and see you
posted two patches, the first being similar to but different from mine, and
the second being about a different matter.
I don't see "rejected" there, just the common distraction of "you should
also consider this" and a time-out leading to "returned with feedback" at
the end.
Thing is, we have dead large productions and post-mortems now because your
patch wasn't pushed back in 2016, so the situation is different. Let's push
at least the first of your two patches, or mine.
Which one is better and why?
I believe mine, as it just follows a pattern already established and proven
in autoanalyze. If vacuum comes and is unable to harvest some dead tuples,
in your case it will come over again, while in mine it will just sleep
until it gets new dead tuples. That looks better to me: there's no endless
loop in case some dead tuples are stuck forever.
If someone thinks yours is better, we may also consider it for autoanalyze?