Overhauling "Routine Vacuuming" docs, particularly its handling of freezing
My work on page-level freezing for PostgreSQL 16 has some remaining
loose ends to tie up with the documentation. The "Routine Vacuuming"
section of the docs has no mention of page-level freezing. It also
doesn't mention the FPI optimization added by commit 1de58df4. This
isn't a small thing to leave out; I fully expect that the FPI
optimization will very significantly alter when and how VACUUM
freezes. The cadence will look quite a lot different.
It seemed almost impossible to fit in discussion of page-level
freezing to the existing structure. In part this is because the
existing documentation emphasizes the worst case scenario, rather than
talking about freezing as a maintenance task that affects physical
heap pages in roughly the same way as pruning does. There isn't a
clean separation of things that would allow me to just add a paragraph
about the FPI thing.
Obviously it's important that the system never enters xidStopLimit
mode -- not being able to allocate new XIDs is a huge problem. But it
seems unhelpful to define that as the only goal of freezing, or even
the main goal. To me this seems similar to defining the goal of
cleaning up bloat as avoiding completely running out of disk space;
while it may be "the single most important thing" in some general
sense, it isn't all that important in most individual cases. There are
many very bad things that will happen before that extreme worst case
is hit, which are far more likely to be the real source of pain.
There are also very big structural problems with "Routine Vacuuming",
that I also propose to do something about. Honestly, it's a huge mess
at this point. It's nobody's fault in particular; there has been
accretion after accretion added, over many years. It is time to
finally bite the bullet and do some serious restructuring. I'm hoping
that I don't get too much push back on this, because it's already very
difficult work.
Attached patch series shows what I consider to be a much better
overall structure. To make this convenient to take a quick look at, I
also attach a prebuilt version of routine-vacuuming.html (not the only
page that I've changed, but the most important set of changes by far).
This initial version is still quite lacking in overall polish, but I
believe that it gets the general structure right. That's what I'd like
to get feedback on right now: can I get agreement with me about the
general nature of the problem? Does this high level direction seem
like the right one?
The following list is a summary of the major changes that I propose:
1. Restructures the order of items to match the actual processing
order within VACUUM (and ANALYZE), rather than jumping from VACUUM to
ANALYZE and then back to VACUUM.
This flows a lot better, which helps with later items that deal with
freezing/wraparound.
2. Renamed "Preventing Transaction ID Wraparound Failures" to
"Freezing to manage the transaction ID space". Now we talk about
wraparound as a subtopic of freezing, not vice-versa. (This is a
complete rewrite, as described by later items in this list).
3. All of the stuff about modulo-2^32 arithmetic is moved to the
storage chapter, where we describe the heap tuple header format.
It seems crazy to me that the second sentence in our discussion of
wraparound/freezing is still:
"But since transaction IDs have limited size (32 bits) a cluster that
runs for a long time (more than 4 billion transactions) would suffer
transaction ID wraparound: the XID counter wraps around to zero, and
all of a sudden transactions that were in the past appear to be in the
future"
Here we start the whole discussion of wraparound (a particularly
delicate topic) by describing how VACUUM used to work 20 years ago,
before the invention of freezing. That was the last time that a
PostgreSQL cluster could run for 4 billion XIDs without freezing. The
invariant is that we activate xidStopLimit mode protections to avoid a
"distance" between any two unfrozen XIDs that exceeds about 2 billion
XIDs. So why on earth are we talking about 4 billion XIDs? This is the
most confusing, least useful way of describing freezing that I can
think of.
4. No more separate section for MultiXactID freezing -- that's
discussed as part of the discussion of page-level freezing.
Page-level freezing takes place without regard to the trigger
condition for freezing. So the new approach to freezing has a fixed
idea of what it means to freeze a given page (what physical
modifications it entails). This means that having a separate sect3
subsection for MultiXactIds now makes no sense (if it ever did).
5. The top-level list of maintenance tasks has a new addition: "To
truncate obsolescent transaction status information, when possible".
It makes a lot of sense to talk about this as something that happens
last (or last among those steps that take place during VACUUM). It's
far less important than avoiding xidStopLimit outages, obviously
(using some extra disk space is almost certainly the least of your
worries when you're near to xidStopLimit). The current documentation
seems to take precisely the opposite view, when it says the following:
"The sole disadvantage of increasing autovacuum_freeze_max_age (and
vacuum_freeze_table_age along with it) is that the pg_xact and
pg_commit_ts subdirectories of the database cluster will take more
space"
This sentence is dangerously bad advice. It is precisely backwards. At
the same time, we'd better say something about the need to truncate
pg_xact/clog here. Besides all this, the new section for this is a far
more accurate reflection of what's really going on: most individual
VACUUMs (even most aggressive VACUUMs) won't ever truncate
pg_xact/clog (or the other relevant SLRUs). Truncation only happens
after a VACUUM that advances the relfrozenxid of the table which
previously had the oldest relfrozenxid among all tables in the entire
cluster -- so we need to talk about it as an issue with the high
watermark storage for pg_xact.
6. Rename the whole "Routine Vacuuming" section to "Autovacuum
Maintenance Tasks".
This is what we should be emphasizing over manually run VACUUMs.
Besides, the current title just seems wrong -- we're talking about
ANALYZE just as much as VACUUM.
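To make the point in item 5 about pg_xact truncation concrete, here is a toy Python model (not PostgreSQL code). The table names, XID values, and function names are all invented for illustration, and XIDs are treated as plain integers. It only shows why a VACUUM that advances some table's relfrozenxid usually does not move the cluster-wide oldest value, which is what truncation depends on.

```python
# Toy model only: table names and XID values are made up, and XIDs are
# treated as plain integers (no 32-bit truncation) to keep things simple.

def truncation_horizon(relfrozenxids):
    """Oldest relfrozenxid across every table in the (toy) cluster."""
    return min(relfrozenxids.values())

def vacuum_advances(relfrozenxids, table, new_relfrozenxid):
    """Record a VACUUM that advanced one table's relfrozenxid.

    Returns True only if the cluster-wide horizon moved, which is the only
    case where old transaction status data could become removable.
    """
    before = truncation_horizon(relfrozenxids)
    relfrozenxids[table] = max(relfrozenxids[table], new_relfrozenxid)
    return truncation_horizon(relfrozenxids) > before

tables = {"orders": 5_000, "events": 900_000, "audit_log": 1_200_000}

# "events" is not the table holding the horizon back, so nothing moves.
moved_events = vacuum_advances(tables, "events", 2_000_000)

# "orders" had the oldest relfrozenxid, so advancing it moves the horizon.
moved_orders = vacuum_advances(tables, "orders", 1_500_000)

horizon = truncation_horizon(tables)
```

In this model only the second VACUUM could lead to truncation, and the new horizon is then set by whichever table is next-oldest.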
Thoughts?
--
Peter Geoghegan
Attachments:
routine-vacuuming.html
v1-0009-Overhaul-Recovering-Disk-Space-vacuuming-docs.patch (+93 -81)
v1-0008-Overhaul-freezing-and-wraparound-docs.patch (+399 -248)
v1-0006-Merge-basic-vacuuming-sect2-into-sect1-introducti.patch (+51 -56)
v1-0007-Make-maintenance.sgml-more-autovacuum-orientated.patch (+44 -48)
v1-0003-Normalize-maintenance.sgml-indentation.patch (+41 -42)
v1-0004-Reorder-routine-vacuuming-sections.patch (+155 -152)
v1-0002-Restructure-autuovacuum-daemon-section.patch (+42 -25)
v1-0005-Move-Interpreting-XID-stamps-from-tuple-headers.patch (+78 -66)
v1-0001-Make-autovacuum-docs-into-a-sect1-of-its-own.patch (+166 -167)
On Tue, Apr 25, 2023 at 4:58 AM Peter Geoghegan <pg@bowt.ie> wrote:
There are also very big structural problems with "Routine Vacuuming",
that I also propose to do something about. Honestly, it's a huge mess
at this point. It's nobody's fault in particular; there has been
accretion after accretion added, over many years. It is time to
finally bite the bullet and do some serious restructuring. I'm hoping
that I don't get too much push back on this, because it's already very
difficult work.
Now is a great time to revise this section, in my view. (I myself am about
ready to get back to testing and writing for the task of removing that
"obnoxious hint".)
Attached patch series shows what I consider to be a much better
overall structure. To make this convenient to take a quick look at, I
also attach a prebuilt version of routine-vacuuming.html (not the only
page that I've changed, but the most important set of changes by far).
This initial version is still quite lacking in overall polish, but I
believe that it gets the general structure right. That's what I'd like
to get feedback on right now: can I get agreement with me about the
general nature of the problem? Does this high level direction seem
like the right one?
I believe the high-level direction is sound, and some details have been
discussed before.
The following list is a summary of the major changes that I propose:
1. Restructures the order of items to match the actual processing
order within VACUUM (and ANALYZE), rather than jumping from VACUUM to
ANALYZE and then back to VACUUM.
This flows a lot better, which helps with later items that deal with
freezing/wraparound.
Seems logical.
2. Renamed "Preventing Transaction ID Wraparound Failures" to
"Freezing to manage the transaction ID space". Now we talk about
wraparound as a subtopic of freezing, not vice-versa. (This is a
complete rewrite, as described by later items in this list).
+1
3. All of the stuff about modulo-2^32 arithmetic is moved to the
storage chapter, where we describe the heap tuple header format.
It does seem to be an excessive level of detail for this chapter, so +1.
Speaking of excessive detail, however...(skipping ahead)
+ <note>
+ <para>
+ There is no fundamental difference between a
+ <command>VACUUM</command> run during anti-wraparound
+ autovacuuming and a <command>VACUUM</command> that happens to
+ use the aggressive strategy (whether run by autovacuum or
+ manually issued).
+ </para>
+ </note>
I don't see the value of this, from the user's perspective, of mentioning
this at all, much less for it to be called out as a Note. Imagine a user
who has been burnt by non-cancellable vacuums. How would they interpret
this statement?
It seems crazy to me that the second sentence in our discussion of
wraparound/freezing is still:
"But since transaction IDs have limited size (32 bits) a cluster that
runs for a long time (more than 4 billion transactions) would suffer
transaction ID wraparound: the XID counter wraps around to zero, and
all of a sudden transactions that were in the past appear to be in the
future"
Hah!
4. No more separate section for MultiXactID freezing -- that's
discussed as part of the discussion of page-level freezing.
Page-level freezing takes place without regard to the trigger
condition for freezing. So the new approach to freezing has a fixed
idea of what it means to freeze a given page (what physical
modifications it entails). This means that having a separate sect3
subsection for MultiXactIds now makes no sense (if it ever did).
I have no strong opinion on that.
5. The top-level list of maintenance tasks has a new addition: "To
truncate obsolescent transaction status information, when possible".
+1
6. Rename the whole "Routine Vacuuming" section to "Autovacuum
Maintenance Tasks".
This is what we should be emphasizing over manually run VACUUMs.
Besides, the current title just seems wrong -- we're talking about
ANALYZE just as much as VACUUM.
Seems more accurate. On top of that, "Routine vacuuming" slightly implies
manual vacuums.
I've only taken a cursory look, but will look more closely as time permits.
(Side note: My personal preference for rough doc patches would be to leave
out spurious whitespace changes. That not only includes indentation, but
also paragraphs where many of the words haven't changed at all, but every
line has changed to keep the paragraph tidy. Seems like more work for both
the author and the reviewer.)
--
John Naylor
EDB: http://www.enterprisedb.com
On Wed, Apr 26, 2023 at 12:16 AM John Naylor
<john.naylor@enterprisedb.com> wrote:
Now is a great time to revise this section, in my view. (I myself am about ready to get back to testing and writing for the task of removing that "obnoxious hint".)
Although I didn't mention the issue with single user mode in my
introductory email (the situation there is just appalling IMV), it
seems like I might not be able to ignore that problem while I'm
working on this patch. Declaring that as out of scope for this doc
patch series (on pragmatic grounds) feels awkward. I have to work
around something that is just wrong. For now, the doc patch just has
an "XXX" item about it. (Hopefully I'll think of a more natural way of
not fixing it.)
This initial version is still quite lacking in overall polish, but I
believe that it gets the general structure right. That's what I'd like
to get feedback on right now: can I get agreement with me about the
general nature of the problem? Does this high level direction seem
like the right one?
I believe the high-level direction is sound, and some details have been discussed before.
I'm relieved that you think so. I was a bit worried that I'd get
bogged down, having already invested a lot of time in this.
Attached is v2. It has the same high level direction as v1, but is a
lot more polished. Still not committable, to be sure. But better than
v1.
I'm also attaching a prebuilt copy of routine-vacuuming.html, as with
v1 -- hopefully that's helpful.
3. All of the stuff about modulo-2^32 arithmetic is moved to the
storage chapter, where we describe the heap tuple header format.
It does seem to be an excessive level of detail for this chapter, so +1. Speaking of excessive detail, however...(skipping ahead)
My primary objection to talking about modulo-2^32 stuff first is not
that it's an excessive amount of detail (though it definitely is). My
objection is that it places emphasis on exactly the thing that *isn't*
supposed to matter, under the design of freezing -- greatly confusing
the reader (even sophisticated readers). Discussion of so-called
wraparound should start with logical concepts, such as xmin XIDs being
treated as "infinitely far in the past" once frozen. The physical data
structures do matter too, but even there the emphasis should be on
heap pages being "self-contained", in the sense that SQL queries won't
need to access pg_xact to read the rows from the pages going forward
(even on standbys).
Why do we call wraparound wraparound, anyway? The 32-bit XID space is
circular! The whole point of the design is that unsigned integer
wraparound is meaningless -- there isn't really a point in "the
circle" that you should think of as the start point or end point.
(We're probably stuck with the term "wraparound" for now, so I'm not
proposing that it be changed here, purely on pragmatic grounds.)
+ <note>
+ <para>
+ There is no fundamental difference between a
+ <command>VACUUM</command> run during anti-wraparound
+ autovacuuming and a <command>VACUUM</command> that happens to
+ use the aggressive strategy (whether run by autovacuum or
+ manually issued).
+ </para>
+ </note>
I don't see the value of this, from the user's perspective, of mentioning
this at all, much less for it to be called out as a Note. Imagine a user
who has been burnt by non-cancellable vacuums. How would they interpret
this statement?
I meant that it isn't special from the point of view of vacuumlazy.c.
I do see your point, though. I've taken that out in v2.
(I happen to believe that the antiwraparound autocancellation behavior
is very unhelpful as currently implemented, which biased my view of
this.)
4. No more separate section for MultiXactID freezing -- that's
discussed as part of the discussion of page-level freezing.
Page-level freezing takes place without regard to the trigger
condition for freezing. So the new approach to freezing has a fixed
idea of what it means to freeze a given page (what physical
modifications it entails). This means that having a separate sect3
subsection for MultiXactIds now makes no sense (if it ever did).
I have no strong opinion on that.
Most of the time, when antiwraparound autovacuums are triggered by
autovacuum_multixact_freeze_max_age, in a way that is noticeable (say
a large table), VACUUM will in all likelihood end up processing
exactly 0 multis. What you'll get is pretty much an "early" aggressive
VACUUM, which isn't such a big deal (especially with page-level
freezing). You can already get an "early" aggressive VACUUM due to
hitting vacuum_freeze_table_age before autovacuum_freeze_max_age is
ever reached (in fact it's the common case, now that we have
insert-driven autovacuums).
So I'm trying to suggest that an aggressive VACUUM is the same
regardless of the trigger condition. To a lesser extent, I'm trying to
make the user aware that the mechanical difference between aggressive
and non-aggressive is fairly minor, even if the consequences of that
difference are quite noticeable. (Though maybe they're less noticeable
with the v16 work in place.)
I've only taken a cursory look, but will look more closely as time permits.
I would really appreciate that. This is not easy work.
I suspect that the docs talk about wraparound using the most alarming
language possible because at one point it really was necessary to
scare users into running VACUUM to avoid data loss. This was before
autovacuum, and before the invention of vxids, and even before the
invention of freezing. It was up to you as a user to VACUUM your
database using cron, and if you didn't then eventually data loss could
result.
Obviously these docs were updated many times over the years, but I
maintain that the basic structure from 20 years ago is still present
in a way that it really shouldn't be.
(Side note: My personal preference for rough doc patches would be to leave out spurious whitespace changes.
I've tried to keep them out (or at least break the noisy whitespace
changes out into their own commit). I might have missed a few of them
in v1, which are fixed in v2.
Thanks
--
Peter Geoghegan
Attachments:
routine-vacuuming.html
v2-0006-Merge-basic-vacuuming-sect2-into-sect1-introducti.patch (+51 -56)
v2-0008-Overhaul-freezing-and-wraparound-docs.patch (+514 -265)
v2-0007-Make-maintenance.sgml-more-autovacuum-orientated.patch (+44 -48)
v2-0001-Make-autovacuum-docs-into-a-sect1-of-its-own.patch (+166 -167)
v2-0009-Overhaul-Recovering-Disk-Space-vacuuming-docs.patch (+93 -81)
v2-0004-Reorder-routine-vacuuming-sections.patch (+155 -152)
v2-0002-Restructure-autovacuum-daemon-section.patch (+42 -25)
v2-0003-Normalize-maintenance.sgml-indentation.patch (+41 -42)
v2-0005-Move-Interpreting-XID-stamps-from-tuple-headers.patch (+78 -66)
On Thu, Apr 27, 2023 at 12:58 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Wed, Apr 26, 2023 at 12:16 AM John Naylor
<john.naylor@enterprisedb.com> wrote:
Now is a great time to revise this section, in my view. (I myself am
about ready to get back to testing and writing for the task of removing
that "obnoxious hint".)
Although I didn't mention the issue with single user mode in my
introductory email (the situation there is just appalling IMV), it
seems like I might not be able to ignore that problem while I'm
working on this patch. Declaring that as out of scope for this doc
patch series (on pragmatic grounds) feels awkward. I have to work
around something that is just wrong. For now, the doc patch just has
an "XXX" item about it. (Hopefully I'll think of a more natural way of
not fixing it.)
If it helps, I've gone ahead with some testing and polishing on that, and
it's close to ready, I think (CC'd you). I'd like that piece to be separate
and small enough to be backpatchable (at least in theory).
--
John Naylor
EDB: http://www.enterprisedb.com
On Sat, Apr 29, 2023 at 1:17 AM John Naylor
<john.naylor@enterprisedb.com> wrote:
Although I didn't mention the issue with single user mode in my
introductory email (the situation there is just appalling IMV), it
seems like I might not be able to ignore that problem while I'm
working on this patch. Declaring that as out of scope for this doc
patch series (on pragmatic grounds) feels awkward. I have to work
around something that is just wrong. For now, the doc patch just has
an "XXX" item about it. (Hopefully I'll think of a more natural way of
not fixing it.)
If it helps, I've gone ahead with some testing and polishing on that, and it's close to ready, I think (CC'd you). I'd like that piece to be separate and small enough to be backpatchable (at least in theory).
That's great news. Not least because it unblocks this patch series of mine.
--
Peter Geoghegan
On Thu, Apr 27, 2023 at 12:58 AM Peter Geoghegan <pg@bowt.ie> wrote:
[v2]
I've done a more careful read-through, but I'll need a couple more, I
imagine.
I'll first point out some things I appreciate, and I'm glad are taken care
of as part of this work:
- Pushing the talk of scheduled manual vacuums to the last, rather than
first, para in the intro
- No longer pretending that turning off autovacuum is somehow normal
- Removing the egregiously outdated practice of referring to VACUUM FULL as
a "variant" of VACUUM
- Removing the mention of ALTER TABLE that has no earthly business in this
chapter -- for that, rewriting the table is a side effect to try to avoid,
not a tool in our smorgasbord for removing severe bloat.
Some suggestions:
- The section "Recovering Disk Space" now has 5 tips/notes/warnings in a
row. This is good information, but I wonder about:
"Note: Although VACUUM FULL is technically an option of the VACUUM command,
VACUUM FULL uses a completely different implementation. VACUUM FULL is
essentially a variant of CLUSTER. (The name VACUUM FULL is historical; the
original implementation was somewhat closer to standard VACUUM.)"
...maybe move this to a second paragraph in the warning about VACUUM FULL
and CLUSTER?
- The sentence "The XID cutoff point that VACUUM uses..." reads as a bit
abrupt and unmotivated (although it is important). Part of the reason for
this is that the hyperlink "transaction ID number (XID)" which points to
the glossary is further down the page than this first mention.
- "VACUUM often marks certain pages frozen, indicating that all eligible
rows on the page were inserted by a transaction that committed sufficiently
far in the past that the effects of the inserting transaction are certain
to be visible to all current and future transactions."
-> This sentence is much harder to understand than the one it replaces.
Also, this is the first time "eligible" is mentioned. It may not need a
separate definition, but in this form it's rather circular.
- "freezing plays a crucial role in enabling _management of the XID
address_ space by VACUUM"
-> "management of the XID address space" links to the
aggressive-strategy sub-section below, but it's a strange link title
because the section we're in is itself titled "Freezing to manage the
transaction ID space".
- "The maximum “distance” that the system can tolerate..."
-> The next sentence goes on to show the "age" function, so using
different terms is a bit strange. Mixing the established age term with an
in-quotes "distance" could perhaps be done once in a definition, but then
all uses should stick to age.
--
John Naylor
EDB: http://www.enterprisedb.com
On Sat, Apr 29, 2023 at 8:54 PM John Naylor
<john.naylor@enterprisedb.com> wrote:
I've done a more careful read-through, but I'll need a couple more, I imagine.
Yeah, it's tough to get this stuff right.
I'll first point out some things I appreciate, and I'm glad are taken care of as part of this work:
- Pushing the talk of scheduled manual vacuums to the last, rather than first, para in the intro
- No longer pretending that turning off autovacuum is somehow normal
- Removing the egregiously outdated practice of referring to VACUUM FULL as a "variant" of VACUUM
- Removing the mention of ALTER TABLE that has no earthly business in this chapter -- for that, rewriting the table is a side effect to try to avoid, not a tool in our smorgasbord for removing severe bloat.
Some suggestions:
- The section "Recovering Disk Space" now has 5 tips/notes/warnings in a row.
It occurs to me that all of this stuff (TRUNCATE, VACUUM FULL, and so
on) isn't "routine" at all. And so maybe this is the wrong chapter for
this entirely. The way I dealt with it in v2 wasn't very worked out --
I just knew that I had to do something, but hadn't given much thought
to what actually made sense.
I wonder if it would make sense to move all of that stuff into its own
new sect1 of "Chapter 29. Monitoring Disk Usage" -- something along
the lines of "what to do about bloat when all else fails, when the
problem gets completely out of hand". Naturally we'd link to this new
section from "Routine Vacuuming". What do you think of that general
approach?
This is good information, but I wonder about:
(Various points)
That's good feedback. I'll get to this in a couple of days.
--
Peter Geoghegan
On Wed, Apr 26, 2023 at 1:58 PM Peter Geoghegan <pg@bowt.ie> wrote:
Why do we call wraparound wraparound, anyway? The 32-bit XID space is
circular! The whole point of the design is that unsigned integer
wraparound is meaningless -- there isn't really a point in "the
circle" that you should think of as the start point or end point.
(We're probably stuck with the term "wraparound" for now, so I'm not
proposing that it be changed here, purely on pragmatic grounds.)
To me, the fact that the XID space is circular is the whole point of
talking about wraparound. If the XID space were non-circular, it could
never try to reuse the XID values that have previously been used, and
this entire class of problems would go away. Because it is circular,
it's possible for the XID counter to arrive back at a place that it's
been before i.e. it can wrap around.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, May 1, 2023 at 8:03 AM Robert Haas <robertmhaas@gmail.com> wrote:
To me, the fact that the XID space is circular is the whole point of
talking about wraparound.
The word wraparound is ambiguous. It's not the same thing as
xidStopLimit in my view. It's literal integer wraparound.
If you think of XIDs as having a native 64-bit representation, while
using a truncated 32-bit on-disk representation in tuple headers
(which is the view promoted by the doc patch), then XIDs cannot wrap
around. There is still no possibility of "the future becoming the
past" (assuming no use of single user mode), either, because even in
the worst case we have xidStopLimit to make sure that the database
doesn't become corrupt. Why talk about what's *not* happening in a
place of prominence?
We'll still talk about literal integer wraparound with the doc patch,
but it's part of a discussion of the on-disk format in a distant
chapter. It's just an implementation detail, which is of no practical
consequence. The main discussion need only say something succinct and
vague about the use of a truncated representation (lacking a separate
epoch) in tuple headers eventually forcing freezing.
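A small Python sketch may help picture the "native 64-bit XID, truncated 32-bit on-disk representation" view described above. The function names are hypothetical, the special reserved XIDs (0, 1, 2) are ignored, and this is only an illustration of the arithmetic, not of how PostgreSQL implements it.

```python
# Sketch only: function names are hypothetical, and the special reserved
# XIDs (0, 1, 2) are ignored for simplicity.

XID_BITS = 32
XID_MASK = (1 << XID_BITS) - 1

def truncate_xid(full_xid):
    """The low 32 bits, i.e. what would be stored in a tuple header."""
    return full_xid & XID_MASK

def epoch_of(full_xid):
    """The high 32 bits, which are not stored in tuple headers."""
    return full_xid >> XID_BITS

last_of_epoch_0 = (1 << 32) - 1
first_of_epoch_1 = last_of_epoch_0 + 1

# The 64-bit values just keep increasing; only the truncated form repeats.
same_on_disk = truncate_xid(1000) == truncate_xid(1000 + (1 << 32))
```

Under this view the 64-bit counter never goes backwards. The only problem is that two different full XIDs can share one truncated value, which is why old XIDs eventually have to be frozen.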
If the XID space were non-circular, it could
never try to reuse the XID values that have previously been used, and
this entire class of problems would go away. Because it is circular,
it's possible for the XID counter to arrive back at a place that it's
been before i.e. it can wrap around.
But integer wrap around isn't really aligned with anything important.
xidStopLimit will kick in when we're only halfway towards literal
integer wrap around. Users have practical concerns about avoiding
xidStopLimit -- what a world without xidStopLimit looks like just
doesn't matter. Just having some vague awareness of truncated XIDs
being insufficient at some point is all you really need, even if
you're an advanced user.
--
Peter Geoghegan
On Mon, May 1, 2023 at 12:01 PM Peter Geoghegan <pg@bowt.ie> wrote:
If the XID space were non-circular, it could
never try to reuse the XID values that have previously been used, and
this entire class of problems would go away. Because it is circular,
it's possible for the XID counter to arrive back at a place that it's
been before i.e. it can wrap around.
But integer wrap around isn't really aligned with anything important.
xidStopLimit will kick in when we're only halfway towards literal
integer wrap around. Users have practical concerns about avoiding
xidStopLimit -- what a world without xidStopLimit looks like just
doesn't matter. Just having some vague awareness of truncated XIDs
being insufficient at some point is all you really need, even if
you're an advanced user.
I disagree. If you start the cluster in single-user mode, you can
actually wrap it around, unless something has changed that I don't
know about.
I'm not trying to debate the details of the patch, which I have not
read. I am saying that, while wraparound is perhaps not a perfect term
for what's happening, it is not, in my opinion, a bad term either. I
don't think it's accurate to imagine that this is a 64-bit counter
where we only store 32 bits on disk. We're trying to retcon that into
being true, but we'd have to work significantly harder to actually
make it true.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, May 1, 2023 at 9:08 AM Robert Haas <robertmhaas@gmail.com> wrote:
I disagree. If you start the cluster in single-user mode, you can
actually wrap it around, unless something has changed that I don't
know about.
This patch relies on John's other patch which strongly discourages the
use of single-user mode. Were it not for that, I might agree.
I'm not trying to debate the details of the patch, which I have not
read. I am saying that, while wraparound is perhaps not a perfect term
for what's happening, it is not, in my opinion, a bad term either. I
don't think it's accurate to imagine that this is a 64-bit counter
where we only store 32 bits on disk. We're trying to retcon that into
being true, but we'd have to work significantly harder to actually
make it true.
The purpose of this documentation section is to give users practical
guidance, obviously. The main reason to frame it this way is because
it seems to make the material easier to understand.
--
Peter Geoghegan
On Mon, May 1, 2023 at 9:16 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Mon, May 1, 2023 at 9:08 AM Robert Haas <robertmhaas@gmail.com> wrote:
I disagree. If you start the cluster in single-user mode, you can
actually wrap it around, unless something has changed that I don't
know about.
This patch relies on John's other patch which strongly discourages the
use of single-user mode. Were it not for that, I might agree.
Also, it's not clear that the term "wraparound" even describes what
happens when you corrupt the database by violating the "no more than
~2.1 billion XIDs distance between any two unfrozen XIDs" invariant in
single-user mode. What specific thing will have wrapped around? It's
possible (and very likely) that every unfrozen XID in the database is
from the same 64-bit-XID-wise epoch.
I don't think that we need to say very much about this scenario (and
nothing at all about the specifics in "Routine Vacuuming"), so maybe
it doesn't matter much. But I maintain that it makes most sense to
describe this scenario as a violation of the "no more than ~2.1
billion XIDs distance between any two unfrozen XIDs" invariant, while
leaving the term "wraparound" out of it completely. That term has way
too much baggage.
--
Peter Geoghegan
On Mon, May 1, 2023, 18:08 Robert Haas <robertmhaas@gmail.com> wrote:
I am saying that, while wraparound is perhaps not a perfect term
for what's happening, it is not, in my opinion, a bad term either.
I don't want to put words into Peter's mouth, but I think that he's arguing
that the term "wraparound" suggests that there is something special about
the transition between xid 2^32 and xid 0 (or, well, 3). There isn't.
There's only something special about the transition, as your current xid
advances, between the xid that's half the xid space ahead of your current
xid and the xid that's half the xid space behind the current xid, if the
latter is not frozen. I don't think that's what most users think of when
they hear "wraparound".
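To make the half-the-xid-space point concrete, here is a minimal Python sketch of a circular 32-bit XID comparison. It is an illustrative model only, not PostgreSQL source: the names precedes, XID_SPACE and HALF are made up for this example, and the special-cased XIDs below 3 are ignored.

```python
# Illustrative model only; names are hypothetical, not PostgreSQL source.
XID_SPACE = 2**32   # size of the 32-bit XID space
HALF = 2**31        # "half the xid space"


def precedes(a: int, b: int) -> bool:
    """Return True if XID a is logically older than XID b (circular compare)."""
    diff = (a - b) % XID_SPACE
    # Treat diff as a signed 32-bit value: the upper half means "negative".
    return diff >= HALF


# Crossing the numeric boundary is uneventful: an XID allocated just before
# the counter resets still precedes one allocated just after it.
print(precedes(XID_SPACE - 10, 5))

# What matters is distance. An XID less than half the space behind is still
# seen as older; one more than half the space behind appears to be newer.
old = 100
print(precedes(old, old + HALF - 1))
print(precedes(old, (old + HALF + 1) % XID_SPACE))
```

In this model nothing special happens at the numeric reset itself; the comparison only changes its answer once two XIDs are more than half the space apart.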
On Mon, May 1, 2023 at 12:03 PM Maciek Sakrejda <m.sakrejda@gmail.com> wrote:
I don't want to put words into Peter's mouth, but I think that he's arguing that the term "wraparound" suggests that there is something special about the transition between xid 2^32 and xid 0 (or, well, 3). There isn't.
Yes, that's exactly what I mean. There are two points that seem to be
very much in tension here:
1. The scenario where you corrupt the database in single user mode by
unsafely allocating XIDs (you need single user mode to bypass the
xidStopLimit protections) generally won't involve unsigned integer
wraparound (and if it does it's *entirely* incidental to the data
corruption).
2. Actual unsigned integer wraparound is 100% harmless and routine, by design.
So why do we use the term wraparound as a synonym of "the end of the
world"? I assume that it's just an artefact of how the system worked
before the invention of freezing. Back then, you had to do a dump and
restore when the system reached about 4 billion XIDs. Wraparound
really did mean "the end of the world" over 20 years ago.
This is related to my preference for explaining the issues with
reference to a 64-bit XID space. Today we compare 64-bit XIDs using
simple unsigned integer comparisons. That's the same way that 32-bit
XID comparisons worked before freezing was invented in 2001. So it
really does seem like the natural way to explain it.
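As a rough illustration of the 64-bit framing, the Python sketch below widens a stored 32-bit XID to a 64-bit value relative to the next XID to be allocated, after which ordinary integer comparison is enough. The helper names (widen, within_invariant) are hypothetical, and the sketch assumes the stored XID is in the past and was allocated less than 2^32 XIDs ago.

```python
# Illustrative model only; helper names are hypothetical.
MAX_DISTANCE = 2**31  # roughly 2.1 billion XIDs


def widen(xid32: int, next_full: int) -> int:
    """Map a stored 32-bit XID to a 64-bit XID at or before next_full.

    Assumes xid32 was allocated less than 2**32 XIDs before next_full.
    """
    epoch = next_full >> 32
    if xid32 > (next_full & 0xFFFFFFFF):
        # Low bits larger than the counter's low bits: previous epoch.
        epoch -= 1
    return (epoch << 32) | xid32


def within_invariant(oldest_unfrozen_full: int, next_full: int) -> bool:
    """True while the oldest unfrozen XID is under ~2.1 billion XIDs back."""
    return next_full - oldest_unfrozen_full < MAX_DISTANCE


next_full = (1 << 32) + 50            # epoch 1, low 32 bits = 50
old = widen(2**32 - 100, next_full)   # allocated just before the epoch change
new = widen(10, next_full)            # allocated just after it

print(old < new < next_full)              # plain integer comparison
print(within_invariant(old, next_full))   # only 150 XIDs apart
```

Seen this way, the epoch change between old and new is unremarkable; the only quantity that matters is the distance checked by within_invariant.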
--
Peter Geoghegan
On Tue, May 2, 2023 at 12:09 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Mon, May 1, 2023 at 9:16 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Mon, May 1, 2023 at 9:08 AM Robert Haas <robertmhaas@gmail.com>
wrote:
I disagree. If you start the cluster in single-user mode, you can
actually wrap it around, unless something has changed that I don't
know about.
+1 Pretending otherwise is dishonest.
This patch relies on John's other patch which strongly discourages the
use of single-user mode. Were it not for that, I might agree.
Oh that's rich. I'll note that 5% of your review was actually helpful
(actual correction), the other 95% was needless distraction trying to
enlist me in your holy crusade against the term "wraparound". It had the
opposite effect.
Also, it's not clear that the term "wraparound" even describes what
happens when you corrupt the database by violating the "no more than
~2.1 billion XIDs distance between any two unfrozen XIDs" invariant in
single-user mode. What specific thing will have wrapped around?
In your first message you said "I'm hoping that I don't get too much push
back on this, because it's already very difficult work."
Here's some advice on how to avoid pushback:
1. Insist that all terms can only be interpreted in the most pig-headedly
literal sense possible.
2. Use that premise to pretend basic facts are a complete mystery.
3. Claim that others are holding you back, and then try to move the
goalposts in their work.
--
John Naylor
EDB: http://www.enterprisedb.com
On Mon, May 1, 2023 at 8:04 PM John Naylor <john.naylor@enterprisedb.com> wrote:
Here's some advice on how to avoid pushback:
1. Insist that all terms can only be interpreted in the most pig-headedly literal sense possible.
2. Use that premise to pretend basic facts are a complete mystery.
I can't imagine why you feel it necessary to communicate with me like
this. This is just vitriol, lacking any substance.
How we use words like wraparound is actually something of great
consequence to the Postgres project. We've needlessly scared users
with the way this information has been presented up until now -- that
much is clear. To have you talk to me like this when I'm working on
such a difficult, thankless task is a real slap in the face.
3. Claim that others are holding you back, and then try to move the goalposts in their work.
When did I say that? When did I even suggest it?
--
Peter Geoghegan
On Mon, May 1, 2023 at 8:04 PM John Naylor <john.naylor@enterprisedb.com> wrote:
Oh that's rich. I'll note that 5% of your review was actually helpful (actual correction), the other 95% was needless distraction trying to enlist me in your holy crusade against the term "wraparound". It had the opposite effect.
I went back and checked. There were exactly two short paragraphs about
wraparound terminology on the thread associated with the patch you're
working on, towards the end of this one email:
/messages/by-id/CAH2-Wzm2fpPQ_=pXpRvkNiuTYBGTAUfxRNW40kLitxj9T3Ny7w@mail.gmail.com
In what world does that amount to 95% of my review, or anything like it?
--
Peter Geoghegan
On Mon, May 1, 2023 at 11:21 PM Peter Geoghegan <pg@bowt.ie> wrote:
I can't imagine why you feel it necessary to communicate with me like
this. This is just vitriol, lacking any substance.
John's email is pretty harsh, but I can understand why he's frustrated.
I told you that I did not agree with your dislike for the term
wraparound and I explained why. You sent a couple more emails telling
me that I was wrong and, frankly, saying a lot of things that seem
only tangentially related to the point that I was actually making. You
seem to expect other people to spend a LOT OF TIME trying to
understand what you're trying to say, but you don't seem to invest
similar effort in trying to understand what they're trying to say. I
couldn't even begin to grasp what your point was until Maciek stepped
in to explain, and I still don't really agree with it, and I expect
that no matter how many emails I write about that, your position won't
budge an iota.
It's really demoralizing. If I just vote -1 on the patch set, then I'm
a useless obstruction. If I actually try to review it, we'll exchange
100 emails and I won't get anything else done for the next two weeks
and I probably won't feel much better about the patch at the end of
that process than at the beginning. I don't see that I have any
winning options here.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, May 2, 2023 at 1:29 PM Robert Haas <robertmhaas@gmail.com> wrote:
I told you that I did not agree with your dislike for the term
wraparound and I explained why. You sent a couple more emails telling
me that I was wrong and, frankly, saying a lot of things that seem
only tangentially related to the point that I was actually making.
I agree that that's what I did. You're perfectly entitled to find that
annoying (though I maintain that my point about the 64-bit XID space
was a good one, assuming the general subject matter was of interest).
However, you're talking about this as if I dug my feet in on a
substantive issue affecting the basic shape of the patch -- I don't
believe that that conclusion is justified by anything I've said or
done. I'm not even sure that we disagree on some less important point
that will directly affect the patch (it's quite possible, but I'm not
even sure of it).
I've already said that I don't think that the term wraparound is going
anywhere anytime soon (granted, that was on the other thread). So it's
not like I'm attempting to banish all existing use of that terminology
within the scope of this patch series -- far from it. At most I tried
to avoid inventing new terms that contain the word "wraparound" (also
on the other thread).
The topic originally came up in the context of moving talk about
physical wraparound to an entirely different chapter. Which is, I
believe (based in part on previous discussions), something that all
three of us already agree on! So again, I must ask: is there actually
a substantive disagreement at all?
It's really demoralizing. If I just vote -1 on the patch set, then I'm
a useless obstruction. If I actually try to review it, we'll exchange
100 emails and I won't get anything else done for the next two weeks
and I probably won't feel much better about the patch at the end of
that process than at the beginning. I don't see that I have any
winning options here.
I've already put a huge amount of work into this. It is inherently a
very difficult thing to get right -- it's not hard to understand why
it was put off for so long. Why shouldn't I have opinions, given all
that? I'm frustrated too.
Despite all this, John basically agreed with my high level direction
-- all of the important points seemed to have been settled without any
arguments whatsoever (also based in part on previous discussions).
John's volley of abuse seemed to come from nowhere at all.
--
Peter Geoghegan
Hi,
On Mon, Apr 24, 2023 at 2:58 PM Peter Geoghegan <pg@bowt.ie> wrote:
My work on page-level freezing for PostgreSQL 16 has some remaining
loose ends to tie up with the documentation. The "Routine Vacuuming"
section of the docs has no mention of page-level freezing. It also
doesn't mention the FPI optimization added by commit 1de58df4. This
isn't a small thing to leave out; I fully expect that the FPI
optimization will very significantly alter when and how VACUUM
freezes. The cadence will look quite a lot different.
It seemed almost impossible to fit in discussion of page-level
freezing to the existing structure. In part this is because the
existing documentation emphasizes the worst case scenario, rather than
talking about freezing as a maintenance task that affects physical
heap pages in roughly the same way as pruning does. There isn't a
clean separation of things that would allow me to just add a paragraph
about the FPI thing.
Obviously it's important that the system never enters xidStopLimit
mode -- not being able to allocate new XIDs is a huge problem. But it
seems unhelpful to define that as the only goal of freezing, or even
the main goal. To me this seems similar to defining the goal of
cleaning up bloat as avoiding completely running out of disk space;
while it may be "the single most important thing" in some general
sense, it isn't all that important in most individual cases. There are
many very bad things that will happen before that extreme worst case
is hit, which are far more likely to be the real source of pain.
There are also very big structural problems with "Routine Vacuuming",
that I also propose to do something about. Honestly, it's a huge mess
at this point. It's nobody's fault in particular; there has been
accretion after accretion added, over many years. It is time to
finally bite the bullet and do some serious restructuring. I'm hoping
that I don't get too much push back on this, because it's already very
difficult work.
Thanks for taking the time to do this. It is indeed difficult work. I'll
give my perspective as someone who has not read the vacuum code but has
learnt most of what I know about autovacuum / vacuuming by reading the
"Routine Vacuuming" page tens of times.
Attached patch series shows what I consider to be a much better
overall structure. To make this convenient to take a quick look at, I
also attach a prebuilt version of routine-vacuuming.html (not the only
page that I've changed, but the most important set of changes by far).
This initial version is still quite lacking in overall polish, but I
believe that it gets the general structure right. That's what I'd like
to get feedback on right now: can I get agreement with me about the
general nature of the problem? Does this high level direction seem
like the right one?
There are things I like about the changes you've proposed and some where I
feel that the previous section was easier to understand. I'll comment
inline on the summary below and will put in a few points about things I
think can be improved at the end.
The following list is a summary of the major changes that I propose:
1. Restructures the order of items to match the actual processing
order within VACUUM (and ANALYZE), rather than jumping from VACUUM to
ANALYZE and then back to VACUUM.
This flows a lot better, which helps with later items that deal with
freezing/wraparound.
+1
2. Renamed "Preventing Transaction ID Wraparound Failures" to
"Freezing to manage the transaction ID space". Now we talk about
wraparound as a subtopic of freezing, not vice-versa. (This is a
complete rewrite, as described by later items in this list).
+1 on this too. Freezing is a normal part of vacuuming and while the
aggressive vacuums are different, I think just talking about the worst case
scenario while referring to it is alarmist.
3. All of the stuff about modulo-2^32 arithmetic is moved to the
storage chapter, where we describe the heap tuple header format.
It seems crazy to me that the second sentence in our discussion of
wraparound/freezing is still:
"But since transaction IDs have limited size (32 bits) a cluster that
runs for a long time (more than 4 billion transactions) would suffer
transaction ID wraparound: the XID counter wraps around to zero, and
all of a sudden transactions that were in the past appear to be in the
future"
Here we start the whole discussion of wraparound (a particularly
delicate topic) by describing how VACUUM used to work 20 years ago,
before the invention of freezing. That was the last time that a
PostgreSQL cluster could run for 4 billion XIDs without freezing. The
invariant is that we activate xidStopLimit mode protections to avoid a
"distance" between any two unfrozen XIDs that exceeds about 2 billion
XIDs. So why on earth are we talking about 4 billion XIDs? This is the
most confusing, least useful way of describing freezing that I can
think of.
4. No more separate section for MultiXactID freezing -- that's
discussed as part of the discussion of page-level freezing.
Page-level freezing takes place without regard to the trigger
condition for freezing. So the new approach to freezing has a fixed
idea of what it means to freeze a given page (what physical
modifications it entails). This means that having a separate sect3
subsection for MultiXactIds now makes no sense (if it ever did).
5. The top-level list of maintenance tasks has a new addition: "To
truncate obsolescent transaction status information, when possible".
It makes a lot of sense to talk about this as something that happens
last (or last among those steps that take place during VACUUM). It's
far less important than avoiding xidStopLimit outages, obviously
(using some extra disk space is almost certainly the least of your
worries when you're near to xidStopLimit). The current documentation
seems to take precisely the opposite view, when it says the following:
"The sole disadvantage of increasing autovacuum_freeze_max_age (and
vacuum_freeze_table_age along with it) is that the pg_xact and
pg_commit_ts subdirectories of the database cluster will take more
space"
This sentence is dangerously bad advice. It is precisely backwards. At
the same time, we'd better say something about the need to truncate
pg_xact/clog here. Besides all this, the new section for this is a far
more accurate reflection of what's really going on: most individual
VACUUMs (even most aggressive VACUUMs) won't ever truncate
pg_xact/clog (or the other relevant SLRUs). Truncation only happens
after a VACUUM that advances the relfrozenxid of the table which
previously had the oldest relfrozenxid among all tables in the entire
cluster -- so we need to talk about it as an issue with the high
watermark storage for pg_xact.
6. Rename the whole "Routine Vacuuming" section to "Autovacuum
Maintenance Tasks".
This is what we should be emphasizing over manually run VACUUMs.
Besides, the current title just seems wrong -- we're talking about
ANALYZE just as much as VACUUM.
+1 on this. Talking about autovacuum as the default and how to get the most
out of it seems like the right way to go.
I read through the new version a couple times and here is some of my
feedback. I haven't yet reviewed individual patches or done a very detailed
comparison with the previous version.
1) While I agree that bundling VACUUM and VACUUM FULL is not the right way,
moving all VACUUM FULL references into tips and warnings also seems
excessive. I think it's probably best to just have a single paragraph which
talks about VACUUM FULL as I do think it should be mentioned in the
reclaiming disk space section.
2) I felt that the new section, "Freezing to manage the transaction ID
space" could be made simpler to understand. As an example, I understood
what the parameters (autovacuum_freeze_max_age, vacuum_freeze_table_age) do
and how they interact better in the previous version of the docs.
3) In the "VACUUMs aggressive strategy" section, we should first introduce
what an aggressive VACUUM is before going into when it's triggered, where
metadata is stored etc. It's only several paragraphs later that I get to
know what we are referring to as an "aggressive" autovacuum.
4) I think we should explicitly call out that seeing an anti-wraparound
VACUUM or "VACUUM table (to prevent wraparound)" is normal and that it's
just a VACUUM triggered due to the table having unfrozen rows with an XID
older than autovacuum_freeze_max_age. I've seen many users panicking on
seeing this and feeling that they are close to a wraparound. Also, we
should be more clear about how it's different from VACUUMs triggered due to
the scale factors (cancellation behavior, being triggered when autovacuum
is disabled etc.). I think you do some of this but given the panic around
transactionid wraparounds, being more clear about this is better.
5) Can we use a better name for the XidStopLimit mode? It seems like a very
implementation centric name. Maybe a better version of "Running out of the
XID space" or something like that?
6) In the XidStopLimit mode section, it would be good to explain briefly
why you could get to this scenario. It's not something which should happen
in a normal running system unless you have a long running transaction or
inactive replication slots or a badly configured system or something of
that sort. If you got to this point, other than running VACUUM to get out
of the situation, it's also important to figure out what got you there in
the first place as many VACUUMs should have attempted to advance the
relfrozenxid and failed.
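To illustrate point 4, here is a small Python sketch of why an anti-wraparound autovacuum is routine rather than alarming. It is a simplified model, not how autovacuum is implemented: the function names are hypothetical, the 2^31 ceiling stands in for the roughly 2.1 billion XID limit, and the safety margin the system keeps below that ceiling is ignored. The 200 million figure is the default value of autovacuum_freeze_max_age.

```python
# Simplified model; function names are hypothetical.
AUTOVACUUM_FREEZE_MAX_AGE = 200_000_000   # PostgreSQL's default setting
XID_DISTANCE_LIMIT = 2**31                # the ~2.1 billion XID ceiling


def triggers_antiwraparound(relfrozenxid_age: int,
                            freeze_max_age: int = AUTOVACUUM_FREEZE_MAX_AGE) -> bool:
    """True once a table's relfrozenxid age exceeds the configured maximum."""
    return relfrozenxid_age > freeze_max_age


def budget_used(relfrozenxid_age: int) -> float:
    """Fraction of the available XID distance that the table has consumed."""
    return relfrozenxid_age / XID_DISTANCE_LIMIT


age = 200_000_001
print(triggers_antiwraparound(age))   # the autovacuum is triggered here
print(f"{budget_used(age):.1%}")      # yet under a tenth of the budget is used
```

With default settings the anti-wraparound autovacuum starts when less than 10% of the available distance has been consumed, which is why seeing one is not by itself a sign of trouble.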
There are a few other small things I noticed along the way but my goal was
to look at the overall structure. As we address some of these, I'm happy to
do more detailed review of individual patches.
Regards,
Samay
Thoughts?
--
Peter Geoghegan